SlideShare ist ein Scribd-Unternehmen logo
1 von 20
MegaDepth: Learning Single-View Depth
Prediction from Internet Photos
CVPR2018
Zhengqi Li Noah Snavely
인공지능연구원
이광희
2
 Limitations in the available training data
• NYU: Indoor-only images
• Make3D: Small numbers of training examples
• KITTI: Sparse sampling
• Difficult to collect (RGB image, depth map)
• RGB-D sensors (Kinect): limited to indoor use
• LIDAR: sparse depth maps
 Contributions : MegaDepth
• Multi-view internet photo collections (a virtually unlimited data source)
• Generate training data via modern structure-from-motion (SfM) and multi-view stereo (MVS)
• Challenges: noise and unreconstructable objects
• New data cleaning methods & automatically augmenting data with ordinal depth relations generated using
semantic segmentation.
• High accuracy & Generalizability
Motivation
3
 Download Internet photos from Flickr (Landmarks10K dataset)
 COLMAP : SOTA SfM system and MVS system
 Camera poses, sparse point clouds, dense depth maps
The MegaDepth Dataset
4
 Raw depth maps from COLMAP
• Transient objects(people, cars, etc.)
• Noisy depth discontinuities
• Bleeding of background depths into foreground objects
 Modified COLMAP
• At each iteration, keep smaller(closer) of the two at each pixel.
• Apply a median filter to remove unstable depth values
Depth map refinement
5
 Depth enhancement via semantic segmentation
• Semantic filtering (transient objects & difficult-to-reconstruct objects)
- PSPNet (150 semantic categories)
- divide the pixels into three subsets (Foreground/Background/Sky)
- If<50% of pixels in C(of F) have a reconstructed depth, discard all depths from C.
• Euclidean vs. ordinal depth (filtering training data)
- if >30% of an image I consists of valid depth values, then keep that image as training data
for learning Euclidean depth
• Automatic ordinal depth labeling
- Ford: if the area of C(of F) is larger than 5% of the image
- Bord : if p’s C(of B) is larger than 5% of the image
& p has a valid depth value that lies in the last quartile
of the full range of depths for I
Depth map refinement
F: statues, fountains, people, cars
B: building, towers, mountains, etc.
6
 200 3D models from landmarks around the world
 150K reconstructed images
 After filtering: 130K valid images
 Euclidean depth data: 100K images
 Ordinal depth data: 30K images
 Additional dataset: images from [18]
Creating a dataset
MegaDepth(MD)
[18] Tanks and Temples: Benchmarking Large-Scale Scene Reconstruction (siggraph2017)
7
 Network architecture
• VGG
• ResNet
• Hourglass network
Depth estimation network
hourglass network
8
 Unknown scale factor: cannot compare predicted and ground truth depths
directly
 The ratio of pairs of depths are preserved under scaling.
 In the log-depth domain, the difference between pairs of log-depth
Scale-invariant Loss function
Scale-invariant data term
Muti-scale scale-invariant
gradient matching term
Robust ordinal depth loss
9
 Scale-invariant data term
Loss Function
Predicted log-depth map
GT log-depth map
10
 Multi-scale scale-invariant gradient matching term
Loss Function
11
 Robust ordinal depth loss
Loss Function
i ∈ Ford , j ∈ Bord
12
 Generalization
• to new Internet photos from never-before-seen locations
• to other types of images from other dataset
 The effect of terms in our loss function
 Experimental Setup
• Test set: 46/200 (reconstructed models)
• Training set / validation set: Randomly split 96%:4% (for 154 models)
Evaluation
13
 Error metircs
• Si-RMSE:
• SfM Disagreement Rate(SDR):
Evaluation and ablation study on MD test set
14
 Effect of loss network and loss variants
Evaluation and ablation study on MD test set
15
Evaluation and ablation study on MD test set
16
 Raw MD vs Clean MD
Evaluation and ablation study on MD test set
17
Generalization to other datasets
18
Generalization to other datasets
19
Generalization to other datasets
20
 Present a new use for Internet-derived SfM+MVS data
 Generating large amounts of training data for single view depth prediction
 generalizes very well to other datasets.
 Limitations:
• oblique surfaces (e.g., ground), thin or complex objects (e.g., lampposts), and difficult materials (e.g.,
shiny glass)
• not predict metric depth
Conclusion

Weitere ähnliche Inhalte

Was ist angesagt?

論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...
論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...
論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...Ken Sakurada
 
Using Pleiades Very High Resolution Images for planning activities in feasib...
Using Pleiades  Very High Resolution Images for planning activities in feasib...Using Pleiades  Very High Resolution Images for planning activities in feasib...
Using Pleiades Very High Resolution Images for planning activities in feasib...Argongra Gis
 
Pleiades - satellite imagery - very high resolution
Pleiades - satellite imagery - very high resolutionPleiades - satellite imagery - very high resolution
Pleiades - satellite imagery - very high resolutionSpot Image
 
Digital Elevation Models - WUR - Grontmij
Digital Elevation Models - WUR - GrontmijDigital Elevation Models - WUR - Grontmij
Digital Elevation Models - WUR - GrontmijXander Bakker
 
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationPR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationSunghoon Joo
 
Application of terrestrial 3D laser scanning in building information modellin...
Application of terrestrial 3D laser scanning in building information modellin...Application of terrestrial 3D laser scanning in building information modellin...
Application of terrestrial 3D laser scanning in building information modellin...Martin Ma
 
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...Beniamino Murgante
 
Spectral_classification_of_WorldView2_multiangle_sequence.pptx
Spectral_classification_of_WorldView2_multiangle_sequence.pptxSpectral_classification_of_WorldView2_multiangle_sequence.pptx
Spectral_classification_of_WorldView2_multiangle_sequence.pptxgrssieee
 
R-FCN : object detection via region-based fully convolutional networks
R-FCN :  object detection via region-based fully convolutional networksR-FCN :  object detection via region-based fully convolutional networks
R-FCN : object detection via region-based fully convolutional networksEntrepreneur / Startup
 
3D SLAM introcution& current status
3D SLAM introcution& current status3D SLAM introcution& current status
3D SLAM introcution& current statuse8xu
 
Terrestrial laser scanning
Terrestrial laser scanningTerrestrial laser scanning
Terrestrial laser scanningIris Kramer
 
CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015Symeon Papadopoulos
 
Introductory Level of SLAM Seminar
Introductory Level of SLAM SeminarIntroductory Level of SLAM Seminar
Introductory Level of SLAM SeminarDong-Won Shin
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Symeon Papadopoulos
 
VSlam 2017 11_20(張閎智)
VSlam 2017 11_20(張閎智)VSlam 2017 11_20(張閎智)
VSlam 2017 11_20(張閎智)Hung-Chih Chang
 
Analysis of KinectFusion
Analysis of KinectFusionAnalysis of KinectFusion
Analysis of KinectFusionDong-Won Shin
 
An Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchAn Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchBill Liu
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetRishabh Indoria
 
Team 9: Extraction and classification of satellite image patches
Team 9: Extraction and classification of satellite image patchesTeam 9: Extraction and classification of satellite image patches
Team 9: Extraction and classification of satellite image patchesleopauly
 

Was ist angesagt? (20)

論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...
論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...
論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...
 
Using Pleiades Very High Resolution Images for planning activities in feasib...
Using Pleiades  Very High Resolution Images for planning activities in feasib...Using Pleiades  Very High Resolution Images for planning activities in feasib...
Using Pleiades Very High Resolution Images for planning activities in feasib...
 
Pleiades - satellite imagery - very high resolution
Pleiades - satellite imagery - very high resolutionPleiades - satellite imagery - very high resolution
Pleiades - satellite imagery - very high resolution
 
Digital Elevation Models - WUR - Grontmij
Digital Elevation Models - WUR - GrontmijDigital Elevation Models - WUR - Grontmij
Digital Elevation Models - WUR - Grontmij
 
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationPR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
 
Application of terrestrial 3D laser scanning in building information modellin...
Application of terrestrial 3D laser scanning in building information modellin...Application of terrestrial 3D laser scanning in building information modellin...
Application of terrestrial 3D laser scanning in building information modellin...
 
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
 
Spectral_classification_of_WorldView2_multiangle_sequence.pptx
Spectral_classification_of_WorldView2_multiangle_sequence.pptxSpectral_classification_of_WorldView2_multiangle_sequence.pptx
Spectral_classification_of_WorldView2_multiangle_sequence.pptx
 
R-FCN : object detection via region-based fully convolutional networks
R-FCN :  object detection via region-based fully convolutional networksR-FCN :  object detection via region-based fully convolutional networks
R-FCN : object detection via region-based fully convolutional networks
 
3D SLAM introcution& current status
3D SLAM introcution& current status3D SLAM introcution& current status
3D SLAM introcution& current status
 
Terrestrial laser scanning
Terrestrial laser scanningTerrestrial laser scanning
Terrestrial laser scanning
 
CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015
 
Introductory Level of SLAM Seminar
Introductory Level of SLAM SeminarIntroductory Level of SLAM Seminar
Introductory Level of SLAM Seminar
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015
 
VSlam 2017 11_20(張閎智)
VSlam 2017 11_20(張閎智)VSlam 2017 11_20(張閎智)
VSlam 2017 11_20(張閎智)
 
Analysis of KinectFusion
Analysis of KinectFusionAnalysis of KinectFusion
Analysis of KinectFusion
 
Kintinuous review
Kintinuous reviewKintinuous review
Kintinuous review
 
An Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchAn Introduction to Neural Architecture Search
An Introduction to Neural Architecture Search
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
 
Team 9: Extraction and classification of satellite image patches
Team 9: Extraction and classification of satellite image patchesTeam 9: Extraction and classification of satellite image patches
Team 9: Extraction and classification of satellite image patches
 

Ähnlich wie PR098: MegaDepth: Learning Single-View Depth Prediction from Internet Photos

Obscenity Detection in Images
Obscenity Detection in ImagesObscenity Detection in Images
Obscenity Detection in ImagesAnil Kumar Gupta
 
Master Thesis of Computer Engineering SuperResoluton Giuseppe Caliendo
Master Thesis of Computer Engineering SuperResoluton Giuseppe CaliendoMaster Thesis of Computer Engineering SuperResoluton Giuseppe Caliendo
Master Thesis of Computer Engineering SuperResoluton Giuseppe CaliendoGiuseppeCaliendo2
 
NetVLAD: CNN architecture for weakly supervised place recognition
NetVLAD:  CNN architecture for weakly supervised place recognitionNetVLAD:  CNN architecture for weakly supervised place recognition
NetVLAD: CNN architecture for weakly supervised place recognitionGeunhee Cho
 
Synthetic Data and Graphics Techniques in Robotics
Synthetic Data and Graphics Techniques in RoboticsSynthetic Data and Graphics Techniques in Robotics
Synthetic Data and Graphics Techniques in RoboticsPrabindh Sundareson
 
SPAR 2015 - Civil Maps Presentation by Sravan Puttagunta
SPAR 2015 - Civil Maps Presentation by Sravan PuttaguntaSPAR 2015 - Civil Maps Presentation by Sravan Puttagunta
SPAR 2015 - Civil Maps Presentation by Sravan PuttaguntaSravan Puttagunta
 
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakLearn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakPyData
 
How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?Anubhav Jain
 
Tracking Robustness and Green View Index Estimation of Augmented and Diminish...
Tracking Robustness and Green View Index Estimation of Augmented and Diminish...Tracking Robustness and Green View Index Estimation of Augmented and Diminish...
Tracking Robustness and Green View Index Estimation of Augmented and Diminish...Tomohiro Fukuda
 
VERIFICATION_&_VALIDATION_OF_A_SEMANTIC_IMAGE_TAGGING_FRAMEWORK_VIA_GENERATIO...
VERIFICATION_&_VALIDATION_OF_A_SEMANTIC_IMAGE_TAGGING_FRAMEWORK_VIA_GENERATIO...VERIFICATION_&_VALIDATION_OF_A_SEMANTIC_IMAGE_TAGGING_FRAMEWORK_VIA_GENERATIO...
VERIFICATION_&_VALIDATION_OF_A_SEMANTIC_IMAGE_TAGGING_FRAMEWORK_VIA_GENERATIO...grssieee
 
Desktop Softwares for Unmanned Aerial Systems(UAS))
Desktop Softwares for Unmanned Aerial Systems(UAS))Desktop Softwares for Unmanned Aerial Systems(UAS))
Desktop Softwares for Unmanned Aerial Systems(UAS))Kamal Shahi
 
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Chapter 1-Introduction.pptx bjhgvjjkllbuu
Chapter 1-Introduction.pptx bjhgvjjkllbuuChapter 1-Introduction.pptx bjhgvjjkllbuu
Chapter 1-Introduction.pptx bjhgvjjkllbuuLusi39
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdfmokamojah
 
witenberg-iit-research-poster-jul2015(1)
witenberg-iit-research-poster-jul2015(1)witenberg-iit-research-poster-jul2015(1)
witenberg-iit-research-poster-jul2015(1)Witenberg S. R. Souza
 
Ala Stolpnik's Standard Model talk
Ala Stolpnik's Standard Model talkAla Stolpnik's Standard Model talk
Ala Stolpnik's Standard Model talkwolf
 
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...multimediaeval
 

Ähnlich wie PR098: MegaDepth: Learning Single-View Depth Prediction from Internet Photos (20)

Obscenity Detection in Images
Obscenity Detection in ImagesObscenity Detection in Images
Obscenity Detection in Images
 
Master Thesis of Computer Engineering SuperResoluton Giuseppe Caliendo
Master Thesis of Computer Engineering SuperResoluton Giuseppe CaliendoMaster Thesis of Computer Engineering SuperResoluton Giuseppe Caliendo
Master Thesis of Computer Engineering SuperResoluton Giuseppe Caliendo
 
NetVLAD: CNN architecture for weakly supervised place recognition
NetVLAD:  CNN architecture for weakly supervised place recognitionNetVLAD:  CNN architecture for weakly supervised place recognition
NetVLAD: CNN architecture for weakly supervised place recognition
 
Synthetic Data and Graphics Techniques in Robotics
Synthetic Data and Graphics Techniques in RoboticsSynthetic Data and Graphics Techniques in Robotics
Synthetic Data and Graphics Techniques in Robotics
 
SPAR 2015 - Civil Maps Presentation by Sravan Puttagunta
SPAR 2015 - Civil Maps Presentation by Sravan PuttaguntaSPAR 2015 - Civil Maps Presentation by Sravan Puttagunta
SPAR 2015 - Civil Maps Presentation by Sravan Puttagunta
 
Soumyadip_Chandra
Soumyadip_ChandraSoumyadip_Chandra
Soumyadip_Chandra
 
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakLearn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
 
[DL輪読会]ClearGrasp
[DL輪読会]ClearGrasp[DL輪読会]ClearGrasp
[DL輪読会]ClearGrasp
 
How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?
 
Tracking Robustness and Green View Index Estimation of Augmented and Diminish...
Tracking Robustness and Green View Index Estimation of Augmented and Diminish...Tracking Robustness and Green View Index Estimation of Augmented and Diminish...
Tracking Robustness and Green View Index Estimation of Augmented and Diminish...
 
VERIFICATION_&_VALIDATION_OF_A_SEMANTIC_IMAGE_TAGGING_FRAMEWORK_VIA_GENERATIO...
VERIFICATION_&_VALIDATION_OF_A_SEMANTIC_IMAGE_TAGGING_FRAMEWORK_VIA_GENERATIO...VERIFICATION_&_VALIDATION_OF_A_SEMANTIC_IMAGE_TAGGING_FRAMEWORK_VIA_GENERATIO...
VERIFICATION_&_VALIDATION_OF_A_SEMANTIC_IMAGE_TAGGING_FRAMEWORK_VIA_GENERATIO...
 
Desktop Softwares for Unmanned Aerial Systems(UAS))
Desktop Softwares for Unmanned Aerial Systems(UAS))Desktop Softwares for Unmanned Aerial Systems(UAS))
Desktop Softwares for Unmanned Aerial Systems(UAS))
 
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
 
Chapter 1-Introduction.pptx bjhgvjjkllbuu
Chapter 1-Introduction.pptx bjhgvjjkllbuuChapter 1-Introduction.pptx bjhgvjjkllbuu
Chapter 1-Introduction.pptx bjhgvjjkllbuu
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
 
witenberg-iit-research-poster-jul2015(1)
witenberg-iit-research-poster-jul2015(1)witenberg-iit-research-poster-jul2015(1)
witenberg-iit-research-poster-jul2015(1)
 
DSM Extraction from Pleiades Images using MICMAC
DSM Extraction from Pleiades Images using MICMAC DSM Extraction from Pleiades Images using MICMAC
DSM Extraction from Pleiades Images using MICMAC
 
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
 
Ala Stolpnik's Standard Model talk
Ala Stolpnik's Standard Model talkAla Stolpnik's Standard Model talk
Ala Stolpnik's Standard Model talk
 
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
 

Mehr von 광희 이

LFI-CAM: Learning Feature Importance for Better Visual Explanation
LFI-CAM: Learning Feature Importance for Better Visual ExplanationLFI-CAM: Learning Feature Importance for Better Visual Explanation
LFI-CAM: Learning Feature Importance for Better Visual Explanation광희 이
 
Unsupervised image to-image translation via pre-trained style gan2 network
Unsupervised image to-image translation via pre-trained style gan2 networkUnsupervised image to-image translation via pre-trained style gan2 network
Unsupervised image to-image translation via pre-trained style gan2 network광희 이
 
보다 유연한 이미지 변환을 하려면?
보다 유연한 이미지 변환을 하려면?보다 유연한 이미지 변환을 하려면?
보다 유연한 이미지 변환을 하려면?광희 이
 
PR100: SeedNet: Automatic Seed Generation with Deep Reinforcement Learning fo...
PR100: SeedNet: Automatic Seed Generation with Deep Reinforcement Learning fo...PR100: SeedNet: Automatic Seed Generation with Deep Reinforcement Learning fo...
PR100: SeedNet: Automatic Seed Generation with Deep Reinforcement Learning fo...광희 이
 
PR-073 : Generative Semantic Manipulation with Contrasting GAN
PR-073 : Generative Semantic Manipulation with Contrasting GANPR-073 : Generative Semantic Manipulation with Contrasting GAN
PR-073 : Generative Semantic Manipulation with Contrasting GAN광희 이
 
PR-065 : High-Resolution Image Synthesis and Semantic Manipulation with Condi...
PR-065 : High-Resolution Image Synthesis and Semantic Manipulation with Condi...PR-065 : High-Resolution Image Synthesis and Semantic Manipulation with Condi...
PR-065 : High-Resolution Image Synthesis and Semantic Manipulation with Condi...광희 이
 

Mehr von 광희 이 (7)

LFI-CAM: Learning Feature Importance for Better Visual Explanation
LFI-CAM: Learning Feature Importance for Better Visual ExplanationLFI-CAM: Learning Feature Importance for Better Visual Explanation
LFI-CAM: Learning Feature Importance for Better Visual Explanation
 
Unsupervised image to-image translation via pre-trained style gan2 network
Unsupervised image to-image translation via pre-trained style gan2 networkUnsupervised image to-image translation via pre-trained style gan2 network
Unsupervised image to-image translation via pre-trained style gan2 network
 
보다 유연한 이미지 변환을 하려면?
보다 유연한 이미지 변환을 하려면?보다 유연한 이미지 변환을 하려면?
보다 유연한 이미지 변환을 하려면?
 
PR100: SeedNet: Automatic Seed Generation with Deep Reinforcement Learning fo...
PR100: SeedNet: Automatic Seed Generation with Deep Reinforcement Learning fo...PR100: SeedNet: Automatic Seed Generation with Deep Reinforcement Learning fo...
PR100: SeedNet: Automatic Seed Generation with Deep Reinforcement Learning fo...
 
PR-073 : Generative Semantic Manipulation with Contrasting GAN
PR-073 : Generative Semantic Manipulation with Contrasting GANPR-073 : Generative Semantic Manipulation with Contrasting GAN
PR-073 : Generative Semantic Manipulation with Contrasting GAN
 
PR-065 : High-Resolution Image Synthesis and Semantic Manipulation with Condi...
PR-065 : High-Resolution Image Synthesis and Semantic Manipulation with Condi...PR-065 : High-Resolution Image Synthesis and Semantic Manipulation with Condi...
PR-065 : High-Resolution Image Synthesis and Semantic Manipulation with Condi...
 
PR12-CAM
PR12-CAMPR12-CAM
PR12-CAM
 

Kürzlich hochgeladen

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 

Kürzlich hochgeladen (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

PR098: MegaDepth: Learning Single-View Depth Prediction from Internet Photos

  • 1. MegaDepth: Learning Single-View Depth Prediction from Internet Photos CVPR2018 Zhengqi Li Noah Snavely 인공지능연구원 이광희
  • 2. 2  Limitations in the available training data • NYU: Indoor-only images • Make3D: Small numbers of training examples • KITTI: Sparse sampling • Difficult to collect (RGB image, depth map) • RGB-D sensors (Kinect): limited to indoor use • LIDAR: sparse depth maps  Contributions : MegaDepth • Multi-view internet photo collections (a virtually unlimited data source) • Generate training data via modern structure-from-motion (SfM) and multi-view stereo (MVS) • Challenges: noise and unreconstructable objects • New data cleaning methods & automatically augmenting data with ordinal depth relations generated using semantic segmentation. • High accuracy & Generalizability Motivation
  • 3. 3  Download Internet photos from Flickr (Landmarks10K dataset)  COLMAP : SOTA SfM system and MVS system  Camera poses, sparse point clouds, dense depth maps The MegaDepth Dataset
  • 4. 4  Raw depth maps from COLMAP • Transient objects(people, cars, etc.) • Noisy depth discontinuities • Bleeding of background depths into foreground objects  Modified COLMAP • At each iteration, keep smaller(closer) of the two at each pixel. • Apply a median filter to remove unstable depth values Depth map refinement
  • 5. 5  Depth enhancement via semantic segmentation • Semantic filtering (transient objects & difficult-to-reconstruct objects) - PSPNet (150 semantic categories) - divide the pixels into three subsets (Foreground/Background/Sky) - If<50% of pixels in C(of F) have a reconstructed depth, discard all depths from C. • Euclidean vs. ordinal depth (filtering training data) - if >30% of an image I consists of valid depth values, then keep that image as training data for learning Euclidean depth • Automatic ordinal depth labeling - Ford: if the area of C(of F) is larger than 5% of the image - Bord : if p’s C(of B) is larger than 5% of the image & p has a valid depth value that lies in the last quartile of the full range of depths for I Depth map refinement F: statues, fountains, people, cars B: building, towers, mountains, etc.
  • 6. 6  200 3D models from landmarks around the world  150K reconstructed images  After filtering: 130K valid images  Euclidean depth data: 100K images  Ordinal depth data: 30K images  Additional dataset: images from [18] Creating a dataset MegaDepth(MD) [18] Tanks and Temples: Benchmarking Large-Scale Scene Reconstruction (siggraph2017)
  • 7. 7  Network architecture • VGG • ResNet • Hourglass network Depth estimation network hourglass network
  • 8. 8  Unknown scale factor: cannot compare predicted and ground truth depths directly  The ratio of pairs of depths are preserved under scaling.  In the log-depth domain, the difference between pairs of log-depth Scale-invariant Loss function Scale-invariant data term Muti-scale scale-invariant gradient matching term Robust ordinal depth loss
  • 9. 9  Scale-invariant data term Loss Function Predicted log-depth map GT log-depth map
  • 10. 10  Multi-scale scale-invariant gradient matching term Loss Function
  • 11. 11  Robust ordinal depth loss Loss Function i ∈ Ford , j ∈ Bord
  • 12. 12  Generalization • to new Internet photos from never-before-seen locations • to other types of images from other dataset  The effect of terms in our loss function  Experimental Setup • Test set: 46/200 (reconstructed models) • Training set / validation set: Randomly split 96%:4% (for 154 models) Evaluation
  • 13. 13  Error metircs • Si-RMSE: • SfM Disagreement Rate(SDR): Evaluation and ablation study on MD test set
  • 14. 14  Effect of loss network and loss variants Evaluation and ablation study on MD test set
  • 15. 15 Evaluation and ablation study on MD test set
  • 16. 16  Raw MD vs Clean MD Evaluation and ablation study on MD test set
  • 20. 20  Present a new use for Internet-derived SfM+MVS data  Generating large amounts of training data for single view depth prediction  generalizes very well to other datasets.  Limitations: • oblique surfaces (e.g., ground), thin or complex objects (e.g., lampposts), and difficult materials (e.g., shiny glass) • not predict metric depth Conclusion