Generation of Synthetic Referring Expressions
for Object Segmentation in Videos
Author: Ioannis Kazakos
Table of Contents
1. Topic & Background
2. Relevant Literature
3. Motivation
4. Method
5. Experiments & Results
6. Conclusions
1. Topic & Background
Vision & Language
● Recently emerged research area
● Driven by the deep learning revolution and independent successes in CV and NLP
○ CNNs, Object detection/segmentation models
○ LSTMs, Word embeddings
● Many applications
○ Autonomous driving
○ Assistance of visually impaired individuals
○ Interactive video editing
○ Navigation from vision and language etc.
Vision & Language Tasks
● Visual Question Answering, Agrawal et al. 2015
● Caption Generation, Vinyals et al. 2015
● Text to Images, Zhang et al. 2016
● And many more!
● Referring Expression
○ A description that accurately matches one specific object and no other object in the current scene
○ Example:
■ “a woman” ❌
■ “a woman in red” ❌
■ “a woman in red on the right” ✅
■ “a woman in red top and blue shorts” ✅
● Object Segmentation
○ Assign a label to every pixel corresponding to the target object
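The definition of a referring expression above can be illustrated with a small sketch. It assumes a hypothetical scene in which each object is a set of properties and an expression is the set of properties it mentions; the object names and properties are made up for illustration and are not taken from the slides.

```python
def is_referring(expression, target, scene):
    """True if the expression matches the target and no other object.

    expression: set of properties mentioned in the description
    target: key of the intended object in `scene`
    scene: dict mapping object name -> set of properties (hypothetical)
    """
    matches = [name for name, props in scene.items() if expression <= props]
    return matches == [target]


# Hypothetical scene with two women and one man.
scene = {
    "woman_a": {"woman", "red top", "blue shorts", "right"},
    "woman_b": {"woman", "red top", "jeans", "left"},
    "man_a": {"man", "white shirt", "left"},
}

print(is_referring({"woman"}, "woman_a", scene))
print(is_referring({"woman", "red top"}, "woman_a", scene))
print(is_referring({"woman", "red top", "right"}, "woman_a", scene))
print(is_referring({"woman", "red top", "blue shorts"}, "woman_a", scene))
```

In this toy scene the first two expressions also match the second woman, so only the last two single out the target.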
Object Segmentation with Referring Expressions
Referring Expression Video Object Segmentation
2. Relevant Literature
Many works on images
● First work: “Segmentation from Natural Language Expressions”, Hu et al. 2016
● Subsequent works jointly model vision and language features and use attention to better capture the dependencies between them
● Most of these works use the Refer-It collection of datasets for training and evaluation
○ Three large-scale image datasets with referring expressions and segmentation masks
○ Collected on top of Microsoft COCO (Common objects in Context)
○ RefCOCO, RefCOCO+ and RefCOCOg
● 142,209 referring expressions
● 50,000 objects
● 19,994 images
RefCOCO dataset
Expression = “right kid”
Expression = “left elephant”
Few works on videos
● “Video Object Segmentation with Language Referring Expressions”, Khoreva et al. 2018
○ DAVIS-2017: Large set of 78 object classes
○ Too few videos (150 in total)
○ They use a frame-based model
○ Pre-training on RefCOCO is used
● “Actor and action segmentation from a sentence”, Gavrilyuk et al. 2018
○ A2D: Small set of object classes (only 8 actors)
○ J-HMDB: Single object in each video
DAVIS-2017
3. Motivation
Main Challenges
● Models
○ Temporal consistency across frames
○ Models’ size and complexity
● Data
○ No large-scale datasets for videos
○ Poor quality of crowdsourced referring expressions
■ ~10% fail to correctly describe the target object (i.e. they are not referring expressions)
Analysis from Bellver et al. 2020 (figure panels: A2D, DAVIS-2017)
Method Inspiration
[Figure: example frames from A2D and DAVIS-2017]
● Existing datasets include trivial cases where a single object from each class appears
● In such cases an object can be identified using only its class, e.g. saying “a person” or “a horse”
● Existing large datasets for video object segmentation are labeled in terms of object classes
● Annotating a large dataset with referring expressions requires tremendous human effort
Basic Idea
Automatically generate synthetic referring expressions, starting from an object’s class and enriching it with other cues, at no human annotation cost
Thesis Purpose
1. Propose a method for generating synthetic referring expressions for a large-scale video object segmentation dataset
2. Evaluate the effectiveness of the generated synthetic referring expressions for the task of video object segmentation with referring expressions
4. Method
YouTube-VIS Dataset
YouTube-VOS
→ Large-scale dataset for video object segmentation
→ Short YouTube videos of 3-6 seconds
→ 4,453 videos in total
→ 94 object categories
YouTube-VIS
→ Created on top of YouTube-VOS
→ 2,883 videos
→ 40 object classes
→ Exhaustively annotated = All objects belonging to the 40 classes are labeled with pixel-wise masks.
● The formulation of our method allows its application to any other object detection/segmentation dataset
● We apply our proposed method on the YouTube-VIS dataset
Overview
1. Ground-truth annotations
● Object class
● Bounding boxes
○ Relative size
○ Relative location
2. Faster R-CNN, Ren et al. 2015
● Enhanced with attribute head by Tang et al. 2020
● Pre-trained on Visual Genome dataset for attribute detection
○ Able to detect a predefined set of 201 attributes
○ Includes color and non-color attributes
○ Non-color attributes can be adjectives (“large”, “spotted”) or verbs (“surfing”)
Cues
1. Object Class (e.g. “a person”)
○ It can be enough only if one object of this class is present in the video frame
○ However, in most cases more cues are necessary
Cues
2. Relative Size
○ The areas At and Ao of the target and other object bounding boxes are computed:
■ At >= 2Ao : “bigger” is added to the ref. expression
■ At <= 0.5Ao : “smaller” is added instead
■ 0.5Ao < At < 2Ao : relative size not applicable
○ Similarly for more objects: “biggest”/“smallest” if the target is “bigger”/“smaller” than all other objects
“a bigger dog”
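The relative size rule above can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the box format `(x_min, y_min, x_max, y_max)`, the function names, and the reuse of the same 2x / 0.5x thresholds for the “biggest”/“smallest” case are assumptions.

```python
def box_area(box):
    """Area of an (x_min, y_min, x_max, y_max) bounding box."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def relative_size_cue(target_box, other_boxes):
    """Return 'bigger', 'smaller', 'biggest', 'smallest', or None.

    other_boxes holds the boxes of the other objects of the same class.
    Assumption: with several other objects, the pairwise thresholds
    must hold against every one of them.
    """
    if not other_boxes:
        return None
    a_t = box_area(target_box)
    areas = [box_area(b) for b in other_boxes]
    if a_t <= 0 or min(areas) <= 0:
        return None  # degenerate boxes carry no size information
    if all(a_t >= 2 * a_o for a_o in areas):
        return "bigger" if len(areas) == 1 else "biggest"
    if all(a_t <= 0.5 * a_o for a_o in areas):
        return "smaller" if len(areas) == 1 else "smallest"
    return None  # 0.5*Ao < At < 2*Ao: cue not applicable


print(relative_size_cue((0, 0, 10, 10), [(0, 0, 5, 5)]))
print(relative_size_cue((0, 0, 5, 5), [(0, 0, 10, 10), (0, 0, 20, 20)]))
print(relative_size_cue((0, 0, 10, 10), [(0, 0, 9, 9)]))
```

The returned word would then be prepended to the class name, as in “a bigger dog”.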
Cues
3. Relative Location (1 or 2 other objects of the same class)
○ The most discriminative axis (X or Y) is determined using the bounding-box boundaries
○ The maximum non-overlapping distance between bounding boxes is calculated
○ If the distance is above a certain threshold, the relative location is computed according to the axis found:
■ If X-axis: “on the left” / “on the right”
■ If Y-axis: “in the front” / “in the back”
○ For 3 objects, the relative locations of each pair of objects are combined (e.g. “in the middle”, “in the front left” etc.)
“rabbit on the left”
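For the two-object case, the relative location cue might look like the sketch below. Several details are assumptions made for illustration: gaps are normalized by the frame size, the threshold value of 0.05 is arbitrary, and on the Y axis an object lower in the image (larger y) is treated as being “in the front”.

```python
def axis_gap(t_min, t_max, o_min, o_max):
    """Signed non-overlapping distance between two intervals on one axis.

    Negative if the target lies entirely before the other object,
    positive if entirely after, 0 if the intervals overlap.
    """
    if t_max <= o_min:
        return -(o_min - t_max)
    if o_max <= t_min:
        return t_min - o_max
    return 0.0


def relative_location_cue(target, other, frame_w, frame_h, threshold=0.05):
    """Boxes are (x_min, y_min, x_max, y_max) in image coordinates."""
    tx0, ty0, tx1, ty1 = target
    ox0, oy0, ox1, oy1 = other
    gap_x = axis_gap(tx0, tx1, ox0, ox1) / frame_w
    gap_y = axis_gap(ty0, ty1, oy0, oy1) / frame_h
    # The most discriminative axis is the one with the larger gap.
    if max(abs(gap_x), abs(gap_y)) <= threshold:
        return None
    if abs(gap_x) >= abs(gap_y):
        return "on the left" if gap_x < 0 else "on the right"
    # Assumption: larger y (lower in the image) means closer to the camera.
    return "in the back" if gap_y < 0 else "in the front"


print(relative_location_cue((10, 100, 110, 200), (300, 100, 400, 200), 640, 360))
print(relative_location_cue((100, 200, 200, 300), (100, 20, 200, 120), 640, 360))
print(relative_location_cue((100, 100, 200, 200), (150, 120, 250, 220), 640, 360))
```

Overlapping boxes yield no cue here, which mirrors the thresholding step in the slide; the three-object combinations are not covered by this sketch.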
Cues
4. Attributes
○ Faster R-CNN detection is matched to the target object using Intersection-over-Union
○ An attribute is added to the referring expression only if it is unique for the target object
○ Attributes can be colors, other adjectives (“spotted”, “large”) and verbs (“walking”, “surfing”)
○ We select up to 2 color attributes (e.g. “brown and black dog”) and 1 non-color (e.g. “walking”)
Detected Attributes:
'white' : 0.9250
'black' : 0.8844
'brown' : 0.8062
“a white rabbit”
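The attribute cue can be sketched in two steps: matching a detection to the ground-truth box by Intersection-over-Union, then keeping only the attributes that no other object of the same class shares. The data layout, the `min_iou` value, the small `COLORS` set, and the ranking by detection score are assumptions for illustration; the actual detector predicts a fixed vocabulary of 201 attributes.

```python
# Illustrative subset of color names (assumption, not the detector's vocabulary).
COLORS = {"white", "black", "brown", "red", "blue", "green", "yellow", "gray"}


def iou(a, b):
    """Intersection-over-Union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detection(gt_box, detections, min_iou=0.5):
    """Pick the detection whose box overlaps the ground-truth box most."""
    best = max(detections, key=lambda d: iou(gt_box, d["box"]), default=None)
    if best is None or iou(gt_box, best["box"]) < min_iou:
        return None
    return best


def select_attributes(target_attrs, other_attrs, max_colors=2, max_other=1):
    """Keep attributes unique to the target, highest score first.

    target_attrs: dict attribute -> score for the target object
    other_attrs: list of attribute sets, one per other same-class object
    """
    taken = set().union(*other_attrs)
    unique = sorted((n for n in target_attrs if n not in taken),
                    key=lambda n: -target_attrs[n])
    colors = [n for n in unique if n in COLORS][:max_colors]
    others = [n for n in unique if n not in COLORS][:max_other]
    return colors, others


# Hypothetical scores for the target and attribute sets for one other object.
target = {"white": 0.9250, "black": 0.8844, "brown": 0.8062, "walking": 0.60}
print(select_attributes(target, [{"black", "brown"}]))
```

With these made-up inputs only “white” and “walking” are unique to the target, which would give an expression along the lines of “a white rabbit walking”.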
SynthRef-YouTube-VIS
Example of referring expressions
generated with the proposed method
5. Experiments & Results
We use RefVOS model (Bellver et al. 2020) for the experiments
● Frame-based model
● DeepLabv3 visual encoder
● BERT language encoder
● Multi-modal embedding obtained via multiplication
DeepLabv3
Model
Training Details
● Batch size of 8 video frames (2 GPUs)
● Frames are cropped/padded to 480x480
● SGD optimizer
● Learning rate policy depends on the target dataset
Evaluation Metrics
1. Region Similarity (J)
Jaccard Index (Intersection-over-Union) between the predicted and ground-truth masks
2. Contour Accuracy (F)
F1-score of the contour-based precision Pc and recall Rc between the contour points of the predicted mask c(M) and the ground-truth c(G), computed via a bipartite graph matching.
3. Precision@X
Given a threshold X in the range [0.5, 0.9], a predicted mask for an object is counted as a true positive if its J is larger than X, and as a false positive otherwise. Precision is then computed as the ratio between the number of true positives and the total number of instances
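Region similarity and Precision@X can be sketched in a few lines. The masks here are plain nested lists of 0/1 values, and returning 1.0 when both masks are empty is an assumed convention; contour accuracy is omitted because it needs contour extraction and matching.

```python
def jaccard(pred, gt):
    """Intersection-over-Union of two binary masks of equal shape."""
    inter = union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def precision_at(j_scores, threshold):
    """Fraction of instances whose J is larger than the threshold."""
    if not j_scores:
        return 0.0
    return sum(j > threshold for j in j_scores) / len(j_scores)


pred = [[1, 1],
        [0, 0]]
gt = [[1, 0],
      [1, 0]]
print(jaccard(pred, gt))  # 1 shared pixel out of 3 covered pixels

scores = [0.4, 0.6, 0.95]  # hypothetical per-object J values
for x in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(x, precision_at(scores, x))
```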
Experiments
1. Extra pre-training of the model using the generated synthetic data and evaluating on DAVIS-2017 and A2D Sentences datasets
Results on DAVIS-2017
[Table: results on the DAVIS-2017 Validation and Train & Validation sets, with and without fine-tuning]
Qualitative Results on DAVIS-2017
[Figure panels: Pre-trained only on RefCOCO | Pre-trained on RefCOCO + SynthRef-YouTube-VIS]
Results on A2D Sentences
Referring expressions of A2D Sentences focus on actions, containing mostly verbs and fewer attributes
Experiments
1. Pre-training the model using the generated synthetic data and evaluating on DAVIS-2017 and A2D Sentences datasets
2. Training on human vs synthetic referring expressions on the same videos
Refer-YouTube-VOS
● Seo et al. 2020 annotated YouTube-VOS dataset with referring expressions
● This allowed a direct comparison of our synthetic referring expressions with human-produced ones
Human vs Synthetic
Training:
1. Synthetic referring expressions from SynthRef-YouTube-VIS (our synthetic dataset)
2. Human-produced referring expressions from Refer-YouTube-VOS
Evaluation: On the test split of SynthRef-YouTube-VIS using human-produced referring expressions from Refer-YouTube-VOS
Experiments
1. Pre-training the model using the generated synthetic data and evaluating on DAVIS-2017 and A2D Sentences datasets
2. Training on human vs synthetic referring expressions on the same videos
3. Ablation study
Ablation Study
● Impact of Synthetic Referring Expression Information (DAVIS-2017)
● Freezing the language branch for synthetic pre-training
6. Conclusions
1. Pre-training a model on the synthetic referring expressions, in addition to training it on real ones, increases its ability to generalize across different datasets.
2. Gains are higher when no fine-tuning is performed on the target dataset
3. Synthetic referring expressions do not achieve better results than human-produced ones, but can complement them without any additional annotation cost
4. More information in the referring expressions yields better segmentation accuracy
Future work
● Extend the proposed method by adding more cues
○ Use scene-graph generation models to add relationships between objects
Image from Xu et al. 2017
● Apply the proposed method to other existing object detection/segmentation datasets
○ Create synthetic expressions for Microsoft COCO images to be used interchangeably with RefCOCO
Thank you!
Questions?

Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
 

Generation of Synthetic Referring Expressions for Object Segmentation in Videos

  • 1. Generation of Synthetic Referring Expressions for Object Segmentation in Videos Author: Ioannis Kazakos
  • 2. Table of Contents 1. Topic & Background 2. Relevant Literature 3. Motivation 4. Method 5. Experiments & Results 6. Conclusions
  • 4. Vision & Language ● Recently emerged research area ● Owing to the deep learning revolution and independent success in CV and NLP ○ CNNs, Object detection/segmentation models ○ LSTMs, Word embeddings ● Many applications ○ Autonomous driving ○ Assistance of visually impaired individuals ○ Interactive video editing ○ Navigation from vision and language etc.
  • 5. Vision & Language Tasks ● Visual Question Answering, Agrawal et al. 2015 ● Caption Generation, Vinyals et al. 2015 ● Text to Images, Zhang et al. 2016 ● And many more!
  • 6. ● Referring Expression ○ An accurate description of a specific object, but not of any other object in the current scene ○ Example: ■ “a woman” ❌ ■ “a woman in red” ❌ ■ “a woman in red on the right” ✅ ■ “a woman in red top and blue shorts” ✅ ● Object Segmentation ○ Assign a label to every pixel corresponding to the target object Object Segmentation with Referring Expressions
  • 7. Referring Expression Video Object Segmentation
  • 9. Many works on images ● First work: “Segmentation from Natural Language Expressions”, Hu et al. 2016 ● Subsequent works tried to jointly model vision and language features and leverage attention to better capture dependencies between visual and linguistic features ● Most of these works use the Refer-It collection of datasets for training and evaluation ○ Three large-scale image datasets with referring expressions and segmentation masks ○ Collected on top of Microsoft COCO (Common Objects in Context) ○ RefCOCO, RefCOCO+ and RefCOCOg
  • 10. ● 142,209 referring expressions ● 50,000 objects ● 19,994 images RefCOCO dataset Expression = “right kid” Expression = “left elephant”
  • 11. Few works on videos ● “Video Object Segmentation with Language Referring Expressions”, Khoreva et al. 2018 ○ DAVIS-2017: Big set of 78 object classes ○ Too few videos (150 in total) ○ They use a frame-based model ○ Pre-training on RefCOCO is used ● “Actor and action segmentation from a sentence”, Gavrilyuk et al. 2018 ○ A2D: Small set of object classes (only 8 actors) ○ J-HMDB: Single object in each video DAVIS-2017
  • 13. Main Challenges ● Models ○ Temporal consistency across frames ○ Models’ size and complexity ● Data ○ No large-scale datasets for videos ○ Poor quality of crowdsourced referring expressions ■ ~10% fail to correctly describe the target object (no RE) Analysis from Bellver et al. 2020 A2D DAVIS-2017
  • 14. Method Inspiration A2D DAVIS-2017 ● Existing datasets include trivial cases where a single object from each class appears ● In such cases an object can be identified using only its class e.g. saying “a person” or “a horse” ● Existing large datasets for video object segmentation are labeled in terms of object classes ● Annotating a large dataset with referring expressions requires tremendous human effort
  • 15. Basic Idea Generate (automatically) synthetic referring expressions starting from an object’s class and enhancing them with other cues without any human annotation cost
  • 16. Thesis Purpose 1. Propose a method for generating synthetic referring expressions for a large-scale video object segmentation dataset 2. Evaluate the effectiveness of the generated synthetic referring expressions for the task of video object segmentation with referring expressions
  • 18. YouTube-VIS Dataset YouTube-VOS → Large-scale dataset for video object segmentation → Short YouTube videos of 3-6 seconds → 4,453 videos in total → 94 object categories YouTube-VIS → Created on top of YouTube-VOS → 2,883 videos → 40 object classes → Exhaustively annotated = All objects belonging to the 40 classes are labeled with pixel-wise masks. ● The formulation of our method allows its application to any other object detection/segmentation dataset ● We apply our proposed method on the YouTube-VIS dataset
  • 19. Overview 1. Ground-truth annotations ● Object class ● Bounding boxes ○ Relative size ○ Relative location 2. Faster R-CNN, Ren et al. 2015 ● Enhanced with attribute head by Tang et al. 2020 ● Pre-trained on Visual Genome dataset for attribute detection ○ Able to detect a predefined set of 201 attributes ○ Includes color and non-color attributes ○ Non-color attributes can be adjectives (“large”, “spotted”) or verbs (“surfing”)
  • 20. Cues 1. Object Class (e.g “a person”) ○ It can be enough only if one object of this class is present in the video frame ○ However, in most cases more cues are necessary
  • 21. Cues 2. Relative Size ○ The areas A_t and A_o of the target and other object bounding boxes are computed: ■ A_t >= 2A_o : “bigger” is added to the ref. expression ■ A_t <= 0.5A_o : “smaller” is added instead ■ 0.5A_o < A_t < 2A_o : relative size not applicable ○ Similarly for more objects, “biggest”/“smallest” is used if the target is “bigger”/“smaller” than all other objects “a bigger dog”
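The relative size rule above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis code: the `(x_min, y_min, x_max, y_max)` box layout, the function names, and the reading of "bigger than all other objects" as "at least twice the area of each of them" are assumptions.

```python
from typing import Optional, Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes get area 0."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def relative_size_cue(target: Box, others: Sequence[Box]) -> Optional[str]:
    """Return the size word for the target, or None when the cue does not apply."""
    if not others:
        return None
    a_t = box_area(target)
    areas = [box_area(o) for o in others]
    # A_t >= 2 * A_o for every other object of the same class.
    if all(a_t >= 2 * a_o for a_o in areas):
        return "bigger" if len(areas) == 1 else "biggest"
    # A_t <= 0.5 * A_o for every other object of the same class.
    if all(a_t <= 0.5 * a_o for a_o in areas):
        return "smaller" if len(areas) == 1 else "smallest"
    # Areas are too similar: the size cue is skipped.
    return None
```

For example, a 10x10 target next to a single 5x5 object of the same class yields "bigger", while a 10x10 target next to an 8x10 object yields no size cue.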
  • 22. Cues 3. Relative Location (1 or 2 other objects of the same class) ○ The most discriminative axis (X or Y) is determined using the bounding box boundaries ○ The maximum non-overlapping distance between bounding boxes is calculated ○ If the distance is above a certain threshold, the relative location is computed according to the axis found: ■ If X-axis: “on the left” / “on the right” ■ If Y-axis: “in the front” / “in the back” ○ For 3 objects, the relative locations of each pair of objects are combined (e.g. “in the middle”, “in the front left” etc.) “rabbit on the left”
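A possible reading of the relative location rule for the two-object case is sketched below. The slide does not give the threshold or the exact distance definition, so several things here are assumptions: the gap between box intervals is used as the "non-overlapping distance", it is normalised by the frame size, the `threshold` value of 0.05 is hypothetical, and an object lower in the frame is assumed to be "in the front".

```python
from typing import Optional, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max), with y growing downward.
Box = Tuple[float, float, float, float]


def axis_gap(t_min: float, t_max: float, o_min: float, o_max: float) -> float:
    """Gap between two intervals on one axis; negative when they overlap."""
    return max(o_min - t_max, t_min - o_max)


def relative_location_cue(
    target: Box,
    other: Box,
    frame_w: float,
    frame_h: float,
    threshold: float = 0.05,  # hypothetical value, not taken from the slides
) -> Optional[str]:
    """Return a location phrase for the target, or None if boxes are too close."""
    gap_x = axis_gap(target[0], target[2], other[0], other[2]) / frame_w
    gap_y = axis_gap(target[1], target[3], other[1], other[3]) / frame_h
    # The cue only applies when the boxes are clearly separated on some axis.
    if max(gap_x, gap_y) <= threshold:
        return None
    # The axis with the larger gap is treated as the most discriminative one.
    if gap_x >= gap_y:
        return "on the left" if target[2] <= other[0] else "on the right"
    # Assumption: higher in the image means further away from the camera.
    return "in the back" if target[3] <= other[1] else "in the front"
```

The three-object case described in the slide would combine the pairwise results of this function, which is left out here because the combination rules are not spelled out.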
  • 23. Cues 4. Attributes ○ Faster R-CNN detection is matched to the target object using Intersection-over-Union ○ An attribute is added to the referring expression only if it is unique for the target object ○ Attributes can be colors, other adjectives (“spotted”, “large”) and verbs (“walking”, “surfing”) ○ We select up to 2 color attributes (e.g. “brown and black dog”) and 1 non-color (e.g. “walking”) Detected Attributes: 'white' : 0.9250 'black' : 0.8844 'brown' : 0.8062 “a white rabbit”
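The attribute cue combines two steps: matching a detection to the target by Intersection-over-Union, and keeping only attributes that no other object shares. The sketch below illustrates both under stated assumptions: attributes are given as name-to-score dictionaries, the `COLORS` set is an illustrative subset rather than the 201-attribute vocabulary, and the `min_score` cut-off of 0.5 is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]

# Illustrative subset of color names, not the full attribute vocabulary.
COLORS = {"white", "black", "brown", "red", "blue", "green", "yellow", "gray"}


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def unique_attributes(
    target: Dict[str, float],
    others: Sequence[Dict[str, float]],
    min_score: float = 0.5,  # hypothetical confidence cut-off
    max_colors: int = 2,
    max_non_colors: int = 1,
) -> List[str]:
    """Pick up to 2 color and 1 non-color attributes unique to the target."""
    # Attributes confidently detected on any other object are not discriminative.
    taken = {name for o in others for name, s in o.items() if s >= min_score}
    ranked = sorted(
        ((s, n) for n, s in target.items() if s >= min_score and n not in taken),
        reverse=True,
    )
    colors = [n for _, n in ranked if n in COLORS][:max_colors]
    non_colors = [n for _, n in ranked if n not in COLORS][:max_non_colors]
    return colors + non_colors
```

In use, the detection with the highest `iou` against the ground-truth box would supply the `target` dictionary, and the detections matched to the other annotated objects would supply `others`.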
  • 24. SynthRef-YouTube-VIS Example of referring expressions generated with the proposed method
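As a rough illustration of how the four cues might be assembled into one expression, the sketch below joins them in a fixed order. The word order, the article handling, and the function name `compose_expression` are assumptions made for this example; the slides only show the resulting phrases, such as “a bigger dog” or “a white rabbit”.

```python
from typing import Optional, Sequence


def compose_expression(
    obj_class: str,
    size: Optional[str] = None,
    colors: Sequence[str] = (),
    non_color: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    """Join the available cues into a single referring expression."""
    words = []
    if size:
        words.append(size)
    if colors:
        words.append(" and ".join(colors))
    if non_color:
        words.append(non_color)
    words.append(obj_class)
    head = " ".join(words)
    # Superlatives take "the"; otherwise pick "a"/"an" from the first letter.
    if size in ("biggest", "smallest"):
        article = "the"
    else:
        article = "an" if head[0] in "aeiou" else "a"
    expression = f"{article} {head}"
    if location:
        expression += f" {location}"
    return expression
```

With only the class and a size cue, `compose_expression("dog", size="bigger")` gives "a bigger dog"; cues that do not apply are simply left as `None` and dropped from the phrase.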
  • 25. 5. Experiments & Results
  • 26. We use RefVOS model (Bellver et al. 2020) for the experiments ● Frame-based model ● DeepLabv3 visual encoder ● BERT language encoder ● Multi-modal embedding obtained via multiplication DeepLabv3 Model
  • 27. Training Details ● Batch size of 8 video frames (2 GPUs) ● Frames are cropped/padded to 480x480 ● SGD optimizer ● Learning rate policy depends on the target dataset
  • 28. Evaluation Metrics 1. Region Similarity (J) Jaccard Index (Intersection-over-Union) between predicted and ground-truth mask 2. Contour Accuracy (F) F1-score of the contour-based precision Pc and recall Rc between the contour points of the predicted mask c(M) and the ground-truth c(G), computed via a bipartite graph matching. 3. Precision@X Given a threshold X in the range [0.5,0.9], a predicted mask for an object is counted as a true positive if its J is larger than X, and as a false positive otherwise. Precision is then computed as the ratio between the number of true positives and the total number of instances
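Region similarity and Precision@X are simple enough to write out directly. The sketch below uses plain Python lists of 0/1 rows as masks to stay dependency-free; the function names and the convention that two empty masks score 1.0 are assumptions. Contour accuracy (F) is omitted because it needs contour extraction and bipartite matching.

```python
from typing import Sequence


def jaccard(pred: Sequence[Sequence[int]], gt: Sequence[Sequence[int]]) -> float:
    """Region similarity J: intersection over union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def precision_at(j_scores: Sequence[float], x: float) -> float:
    """Precision@X: fraction of instances whose J is larger than X."""
    if not j_scores:
        raise ValueError("at least one instance is required")
    true_positives = sum(1 for j in j_scores if j > x)
    return true_positives / len(j_scores)
```

Reporting `precision_at(scores, x)` for several thresholds in [0.5, 0.9] shows how quickly quality drops as the overlap requirement tightens.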
  • 29. Experiments 1. Extra pre-training of the model using the generated synthetic data and evaluating on DAVIS-2017 and A2D Sentences datasets
  • 30. Results on DAVIS-2017 DAVIS-2017 Validation DAVIS-2017 Train & Validation No fine-tuning Fine-tuning
  • 31. Qualitative Results on DAVIS-2017 Pre-trained only on RefCOCO Pre-trained on RefCOCO + SynthRef-YouTube-VIS
  • 32. Results on A2D Sentences Referring expressions of A2D Sentences are focused on actions, including mostly verbs and fewer attributes
  • 33. Experiments 1. Pre-training the model using the generated synthetic data and evaluating on DAVIS-2017 and A2D Sentences datasets 2. Training on human vs synthetic referring expressions on the same videos
  • 34. Refer-YouTube-VOS ● Seo et al. 2020 annotated YouTube-VOS dataset with referring expressions ● This allowed a direct comparison of our synthetic referring expressions with human-produced ones
  • 35. Human vs Synthetic Training: 1. Synthetic referring expressions from SynthRef-YouTube-VIS (our synthetic dataset) 2. Human-produced referring expressions from Refer-YouTube-VOS Evaluation: On the test split of SynthRef-YouTube-VIS using human-produced referring expressions from Refer-YouTube-VOS
  • 36. Experiments 1. Pre-training the model using the generated synthetic data and evaluating on DAVIS-2017 and A2D Sentences datasets 2. Training on human vs synthetic referring expressions on the same videos 3. Ablation study
  • 37. Ablation Study ● Impact of Synthetic Referring Expression Information (DAVIS-2017) ● Freezing the language branch for synthetic pre-training
  • 39. Conclusions 1. Pre-training a model using the synthetic referring expressions, when it is additionally trained on real ones, increases its ability to generalize across different datasets. 2. Gains are higher when no fine-tuning is performed on the target dataset 3. Synthetic referring expressions do not achieve better results than human-produced ones but can be used as a complement without any additional annotation cost 4. More information in the referring expressions yields better segmentation accuracy
  • 40. Future work ● Extend the proposed method by adding more cues ○ Use scene-graph generation models to add relationships between objects (image from Xu et al. 2017) ● Apply the proposed method to other existing object detection/segmentation datasets ○ Create synthetic expressions for Microsoft COCO images to be used interchangeably with RefCOCO