Xavier Giro-i-Nieto
@DocXavi
xavier.giro@upc.edu
Associate Professor
Universitat Politècnica de Catalunya
Learning Representations
for Sign Language Videos
(our long way from TRECVID 2014)
UPC Intelligent Data Science
and Artificial Intelligence
TRECVID 2021 Workshop
December 7, 2021
Virtual
Outline
Learning Representations for...
● Instance Search
● Videos
● Sign Language Videos
How it started
TRECVid 2014 @ Orlando
Instance Search
Figure: Eva Mohedano, “Visual Search with Deep Learning”. UPC 2018.
Visual Query
“This dog”
Expected outcome:
Instance Search
#BoW Mohedano, Eva, Kevin McGuinness, Noel E. O'Connor, Amaia Salvador, Ferran Marques, and Xavier Giró-i-Nieto. "Bags of local convolutional
features for scalable instance search." ICMR 2016. (best poster award)
Query image Top N retrieved images
Off the Shelf + Bag of CNN Words
(336x256) resolution → conv5_1 from VGG16 [1] (42x32) → 25K centroids → 25K-D vector
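A minimal sketch of the bag-of-words step, assuming the visual vocabulary has already been learned (for example with k-means) and using NumPy: each local conv5_1 descriptor is assigned to its nearest centroid, and the image becomes an L2-normalised histogram of visual words. The function name `bow_encode`, the random 512-D features and the 100-word vocabulary are illustrative stand-ins for real VGG16 activations and the 25K-word vocabulary.

```python
import numpy as np

def bow_encode(local_feats: np.ndarray, centroids: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Encode a (H, W, D) conv map as an L2-normalised bag of visual words.

    centroids: (K, D) visual vocabulary, assumed learned offline (e.g. k-means).
    Returns the (K,) BoW vector and the (H, W) map of assigned word ids.
    """
    h, w, d = local_feats.shape
    x = local_feats.reshape(-1, d)
    # Squared Euclidean distance from every local descriptor to every centroid.
    dists = (x ** 2).sum(axis=1, keepdims=True) - 2.0 * x @ centroids.T + (centroids ** 2).sum(axis=1)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centroids)).astype(np.float64)
    hist /= np.linalg.norm(hist) + 1e-12
    return hist, words.reshape(h, w)

rng = np.random.default_rng(0)
feats = rng.standard_normal((42, 32, 512))  # stand-in for a 42x32 conv5_1 map
vocab = rng.standard_normal((100, 512))     # toy vocabulary (the slides use 25K words)
bow, assignment = bow_encode(feats, vocab)
print(bow.shape, assignment.shape)  # (100,) (42, 32)
```

Keeping the assignment map, not only the histogram, is one way to support a local search: a BoW vector can be recomputed for any window of the map without running the CNN again.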
Off the Shelf + Bag of Visual Words
Query Representation
Global Search (GS)
Local Search (LS)
Off the Shelf + Bag of Visual Words
Salvador, Amaia, Xavier Giró-i-Nieto, Ferran Marqués, and Shin'ichi Satoh. "Faster R-CNN features for instance search." CVPRW 2016.
[Figure: Faster R-CNN architecture. Conv layers → Conv5_3 → Region Proposal Network → RPN proposals → RoI pooling → FC6 → FC7 → FC8 → class probabilities. Image representation taken from Conv5_3; region representation (for reranking only) taken from RoI pooling.]
Fine-Tuning (FT) Faster R-CNN Features
Train object detector for query instances using query images as training data
Fine-Tuning (FT) Faster R-CNN Features
[Figure: image representation: Conv5_3 activations sum-pooled into a D-dim vector; region representation (for reranking): RoI-pooled activations max-pooled into a D-dim vector.]
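The two pooling choices can be sketched as follows, assuming (as the figure labels suggest) sum-pooling for the image-level vector and max-pooling for each region. For simplicity the region is cropped directly on the feature map, which stands in for the RoI pooling layer; the function names and the 14x14x512 activation map are hypothetical.

```python
import numpy as np

def image_descriptor(conv_map: np.ndarray) -> np.ndarray:
    """Sum-pool a (H, W, D) activation map into one L2-normalised D-dim vector."""
    v = conv_map.sum(axis=(0, 1))
    return v / (np.linalg.norm(v) + 1e-12)

def region_descriptor(conv_map: np.ndarray, box: tuple[int, int, int, int]) -> np.ndarray:
    """Max-pool the activations inside box = (y0, x0, y1, x1), in feature-map cells.

    Cropping the map directly is a simplification standing in for RoI pooling.
    """
    y0, x0, y1, x1 = box
    v = conv_map[y0:y1, x0:x1].max(axis=(0, 1))
    return v / (np.linalg.norm(v) + 1e-12)

fmap = np.random.default_rng(0).random((14, 14, 512))  # stand-in for Conv5_3 activations
img_vec = image_descriptor(fmap)                        # used for the global ranking
box_vec = region_descriptor(fmap, (2, 3, 8, 10))        # used only for reranking
```

Sum-pooling keeps evidence from the whole image, while max-pooling inside a box keeps the strongest response of each channel within that object proposal.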
Fine-Tuning (FT) Faster R-CNN Features
Spatial Reranking (R) over Object Proposals
Query Image Target image in top M ranking
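Spatial reranking can be sketched as re-scoring each of the top M images by its best-matching object proposal instead of its global descriptor. This is an illustration under stated assumptions: region descriptors are precomputed and L2-normalised, images beyond position M keep their order, and the function and argument names are hypothetical.

```python
import numpy as np

def rerank_by_regions(query: np.ndarray, ranking: list[int],
                      region_feats: dict[int, np.ndarray], top_m: int) -> list[int]:
    """Re-order the first top_m images by their best region-to-query cosine similarity.

    query:        (D,) L2-normalised query descriptor.
    ranking:      image ids ordered by the initial global search.
    region_feats: image id -> (R, D) L2-normalised descriptors of its object proposals.
    Images after position top_m keep their original order.
    """
    head, tail = ranking[:top_m], ranking[top_m:]
    best = {i: float((region_feats[i] @ query).max()) for i in head}
    return sorted(head, key=lambda i: best[i], reverse=True) + tail

query = np.array([1.0, 0.0])
regions = {
    0: np.array([[0.0, 1.0]]),              # no proposal resembles the query
    1: np.array([[0.0, 1.0], [1.0, 0.0]]),  # one proposal matches it exactly
    2: np.array([[0.6, 0.8]]),              # partial match
}
print(rerank_by_regions(query, [0, 1, 2, 3], regions, top_m=3))  # [1, 2, 0, 3]
```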
Fine-Tuning (FT) Faster R-CNN Features
Query Expansion (QE) with top N results
Image Representations
Query image
Image
Database
Image Matching Ranking List
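Query: $\mathbf{v} = (v_1, \dots, v_n)$
Database: $\mathbf{v}_1 = (v_{11}, \dots, v_{1n}), \ \dots, \ \mathbf{v}_k = (v_{k1}, \dots, v_{kn})$
Similarity metric (cosine similarity): $\cos(\mathbf{v}, \mathbf{v}_j) = \frac{\mathbf{v} \cdot \mathbf{v}_j}{\lVert \mathbf{v} \rVert \, \lVert \mathbf{v}_j \rVert}, \quad j = 1, \dots, k$
Top N images are added to the query for a new search (N = 5).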
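A sketch of the search-and-expand loop described above, in NumPy. How the top N vectors are combined with the query is an assumption here: they are L2-normalised and summed with it (average query expansion up to a scale factor, which cosine similarity ignores). Function names are hypothetical.

```python
import numpy as np

def l2n(x: np.ndarray) -> np.ndarray:
    """L2-normalise along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def search(query: np.ndarray, db: np.ndarray) -> np.ndarray:
    """Database indices sorted by decreasing cosine similarity to the query."""
    return np.argsort(-(l2n(db) @ l2n(query)), kind="stable")

def search_with_qe(query: np.ndarray, db: np.ndarray, n: int = 5) -> np.ndarray:
    """Search once, add the top-n database vectors to the query, and search again."""
    top_n = search(query, db)[:n]
    expanded = l2n(query) + l2n(db[top_n]).sum(axis=0)
    return search(expanded, db)

rng = np.random.default_rng(1)
db = rng.standard_normal((20, 8))  # toy database of 20 image vectors
print(search_with_qe(db[3], db, n=5)[:5])
```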
Fine-Tuning (FT) Faster R-CNN Features
R: Local Reranking, QE: Query Expansion, FT: Fine-tuned
~10% gain (+QE +R)
~35% gain (FT + R + QE)
~20% gain (FT)
Jiménez, Albert, Jose M. Alvarez, and Xavier Giró Nieto. "Class-weighted convolutional features for visual instance search." BMVC 2017.
Attention from Class Activation Maps
Attention from Class Activation Maps
Compact
Descriptor
Class-Weighted Convolutional Features
[Figure: CAM layer: global average pooling (GAP) followed by class weights w1, w2, w3, …, wN, for Class 1 (Tennis Ball) through Class N]
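Class activation maps (CAM) weight each channel of the last convolutional maps by the classifier weights of one class (the w1 … wN after global average pooling). A rough sketch of turning a CAM into a spatial attention for the descriptor follows; the ReLU, the max-normalisation and the sum-pooling are assumptions of this sketch rather than a restatement of the paper's exact recipe, and the function names are hypothetical.

```python
import numpy as np

def class_activation_map(feats: np.ndarray, w_c: np.ndarray) -> np.ndarray:
    """CAM for one class: (H, W, K) feature maps weighted by that class's (K,) weights."""
    cam = np.maximum(feats @ w_c, 0.0)   # (H, W); ReLU keeps positive evidence only
    return cam / (cam.max() + 1e-12)     # rescale to [0, 1]

def class_weighted_descriptor(feats: np.ndarray, w_c: np.ndarray) -> np.ndarray:
    """Weight every location by the CAM, sum-pool over space and L2-normalise."""
    cam = class_activation_map(feats, w_c)
    v = (feats * cam[..., None]).sum(axis=(0, 1))
    return v / (np.linalg.norm(v) + 1e-12)

rng = np.random.default_rng(2)
feats = rng.random((7, 7, 16))   # toy last-conv maps (H, W, K)
w_tennis_ball = rng.random(16)   # hypothetical weights of one class, e.g. "Tennis Ball"
desc = class_weighted_descriptor(feats, w_tennis_ball)
```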
Reyes, Cristian, Eva Mohedano, Kevin McGuinness, Noel E. O'Connor, and Xavier Giro-i-Nieto. "Where is my phone? Personal object retrieval from
egocentric images." ACM MM Workshops 2016.
Predicted
Human Visual Saliency
Attention from Human Visual Saliency
#SalBoW Mohedano, Eva, Kevin McGuinness, Xavier Giró-i-Nieto, and Noel E. O'Connor. "Saliency weighted convolutional features for instance
search." CBMI 2018.
Attention from Human Visual Saliency
25K-D BoW vector
Unweighted BoW vs. Weighted BoW
25K-D BoW vector
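A sketch of the difference between the unweighted and the weighted BoW: every feature-map location votes for its visual word with its predicted saliency instead of a count of one. It assumes the saliency map has already been resized to the feature-map grid; with a saliency map of all ones it reduces to the plain BoW. The function name is hypothetical.

```python
import numpy as np

def saliency_weighted_bow(assignment: np.ndarray, saliency: np.ndarray, k: int) -> np.ndarray:
    """BoW in which each location votes with its saliency value instead of a count of 1.

    assignment: (H, W) visual-word id of each feature-map location.
    saliency:   (H, W) predicted saliency in [0, 1], resized to the same grid (assumed).
    k:          vocabulary size.
    """
    hist = np.bincount(assignment.ravel(), weights=saliency.ravel(), minlength=k)
    return hist / (np.linalg.norm(hist) + 1e-12)

assignment = np.array([[0, 1], [1, 2]])
unweighted = saliency_weighted_bow(assignment, np.ones((2, 2)), k=4)  # plain BoW
weighted = saliency_weighted_bow(assignment, np.array([[1.0, 0.0], [0.5, 0.0]]), k=4)
print(np.round(weighted, 3).tolist())  # [0.894, 0.447, 0.0, 0.0]
```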
Attention from Human Visual Saliency
#SalGAN Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O’Connor, Jordi Torres, Elisa Sayrol and Xavier Giro-i-Nieto.
“SalGAN: Visual Saliency Prediction with Generative Adversarial Networks.” CVPRW 2017.
Generator Discriminator
Attention from Human Visual Saliency
Hand-crafted
saliency models
Deep-learning
based saliency
models
Outline
Learning Representations for...
● Instance Search
● Videos
○ Unimodal
○ Multimodal
● Sign Language Videos
Action Recognition
Alberto Montes, Amaia Salvador, Santiago Pascual, and Xavier Giro-i-Nieto. "Temporal Activity Detection in Untrimmed Videos with Recurrent
Neural Networks." NIPS Workshop 2016 (best poster award)
Ground Truth:
Hopscotch
Prediction:
0.848 Running a marathon
0.023 Triple jump
0.022 Javelin throw
Ground Truth:
Playing water polo
Prediction:
0.765 Playing water polo
0.202 Swimming
0.007 Springboard diving
C3D + RNN: Action Recognition
C3D + RNN: Activity Detection
Activity
CNN RNN
+
C3D: Online Detection of Action Start
#ODAS Shou, Zheng, Junting Pan, Jonathan Chan, Kazuyuki Miyazawa, Hassan Mansour, Anthony Vetro, Xavier Giro-i-Nieto, and Shih-Fu Chang.
"Online detection of action start in untrimmed, streaming videos." ECCV 2018.
Efficient C2D + RNN
CNN CNN CNN
...
RNN RNN RNN
...
#SkipRNN Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in
Recurrent Neural Networks”, ICLR 2018.
Efficient C2D + RNN
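[Figure: Skip RNN unrolled over time. At each step the gate S either UPDATEs the state or COPYs the previous one: x1 → s1 (update), x2 → s1 (copy), x3 → s3 (update).]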
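The COPY/UPDATE mechanism can be sketched at inference time following the Skip RNN update rule: a binarised gate u_t decides whether the cell runs (s_t = S(s_{t-1}, x_t)) or the state is copied (s_t = s_{t-1}), and while states are copied the update probability keeps accumulating until it rounds to 1. The plain tanh cell standing in for S, the parameter names and forcing an update at the first step are assumptions of this sketch; training additionally needs a straight-through estimator for the rounding, which is omitted here.

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def skip_rnn_forward(xs, W_x, W_s, b, w_p, b_p):
    """Inference-time Skip RNN loop around a plain tanh RNN cell (stand-in for S).

    xs: (T, D) inputs; W_x: (H, D); W_s: (H, H); b: (H,); w_p: (H,); b_p: scalar.
    Returns the (T, H) states and the (T,) binary gates (1 = UPDATE, 0 = COPY).
    """
    s = np.zeros(W_s.shape[0])
    u_tilde, delta = 1.0, 0.0          # assumption: always UPDATE at the first step
    states, gates = [], []
    for x in xs:
        u = float(round(u_tilde))      # binarise the accumulated update probability
        if u:                          # UPDATE: run the cell on the new input
            s = np.tanh(W_x @ x + W_s @ s + b)
            delta = sigmoid(w_p @ s + b_p)
            u_tilde = delta            # update probability for the next step
        else:                          # COPY: skip the cell, keep s, accumulate delta
            u_tilde = u_tilde + min(delta, 1.0 - u_tilde)
        states.append(s.copy())
        gates.append(u)
    return np.array(states), np.array(gates)

rng = np.random.default_rng(0)
xs = rng.standard_normal((6, 3))
W_x, W_s = rng.standard_normal((4, 3)), rng.standard_normal((4, 4))
states, gates = skip_rnn_forward(xs, W_x, W_s, np.zeros(4), np.zeros(4), b_p=-1.0)
print(gates)  # [1. 0. 1. 0. 1. 0.]
```

Because a COPY step never runs the cell, the computational saving grows with the fraction of skipped steps.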
Efficient C2D + RNN
~95% accuracy
Used / Unused
Used / Unused
Efficient C2D + RNN
Efficient Visual Saliency Prediction
#SalEMA Linardos, Panagiotis, Eva Mohedano, Juan Jose Nieto, Noel E. O'Connor, Xavier Giro-i-Nieto, and Kevin McGuinness. "Simple vs complex
temporal recurrences for video saliency prediction." BMVC 2019.
SalCLSTM
SalEMA
Efficient Visual Saliency Prediction
SalCLSTM
SalEMA
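SalEMA replaces the ConvLSTM recurrence of SalCLSTM with an exponential moving average. A minimal NumPy sketch of such a recurrence over per-frame feature maps is below; the value alpha = 0.1, the initialisation with the first frame and the point of the network where the average is applied are assumptions of this sketch.

```python
import numpy as np

def ema_recurrence(frames: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Temporal EMA: h_t = alpha * x_t + (1 - alpha) * h_{t-1}, with h_0 set to the first frame.

    frames: (T, ...) per-frame feature maps. Returns the smoothed maps, same shape.
    """
    out = np.empty(frames.shape, dtype=np.float64)
    h = frames[0].astype(np.float64)
    for t, x in enumerate(frames):
        h = alpha * x + (1.0 - alpha) * h
        out[t] = h
    return out

feature_maps = np.array([[0.0], [1.0], [1.0]])  # three 1-element "frames"
print(ema_recurrence(feature_maps, alpha=0.1).ravel())
```

Unlike a ConvLSTM, the only state is the running average and the only parameter is alpha, which is what makes this recurrence cheap.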
C2D + RNN (space)
[Figure: CNN and RNN blocks, with the RNN unrolled in space]
#RSIS Salvador, Amaia, Miriam Bellver, Victor Campos, Manel Baradad, Ferran Marques, Jordi Torres, Xavier Giro-i-Nieto.
"Recurrent neural networks for semantic instance segmentation." CVPR Workshops 2018.
C2D + RNN (time)
[Figure: RNN unrolled over time]
#RVOS Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques and Xavier
Giro-i-Nieto. “RVOS: End-to-End Recurrent Network for Video Object Segmentation”, CVPR 2019.
time
(frame sequence)
space
(object sequence)
C2D + RNN (space + time)
C2D + RNN (space + time)
Unsupervised VOS
Zero-shot VOS
C2D + RNN (space + time)
One-shot: quality vs. inference time for the semi-supervised (one-shot) task
Speed values measured on a K80 (*) or P100 (†) GPU; otherwise taken from the YouTube-VOS paper.
C2D + RNN: Curriculum Learning
Scheduled sampling
Gonzalez-i-Calabuig, Maria, Carles Ventura, and Xavier Giró-i-Nieto. "Curriculum Learning for Recurrent Video
Object Segmentation." CVPR Women in Computer Vision Workshop 2020.
Teacher forcing Without teacher forcing
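Scheduled sampling moves gradually from teacher forcing (the recurrent decoder receives the ground truth of the previous step, a mask in this sketch) to no teacher forcing (it receives its own prediction). A sketch assuming the inverse-sigmoid decay from the original scheduled-sampling work (Bengio et al., 2015) with k = 10; the schedule used in the curriculum-learning paper above may differ, and the helper names are hypothetical.

```python
import math
import random

def teacher_forcing_prob(epoch: int, k: float = 10.0) -> float:
    """Inverse-sigmoid decay: close to 1 in early epochs, tending to 0 later (k >= 1)."""
    return k / (k + math.exp(epoch / k))

def pick_previous_mask(gt_mask, predicted_mask, epoch: int, rng: random.Random):
    """Feed the ground-truth mask with probability eps, otherwise the model's own prediction."""
    eps = teacher_forcing_prob(epoch)
    return gt_mask if rng.random() < eps else predicted_mask

for epoch in (0, 20, 50):
    print(epoch, round(teacher_forcing_prob(epoch), 3))
```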
C2D + RNN: Curriculum Learning
Frame Skipping
All frames
C2D + RNN: Curriculum Learning
C2D + RNN: Trajectory Estimation
#TrajE Girbau, Andreu, Xavier Giró-i-Nieto, Ignasi Rius, and Ferran Marqués. "Multiple Object Tracking with Mixture Density
Networks for Trajectory Estimation." CVPR Workshops 2021. (best paper runner-up)
C2D + RNN: Trajectory Estimation
RNN +
Displacement
(x, y)
C2D + RNN: Trajectory Estimation
Sample and keep alive multiple trajectory hypotheses
Displacement
(x, y)
Beam width = 2
RNN
RNN
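Keeping several trajectory hypotheses alive is essentially a beam search. A sketch under stated assumptions: `propose` is a hypothetical callback returning candidate displacements (dx, dy) with their log-probabilities (in TrajE these would come from the mixture density network on top of the RNN), and hypotheses are ranked by cumulative log-probability with a beam width of 2, as on the slide.

```python
import math
from typing import Callable

Point = tuple[float, float]
Proposal = tuple[float, float, float]  # (dx, dy, log_prob)

def beam_search_trajectory(start: Point,
                           propose: Callable[[list[Point]], list[Proposal]],
                           steps: int, beam_width: int = 2) -> list[tuple[list[Point], float]]:
    """Keep the beam_width most likely trajectories, extending each by proposed displacements."""
    beams: list[tuple[list[Point], float]] = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for traj, score in beams:
            x, y = traj[-1]
            for dx, dy, logp in propose(traj):
                candidates.append((traj + [(x + dx, y + dy)], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

def toy_propose(traj: list[Point]) -> list[Proposal]:
    """Hypothetical stand-in for samples drawn from a mixture density network."""
    return [(1.0, 0.0, math.log(0.6)), (0.0, 1.0, math.log(0.4))]

for traj, score in beam_search_trajectory((0.0, 0.0), toy_propose, steps=2):
    print(traj, round(math.exp(score), 2))
```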
C2D + RNN: Trajectory Estimation
MOTChallenge 17 - Testing set
Outline
Learning Representations for...
● Instance Search
● Videos
○ Unimodal
○ Multimodal
● Sign Language Videos
Encoder
Encoder
Representation
Encoder
Object Segmentation with Language
#RefVOS Bellver, Miriam, Carles Ventura, Carina Silberer, Ioannis Kazakos, Jordi Torres, and Xavier Giro-i-Nieto. "RefVOS: A
closer look at referring expressions for video object segmentation." arXiv preprint arXiv:2010.00263 (2020).
Object Segmentation with Language
#SynthRef Kazakos, Ioannis, Carles Ventura, Miriam Bellver, Carina Silberer, and Xavier Giro-i-Nieto. "SynthRef:
Generation of Synthetic Referring Expressions for Object Segmentation." NAACL ViGIL Workshop 2021.
Accuracy on DAVIS 2017 train+val
Cross-modal Retrieval
Encoder Encoder
Representation
Cross-modal Video Retrieval
Amanda Duarte, Dídac Surís, Amaia Salvador, Jordi Torres, and Xavier Giró-i-Nieto. "Cross-modal
Embeddings for Video and Audio Retrieval." ECCV Women in Computer Vision Workshop 2018.
Cross-modal Video Retrieval
Best
match
Visual feature Audio feature
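Once both encoders map into the shared representation, finding the best match reduces to a nearest-neighbour search. The sketch below assumes already-trained projections (it only takes their outputs), uses cosine similarity, and scores with recall@1, treating the audio at the same row index as the correct match; none of these choices is taken from the paper, and the function names are hypothetical.

```python
import numpy as np

def l2n(x: np.ndarray) -> np.ndarray:
    """L2-normalise along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def best_audio_match(visual_emb: np.ndarray, audio_emb: np.ndarray) -> np.ndarray:
    """For each visual embedding (Nv, E), index of the closest audio embedding (Na, E)."""
    sims = l2n(visual_emb) @ l2n(audio_emb).T  # cosine similarity matrix (Nv, Na)
    return sims.argmax(axis=1)

def recall_at_1(visual_emb: np.ndarray, audio_emb: np.ndarray) -> float:
    """Fraction of videos whose own audio (same row index) is ranked first."""
    pred = best_audio_match(visual_emb, audio_emb)
    return float((pred == np.arange(len(visual_emb))).mean())

rng = np.random.default_rng(0)
visual = rng.standard_normal((5, 8))                 # 5 videos in the joint space
audio = visual + 0.01 * rng.standard_normal((5, 8))  # their (noisy) audio embeddings
print(best_audio_match(visual, audio), recall_at_1(visual, audio))
```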
Multimodal Steganography
Encoder Encoder
Representation
Multimodal Steganography
Geleta, Margarita, Cristina Punti, Kevin McGuinness, Jordi Pons, Cristian Canton, and Xavier Giro-i-Nieto. "PixInWav: Residual
Steganography for Hiding Pixels in Audio." CVPR Women in Computer Vision Workshop 2021.
Multimodal Steganography
Multimodal Steganography
Results produced by Teresa Domènech.
Revealed image
Hidden image
Cross-modal Video Retrieval
Encoder
Encoder
Representation
Encoder
Cross-modal Video Retrieval
Oriol, B., Luque, J., Diego, F., & Giro-i-Nieto, X. (2020). Transcription-Enriched Joint Embeddings for Spoken
Descriptions of Images and Videos. CVPR 2020 EPIC Workshop.
“Dog”
Image
model
Speech
model
Cross-modal Video Retrieval
“Dog”
“Cat”
Image
model
Speech
model
Text model
Dog Cat
Outline
Learning Representations for...
● Instance Search
● Videos
● Sign Language Videos
A crash course on Sign Language
Sign languages are NOT a one-to-one mapping from spoken languages.
Look-Up
Table
Hi, I’m Amelia and I’m
going to talk to you
about how to remove
gum from hair.
Sign Language
(video)
Spoken Language
(transcription)
Sign-to-Spoken Language Tasks
SL Translation Hi, I’m Amelia and I’m going to talk to you
about how to remove gum from hair.
GIPHY/SIGNN WITH ROBERT
Isolated SL Recognition
Continuous SL Recognition
Finger-spelling
HI, ME FS-AMELIA WILL EXPLAIN
HOW REMOVE GUM FROM YOUR
HAIR
“I”
A, B, C, D...
Sign-Spoken Language Tasks
SL Production
SL Translation
Sign Language
(video)
Spoken Language
(transcription)
Hi, I’m Amelia and
I’m going to talk
to you about how
to remove gum
from hair.
Sign Language Translation & Production
End-to-end
Hi, I’m Amelia and I’m going
to talk to you about how to
remove gum from hair.
HI, ME FS-AMELIA WILL
EXPLAIN HOW REMOVE
GUM FROM YOUR HAIR
Speech
Spoken
transcription
Gloss
transcription
Sign
transcription
Video
3D
Poses
2D
Poses
Production
Translation
Segments
Challenges
Computer Vision
Speech
NLP
Training Data
Challenges in Computer Vision
Off-the-shelf pose detectors and generators struggle with hands.
Zhou, Yuxiao, Marc Habermann, Weipeng Xu, Ikhsanul Habibie, Christian Theobalt, and Feng Xu. "Monocular real-time
hand shape and motion capture using multi-modal data." CVPR 2020.
Challenges in Computer Vision
Weinzaepfel, Philippe, Romain Brégier, Hadrien Combaluzier, Vincent Leroy, and Grégory Rogez. "Dope: Distillation of
part experts for whole-body 3d pose estimation in the wild." ECCV 2020.
Challenges in Computer Vision
Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Progressive transformers for end-to-end sign language
production." ECCV 2020.
Challenges in Computer Vision
Ng, Evonne, Shiry Ginosar, Trevor Darrell, and Hanbyul Joo. "Body2hands: Learning to infer 3d hands from
conversational gesture body dynamics." CVPR 2021.
Challenges in Computer Vision
Challenges
Computer Vision
Speech
NLP
Training Data
Challenges in NLP
Sign languages are:
🤔
(Very) low-resource languages…
...in a (very) high-dimensional space (video).
Challenges in NLP
Figure: TensorFlow tutorial
Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A neural probabilistic language model." Journal of machine learning
research 3, no. Feb (2003): 1137-1155.
🤔
What are “language models” in sign language?
Challenges in NLP
How to transfer from large pre-trained (“foundation”) models?
#GPT-3 Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Agarwal, S. Language models
are few-shot learners. NeurIPS 2020 (best paper award).
Source: [OpenAI API]
English: My name is Barbara.
ASL: ME NAME fs-B-A-R-B-A-R-A.
English: Is he a teacher?
ASL: HE TEACHER HE
English: Amir is tall.
ASL: fs-A-M-I-R, HE TALL HE
English: I’m not sad.
ASL: ME SAD ME 🤔
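The prompt above follows a few-shot pattern: English/ASL-gloss pairs, then an open "ASL:" line for the model to complete. A sketch of assembling such a prompt in Python; it does not call any API (that would need the provider's own client), and the helper names are hypothetical. The 🤔 on the slide flags the kind of output to double-check: the returned gloss ME SAD ME appears to have lost the negation.

```python
EXAMPLES = [
    ("My name is Barbara.", "ME NAME fs-B-A-R-B-A-R-A."),
    ("Is he a teacher?", "HE TEACHER HE"),
    ("Amir is tall.", "fs-A-M-I-R, HE TALL HE"),
]

def fingerspell(word: str) -> str:
    """Gloss convention used on the slide: fs- followed by hyphen-separated letters."""
    return "fs-" + "-".join(word.upper())

def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Concatenate English/ASL-gloss pairs and leave the last ASL line open for the model."""
    lines = []
    for english, gloss in examples:
        lines += [f"English: {english}", f"ASL: {gloss}"]
    lines += [f"English: {query}", "ASL:"]
    return "\n".join(lines)

print(few_shot_prompt(EXAMPLES, "I'm not sad."))
```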
Challenges
Computer Vision
Speech
NLP
Training Data
Challenges in Speech Translation
Jia, Ye, Michelle Tadmor Ramanovich, Tal Remez, and Roi Pomerantz. "Translatotron 2: Robust direct speech-to-speech
translation." arXiv preprint arXiv:2107.08661 (2021).
Speech Video
Speech Speech
End-to-end End-to-end
🤔
Challenges
Computer Vision
Speech
NLP
Training Data
Challenge in Sign Language Analytics
Computer Vision
Speech
NLP
Training Data
Giro-i-Nieto, X. “Open Challenges in Sign Language Translation & Production”. CMU VASC Seminar 2021.
Parallel Corpus
Fully supervised learning requires a large parallel corpus: pairs of aligned sentences in the source and target languages.
Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning
phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014.
Sign Language Translation & Production
Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X.
How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
Body-face-hands keypoints
2D keypoint estimation from OpenPose [2]
Speech Signal
English Transcription
Hi, I’m Amelia and I’m going
to talk to you about how to
remove gum from hair.
Instructional videos
Multi-view VGA and HD videos [3]
Multi-view recordings (only for a subset)
3D keypoint
estimation
Gloss Annotation
HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR
Continuous Sign Language Datasets
Continuous Sign Language Datasets
Green Studio
Multi-view RGB videos
RGB-D videos
Joo, H., Liu, H., Tan, L., Gui, L., Nabbe, B., Matthews, I., Kanade, T., Nobuhara, S.,
Sheikh, Y.: Panoptic studio: A massively multiview system for social motion capture. In:
ICCV, 2015.
Panoptic Studio
Multi-view recordings (only for a subset)
Multi-view VGA and HD videos
Application: Human motion transfer
2D Pose
estimation
[Openpose]
GAN-
generated
[Everybody
dance now]
Application: Human motion transfer
Ventura, Lucas, Amanda Duarte, and Xavier Giró-i-Nieto. "Can everybody sign now? Exploring sign
language video generation from 2D poses." ECCV 2020 SLRTP Workshop.
Application: Human motion transfer
“Choose one category”
Skeleton
GAN-generated
Classification
accuracy
Application: Human motion transfer
Mean Opinion
Score
“How well could you understand the video?”
Skeleton
GAN-generated
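A Mean Opinion Score is the average of per-viewer ratings for a question such as "How well could you understand the video?". The sketch below also returns the half-width of a normal-approximation 95% confidence interval; the 1 to 5 rating scale and the ratings in the example are hypothetical, not results from the paper.

```python
import math
from statistics import mean, stdev

def mean_opinion_score(ratings: list[int]) -> tuple[float, float]:
    """MOS and half-width of a normal-approximation 95% confidence interval.

    ratings: per-viewer scores, assumed here on a 1 (not at all) to 5 (perfectly) scale.
    """
    m = float(mean(ratings))
    half = 1.96 * stdev(ratings) / math.sqrt(len(ratings)) if len(ratings) > 1 else float("nan")
    return m, half

hypothetical_ratings = [3, 4, 2, 3, 4]  # illustrative only
mos, ci = mean_opinion_score(hypothetical_ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")
```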
Application: Human motion transfer
“Translate the ASL signs into written
English.”
Skeleton
GAN-generated
Challenge in Sign Language Analytics
Computer Vision
Speech
NLP
Training Data
Outline
Learning Representations for...
● Instance Search
● Videos
● Sign Language Videos
Thank you
● @DocXavi
● xavier.giro@upc.edu
Eva
Mohedano
Victor
Campos
Miriam
Bellver
Amaia
Salvador
Andreu
Girbau
Amanda
Duarte
Carles
Ventura
Laia
Tarrés

Deep Learning for Computer Vision: Video Analytics (UPC 2016)
 
Once Perceptron to Rule Them all: Deep Learning for Multimedia
Once Perceptron to Rule Them all: Deep Learning for MultimediaOnce Perceptron to Rule Them all: Deep Learning for Multimedia
Once Perceptron to Rule Them all: Deep Learning for Multimedia
 
Visual geometry with deep learning
Visual geometry with deep learningVisual geometry with deep learning
Visual geometry with deep learning
 
Video Object Linguistic Grounding
Video Object Linguistic GroundingVideo Object Linguistic Grounding
Video Object Linguistic Grounding
 
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
 
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)
Learning with Videos  (D4L4 2017 UPC Deep Learning for Computer Vision)Learning with Videos  (D4L4 2017 UPC Deep Learning for Computer Vision)
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)
 
Learning with Unpaired Data
Learning with Unpaired DataLearning with Unpaired Data
Learning with Unpaired Data
 
Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)
Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)
Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)
 
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
 
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)
 
論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions
論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions
論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions
 
[212]big models without big data using domain specific deep networks in data-...
[212]big models without big data using domain specific deep networks in data-...[212]big models without big data using domain specific deep networks in data-...
[212]big models without big data using domain specific deep networks in data-...
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
 
Computer Vision
Computer VisionComputer Vision
Computer Vision
 

More from Universitat Politècnica de Catalunya

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Universitat Politècnica de Catalunya
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Universitat Politècnica de Catalunya
 
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
 
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
 

Recently uploaded

SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...kumargunjan9515
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...gragchanchal546
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themeitharjee
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 

Recently uploaded (20)

SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 

Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVID 2021

  • 1. Learning Representations for Sign Language Videos (our long way from TRECVID 2014). Xavier Giro-i-Nieto (@DocXavi, xavier.giro@upc.edu), Associate Professor, Universitat Politècnica de Catalunya, UPC Intelligent Data Science and Artificial Intelligence. TRECVID 2021 Workshop, December 7, 2021, virtual.
  • 2. Outline Learning Representations for... ● Instance Search ● Videos ● Sign Language Videos
  • 3. How it started TRECVid 2014 @ Orlando
  • 4. Instance Search. Visual query: “This dog”. Expected outcome: (figure). Figure: Eva Mohedano, “Visual Search with Deep Learning”. UPC 2018.
  • 5. Instance Search. Query image and top N retrieved images. #BoW Mohedano, Eva, Kevin McGuinness, Noel E. O'Connor, Amaia Salvador, Ferran Marques, and Xavier Giró-i-Nieto. "Bags of local convolutional features for scalable instance search." ICMR 2016 (best poster award).
  • 6. Off the Shelf + Bag of CNN Words. Input resolution 336x256 → conv5_1 from VGG16 [1] (42x32 feature map) → 25K centroids → 25K-D vector. #BoVW Mohedano, Eva, Kevin McGuinness, Noel E. O'Connor, Amaia Salvador, Ferran Marques, and Xavier Giró-i-Nieto. "Bags of local convolutional features for scalable instance search." ICMR 2016.
  • 7. Off the Shelf + Bag of Visual Words. Query representation; Global Search (GS) and Local Search (LS).
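Slide 6 describes the encoding only through its sizes. Below is a minimal NumPy sketch of that step. It assumes a k-means vocabulary is already available and that each local feature is hard-assigned to its nearest centroid. The L2 normalisation of the histogram is also an assumption, and any additional weighting used in the paper is not shown on the slide.

```python
import numpy as np


def bag_of_cnn_words(feature_map: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Encode an (H, W, D) convolutional feature map as an L2-normalised BoW histogram.

    feature_map: local CNN features, one D-dim vector per spatial cell
                 (e.g. a 42x32 conv5_1 map in the slide's setting).
    centroids:   (K, D) visual vocabulary, assumed learned offline with k-means
                 (K = 25,000 on the slide; tiny values are enough for testing).
    """
    h, w, d = feature_map.shape
    local = feature_map.reshape(h * w, d)
    # Squared Euclidean distance between every local feature and every centroid.
    dists = (
        (local ** 2).sum(axis=1, keepdims=True)
        - 2.0 * local @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    # Hard assignment: each spatial cell votes for its nearest visual word.
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=centroids.shape[0]).astype(np.float64)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


# Toy example: four 3-D local features and a 3-word vocabulary.
fmap = np.array([[[1.0, 0, 0], [1.0, 0, 0]],
                 [[0, 1.0, 0], [0, 0, 1.0]]])
print(bag_of_cnn_words(fmap, np.eye(3)))  # counts (2, 1, 1), L2-normalised
```

Because the histogram has one bin per centroid, a 25K-word vocabulary yields the 25K-D vector on the slide, whatever the spatial size of the feature map.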
  • 8. Off the Shelf + Bag of Visual Words #BoVW Mohedano, Eva, Kevin McGuinness, Noel E. O'Connor, Amaia Salvador, Ferran Marques, and Xavier Giró-i-Nieto. "Bags of local convolutional features for scalable instance search." ICMR 2016.
  • 9. Fine-Tuning (FT) Faster R-CNN Features. Train an object detector for the query instances, using the query images as training data. Diagram: Conv layers up to Conv5_3, Region Proposal Network (RPN proposals), RoI Pooling, FC6, FC7, FC8 and class probabilities, with an image representation and a region representation (for reranking only). Salvador, Amaia, Xavier Giró-i-Nieto, Ferran Marqués, and Shin'ichi Satoh. "Faster R-CNN features for instance search." CVPRW 2016.
  • 10. Fine-Tuning (FT) Faster R-CNN Features. Image representation and region representation (for reranking), each a D-dimensional vector pooled (sum-pooling / max-pooling) from the Conv5_3 and RoI Pooling features. Salvador, Amaia, Xavier Giró-i-Nieto, Ferran Marqués, and Shin'ichi Satoh. "Faster R-CNN features for instance search." CVPRW 2016.
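Slide 10 pairs sum-pooling and max-pooling with the image and region representations. The sketch below follows our reading of that pairing, which the slide text does not confirm: sum-pooling over the whole Conv5_3 map for the image, and max-pooling over a region's cells for each region. It also simplifies RoI pooling to a plain crop of the feature map, and the box format is a hypothetical choice.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def image_descriptor(conv_map: np.ndarray) -> np.ndarray:
    """Sum-pool an (H, W, D) map over all spatial cells into a D-dim vector."""
    return l2_normalize(conv_map.sum(axis=(0, 1)))


def region_descriptor(conv_map: np.ndarray, box: tuple) -> np.ndarray:
    """Max-pool the cells inside box = (y0, x0, y1, x1), end-exclusive, into a D-dim vector.

    A plain crop stands in for RoI pooling (which first resamples the region to a
    fixed grid; taking the max over that grid gives approximately the same vector).
    """
    y0, x0, y1, x1 = box
    return l2_normalize(conv_map[y0:y1, x0:x1].max(axis=(0, 1)))


conv5_3 = np.arange(8, dtype=float).reshape(2, 2, 2)  # toy 2x2 map with D = 2
print(image_descriptor(conv5_3))                # sum (12, 16) -> (0.6, 0.8)
print(region_descriptor(conv5_3, (0, 0, 1, 2)))  # max over the top row
```

Both descriptors have the same dimension D, so they can be compared with the same similarity measure.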
  • 11. Fine-Tuning (FT) Faster R-CNN Features. Spatial Reranking (R) over object proposals: the query image is compared with the object proposals of each target image in the top M of the ranking. Salvador, Amaia, Xavier Giró-i-Nieto, Ferran Marqués, and Shin'ichi Satoh. "Faster R-CNN features for instance search." CVPRW 2016.
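The sketch below reranks the top M images of a ranking by scoring each target image with its best-matching object proposal. Using the maximum cosine similarity over proposals as the image score is an assumption made for illustration. The `proposals` dictionary is a hypothetical container for precomputed region descriptors.

```python
import numpy as np


def cosine_to_rows(query: np.ndarray, rows: np.ndarray) -> np.ndarray:
    """Cosine similarity between one D-dim query and each row of a (P, D) array."""
    denom = np.linalg.norm(rows, axis=1) * np.linalg.norm(query) + 1e-12
    return rows @ query / denom


def spatial_rerank(query: np.ndarray, ranking: list, proposals: dict, m: int) -> list:
    """Re-sort the top-m images by their best-matching object proposal.

    ranking:   image ids ordered by the global (image-level) search.
    proposals: hypothetical mapping image id -> (P, D) proposal descriptors.
    Images below position m keep their original order.
    """
    def best_match(image_id):
        return cosine_to_rows(query, proposals[image_id]).max()

    head = sorted(ranking[:m], key=best_match, reverse=True)
    return head + ranking[m:]


query = np.array([1.0, 0.0])
props = {
    "a": np.array([[0.0, 1.0]]),
    "b": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "c": np.array([[1.0, 1.0]]),
}
print(spatial_rerank(query, ["a", "b", "c"], props, m=2))  # ['b', 'a', 'c']
```

The index of the best-matching proposal could also be returned to localise the instance inside the target image. The sketch keeps only the score.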
  • 12. Fine-Tuning (FT) Faster R-CNN Features. Query Expansion (QE) with the top N results. The query image and the images in the database are matched through their image representations, the query $\mathbf{v} = (v_1, \dots, v_n)$ and the database images $\mathbf{v}_1 = (v_{11}, \dots, v_{1n}), \dots, \mathbf{v}_k = (v_{k1}, \dots, v_{kn})$, using cosine similarity as the similarity metric to produce a ranking list. The top N images are added to the query for a new search (N = 5).
  • 13. Fine-Tuning (FT) Faster R-CNN Features. Salvador, Amaia, Xavier Giró-i-Nieto, Ferran Marqués, and Shin'ichi Satoh. "Faster R-CNN features for instance search." CVPRW 2016. Legend: R = local reranking, QE = query expansion, FT = fine-tuned. Gains: ~10% (+QE+R), ~20% (FT), ~35% (FT + R + QE).
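Query expansion as described on slide 12 can be sketched in a few lines. The slide states that the top N images are added to the query. Combining them by averaging the L2-normalised vectors is our assumption (a common "average query expansion" choice), not something the slide specifies.

```python
import numpy as np


def l2_normalize_rows(x: np.ndarray) -> np.ndarray:
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)


def rank(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Indices of database rows sorted by decreasing cosine similarity to the query."""
    sims = l2_normalize_rows(database) @ l2_normalize_rows(query)
    return np.argsort(-sims)


def rank_with_query_expansion(query: np.ndarray, database: np.ndarray, n: int = 5) -> np.ndarray:
    """Search once, add the top-n results to the query, and search again."""
    top = rank(query, database)[:n]
    # Averaging the normalised query and top-n vectors is an assumption.
    expanded = l2_normalize_rows(np.vstack([query, database[top]])).mean(axis=0)
    return rank(expanded, database)


db = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(rank(np.array([1.0, 0.0]), db))                            # [0 1 3 2]
print(rank_with_query_expansion(np.array([1.0, 0.0]), db, n=5))
```

The second search reuses the same database vectors. Only the query changes, so QE roughly doubles the search cost.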
  • 14. Attention from Class Activation Maps. Jiménez, Albert, Jose M. Alvarez, and Xavier Giró-i-Nieto. "Class-weighted convolutional features for visual instance search." BMVC 2017.
  • 15. Attention from Class Activation Maps. Diagram: a CAM layer (GAP followed by the class weights w1, w2, w3, …, wN, for Class 1 (Tennis Ball) to Class N) produces class-weighted convolutional features, which yield a compact descriptor. Jiménez, Albert, Jose M. Alvarez, and Xavier Giró-i-Nieto. "Class-weighted convolutional features for visual instance search." BMVC 2017.
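A class activation map is a weighted sum of the last convolutional feature maps, using the weights that connect the GAP outputs to one class. The sketch below uses that map to weight every spatial cell before sum-pooling, which gives one class-weighted descriptor. Clipping negative CAM values, scaling the CAM to [0, 1] and handling a single class are assumptions made for brevity.

```python
import numpy as np


def class_activation_map(features: np.ndarray, class_weights: np.ndarray) -> np.ndarray:
    """CAM for one class: weighted sum of the K feature maps of an (H, W, K) tensor.

    class_weights: the (K,) weights connecting the GAP outputs to that class.
    """
    cam = np.tensordot(features, class_weights, axes=([2], [0]))  # shape (H, W)
    cam = np.maximum(cam, 0.0)  # keep positive evidence only (assumption)
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # scale to [0, 1] (assumption)


def class_weighted_descriptor(features: np.ndarray, class_weights: np.ndarray) -> np.ndarray:
    """Weight every spatial cell by the CAM, sum-pool, and L2-normalise."""
    cam = class_activation_map(features, class_weights)
    desc = (features * cam[:, :, None]).sum(axis=(0, 1))
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc


feats = np.array([[[1.0, 0.0], [0.0, 1.0]]])  # toy 1x2 map with K = 2 channels
print(class_weighted_descriptor(feats, np.array([1.0, 0.0])))  # attends to the first cell
```

Changing `class_weights` moves the attention to another class's evidence without recomputing the convolutional features.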
  • 16. Attention from Human Visual Saliency. Predicted human visual saliency. Reyes, Cristian, Eva Mohedano, Kevin McGuinness, Noel E. O'Connor, and Xavier Giro-i-Nieto. "Where is my phone? Personal object retrieval from egocentric images." ACM MM Workshops 2016.
  • 17. Attention from Human Visual Saliency #SalBoW Mohedano, Eva, Kevin McGuinness, Xavier Giró-i-Nieto, and Noel E. O'Connor. "Saliency weighted convolutional features for instance search." CBMI 2018.
  • 18. Attention from Human Visual Saliency. Unweighted BoW vs. weighted BoW (both 25K-D BoW vectors). #SalBoW Mohedano, Eva, Kevin McGuinness, Xavier Giró-i-Nieto, and Noel E. O'Connor. "Saliency weighted convolutional features for instance search." CBMI 2018.
  • 19. Attention from Human Visual Saliency. Diagram: generator and discriminator. #SalGAN Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O’Connor, Jordi Torres, Elisa Sayrol and Xavier Giro-i-Nieto. “SalGAN: Visual Saliency Prediction with Generative Adversarial Networks.” CVPRW 2017.
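The unweighted/weighted contrast on slide 18 can be sketched as a histogram in which each cell's vote counts in proportion to its saliency. The sketch assumes the visual-word assignments are already computed. It also assumes the saliency map has been resized to the feature-map resolution beforehand.

```python
import numpy as np


def saliency_weighted_bow(word_map: np.ndarray, saliency: np.ndarray, vocab_size: int) -> np.ndarray:
    """Each cell votes for its visual word with a weight equal to its saliency.

    word_map: (H, W) integer visual-word ids, one per feature-map cell.
    saliency: (H, W) saliency values in [0, 1], assumed already resized to the
              feature-map resolution.
    """
    hist = np.bincount(word_map.ravel(), weights=saliency.ravel(), minlength=vocab_size)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


words = np.array([[0, 1], [1, 2]])
sal = np.array([[1.0, 0.0], [0.5, 0.0]])
print(saliency_weighted_bow(words, sal, 3))              # weighted BoW
print(saliency_weighted_bow(words, np.ones((2, 2)), 3))  # unweighted BoW
```

With a saliency map of all ones, the function reduces to the plain (unweighted) BoW histogram. Both vectors keep the vocabulary size as their dimension.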
  • 20. Attention from Human Visual Saliency #SalBoW Mohedano, Eva, Kevin McGuinness, Xavier Giró-i-Nieto, and Noel E. O'Connor. "Saliency weighted convolutional features for instance search." CBMI 2018. Hand-crafted saliency models Deep-learning based saliency models
  • 21. Outline Learning Representations for... ● Instance Search ● Videos ○ Unimodal ○ Multimodal ● Sign Language Videos
  • 22. Action Recognition. Example 1: ground truth Hopscotch; predictions 0.848 Running a marathon, 0.023 Triple jump, 0.022 Javelin throw. Example 2: ground truth Playing water polo; predictions 0.765 Playing water polo, 0.202 Swimming, 0.007 Springboard diving. Alberto Montes, Amaia Salvador, Santiago Pascual, and Xavier Giro-i-Nieto. "Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks." NIPS Workshop 2016 (best poster award).
  • 23. C3D + RNN: Action Recognition. Alberto Montes, Amaia Salvador, Santiago Pascual, and Xavier Giro-i-Nieto. "Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks." NIPS Workshop 2016 (best poster award).
  • 25. C3D: Online Detection of Action Start #ODAS Shou, Zheng, Junting Pan, Jonathan Chan, Kazuyuki Miyazawa, Hassan Mansour, Anthony Vetro, Xavier Giro-i-Nieto, and Shih-Fu Chang. "Online detection of action start in untrimmed, streaming videos." ECCV 2018.
  • 26. Efficient C2D + RNN. #SkipRNN Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018. [Figure: a CNN extracts features from each frame, which are fed to an RNN.]
  • 27. Efficient C2D + RNN. #SkipRNN Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018. [Figure: over time, the RNN state S either UPDATEs with the new input x_t or COPYs the previous state; e.g., s1 is copied at step 2 and the state is updated to s3 at step 3.]
  • 28. Efficient C2D + RNN. #SkipRNN Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018. [Figure: used vs. unused input samples; ~95% accuracy.]
  • 29. Efficient C2D + RNN. #SkipRNN Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018. [Figure: used vs. unused input samples.]
  • 30. Efficient Visual Saliency Prediction. #SalEMA Linardos, Panagiotis, Eva Mohedano, Juan Jose Nieto, Noel E. O'Connor, Xavier Giro-i-Nieto, and Kevin McGuinness. "Simple vs complex temporal recurrences for video saliency prediction." BMVC 2019. [Figure: SalCLSTM vs. SalEMA.]
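The COPY/UPDATE mechanism of Skip RNN can be sketched with a scalar state. This follows the update-gate equations of the Skip RNN paper (binarised update probability, copy when the gate is 0, and an accumulated probability while skipping), but it is only an inference-time sketch: the tanh cell and its weights `w_s`, `w_x`, `w_p`, `b_p` are hypothetical, and training (which needs a straight-through estimator for the rounding) is not shown.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def skip_rnn(
    inputs: List[float],
    w_s: float = 0.5,
    w_x: float = 1.0,
    w_p: float = 1.0,
    b_p: float = -2.0,
) -> Tuple[List[float], List[int]]:
    """Scalar Skip RNN: returns the state sequence and the binary update gates u_t."""
    state, u_tilde = 0.0, 1.0  # u_tilde starts at 1 so the first step always updates
    states: List[float] = []
    gates: List[int] = []
    for x in inputs:
        u = 1 if u_tilde >= 0.5 else 0  # binarise the update probability
        if u:
            state = math.tanh(w_s * state + w_x * x)  # UPDATE: evaluate the cell
        # else COPY: the state stays s_{t-1} and the cell is not evaluated
        delta = sigmoid(w_p * state + b_p)  # increment of the update probability
        if u:
            u_tilde = delta
        else:
            u_tilde = u_tilde + min(delta, 1.0 - u_tilde)  # accumulate while skipping
        states.append(state)
        gates.append(u)
    return states, gates
```

With a constant input of 1.0 and these toy weights, the gate sequence comes out as `[1, 0, 0, 1, 0, 0]`: the cell runs on only two of six steps, which is where the computational savings ("used" vs. "unused" samples) come from.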
  • 31. #SalCLSTM #SalEMA Linardos, Panagiotis, Eva Mohedano, Juan Jose Nieto, Noel E. O'Connor, Xavier Giro-i-Nieto, and Kevin McGuinness. "Simple vs complex temporal recurrences for video saliency prediction." BMVC 2019.
  • 32. Efficient Visual Saliency Prediction. #SalEMA Linardos, Panagiotis, Eva Mohedano, Juan Jose Nieto, Noel E. O'Connor, Xavier Giro-i-Nieto, and Kevin McGuinness. "Simple vs complex temporal recurrences for video saliency prediction." BMVC 2019. [Figure: SalCLSTM vs. SalEMA.]
  • 33. C2D + RNN (space). #RSIS Salvador, Amaia, Miriam Bellver, Victor Campos, Manel Baradad, Ferran Marques, Jordi Torres, Xavier Giro-i-Nieto. "Recurrent neural networks for semantic instance segmentation." CVPR Workshops 2018. [Figure: a CNN encodes the image and an RNN iterates over space, producing one object instance per step.]
  • 34. C2D + RNN (time). [Figure: an RNN unrolled over time across the video frames.]
  • 35. C2D + RNN (space + time). #RVOS Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques and Xavier Giro-i-Nieto. “RVOS: End-to-End Recurrent Network for Video Object Segmentation”, CVPR 2019. [Figure: recurrence over time (frame sequence) and over space (object sequence).]
  • 36. C2D + RNN (space + time). #RVOS Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques and Xavier Giro-i-Nieto. “RVOS: End-to-End Recurrent Network for Video Object Segmentation”, CVPR 2019.
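The "simple" temporal recurrence compared against SalCLSTM can be sketched as an exponential moving average (EMA) over per-frame features, s_t = alpha * x_t + (1 - alpha) * s_{t-1}. This is a minimal illustration: the value of `alpha` and the use of flat feature vectors are assumptions, and the slides do not show where in the network the average is applied.

```python
from typing import List, Optional, Sequence


def ema_over_time(frames: Sequence[Sequence[float]], alpha: float) -> List[List[float]]:
    """Exponential moving average of per-frame feature vectors.

    s_t = alpha * x_t + (1 - alpha) * s_{t-1}, with s_1 = x_1.
    """
    smoothed: List[List[float]] = []
    prev: Optional[List[float]] = None
    for x in frames:
        if prev is None:
            cur = list(x)
        else:
            cur = [alpha * xi + (1.0 - alpha) * pi for xi, pi in zip(x, prev)]
        smoothed.append(cur)
        prev = cur
    return smoothed
```

The whole recurrence has a single scalar parameter, in contrast with the several gates of a ConvLSTM, which is why it is the cheaper option.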
  • 37. #RVOS Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques and Xavier Giro-i-Nieto. “RVOS: End-to-End Recurrent Network for Video Object Segmentation”, CVPR 2019. Unsupervised VOS / Zero-shot VOS.
  • 38. C2D + RNN (space + time). #RVOS Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques and Xavier Giro-i-Nieto. “RVOS: End-to-End Recurrent Network for Video Object Segmentation”, CVPR 2019. One-shot: quality vs. inference time for the semi-supervised (one-shot) task. Speed values were measured on K80 (*) and P100 (†) GPUs; all other values are taken from the YouTube-VOS paper.
  • 39. C2D + RNN: Curriculum Learning. Scheduled sampling. Gonzalez-i-Calabuig, Maria, Carles Ventura, and Xavier Giró-i-Nieto. "Curriculum Learning for Recurrent Video Object Segmentation." CVPR Women in Computer Vision Workshop 2020. [Figure: teacher forcing vs. without teacher forcing.]
  • 40. C2D + RNN: Curriculum Learning. Frame skipping. Gonzalez-i-Calabuig, Maria, Carles Ventura, and Xavier Giró-i-Nieto. "Curriculum Learning for Recurrent Video Object Segmentation." CVPR Women in Computer Vision Workshop 2020. [Figure: all frames.]
  • 41. C2D + RNN: Curriculum Learning Gonzalez-i-Calabuig, Maria, Carles Ventura, and Xavier Giró-i-Nieto. "Curriculum Learning for Recurrent Video Object Segmentation." CVPR Women in Computer Vision Workshop 2020.
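Scheduled sampling, as a curriculum between teacher forcing and free running, can be sketched as follows. At each recurrent step the model is fed the ground truth with probability `p_teacher`, otherwise its own previous output. The linear decay schedule and the toy `step` function are assumptions for illustration; the paper's actual schedule and model are not shown on the slides.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def teacher_forcing_prob(epoch: int, total_epochs: int, min_prob: float = 0.0) -> float:
    """Linearly decay the probability of feeding ground truth (assumed schedule)."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return max(1.0 - frac, min_prob)


def unroll(
    step: Callable[[T], T],
    first_input: T,
    ground_truth: List[T],
    p_teacher: float,
    rng: random.Random,
) -> List[T]:
    """Run a recurrent step; the next input is the ground truth or the model's own output."""
    prev = first_input
    outputs: List[T] = []
    for gt in ground_truth:
        out = step(prev)
        outputs.append(out)
        prev = gt if rng.random() < p_teacher else out  # sample the input source
    return outputs
```

With `p_teacher = 1.0` this reduces to teacher forcing; with `0.0` the model always consumes its own predictions, as it must at test time.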
  • 42. C2D + RNN: Trajectory Estimation #TrajE Girbau, Andreu, Xavier Giró-i-Nieto, Ignasi Rius, and Ferran Marqués. "Multiple Object Tracking with Mixture Density Networks for Trajectory Estimation." CVPR Workshops 2021. (best paper runner-up)
  • 43. C2D + RNN: Trajectory Estimation #TrajE Girbau, Andreu, Xavier Giró-i-Nieto, Ignasi Rius, and Ferran Marqués. "Multiple Object Tracking with Mixture Density Networks for Trajectory Estimation." CVPR Workshops 2021. (best paper runner-up) [Figure: an RNN predicts the displacement (x, y).]
  • 44. C2D + RNN: Trajectory Estimation #TrajE Girbau, Andreu, Xavier Giró-i-Nieto, Ignasi Rius, and Ferran Marqués. "Multiple Object Tracking with Mixture Density Networks for Trajectory Estimation." CVPR Workshops 2021. (best paper runner-up) Sample and keep alive multiple trajectory hypotheses. [Figure: RNNs predicting displacements (x, y) with beam width = 2.]
  • 45. C2D + RNN: Trajectory Estimation #TrajE Girbau, Andreu, Xavier Giró-i-Nieto, Ignasi Rius, and Ferran Marqués. "Multiple Object Tracking with Mixture Density Networks for Trajectory Estimation." CVPR Workshops 2021. (best paper runner-up) Results on the MOTChallenge 17 test set.
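Keeping several trajectory hypotheses alive with a fixed beam width can be sketched as a generic beam search. In TrajE the candidate displacements come from a mixture density network; here `propose` is a hypothetical stand-in that returns (displacement, log-probability) pairs, so this illustrates only the beam bookkeeping, not the paper's model.

```python
import heapq
from typing import Callable, List, Tuple

Point = Tuple[int, int]
Hypothesis = Tuple[float, List[Point]]  # (cumulative log-probability, trajectory)


def beam_search_trajectories(
    start: Point,
    propose: Callable[[List[Point]], List[Tuple[Point, float]]],
    steps: int,
    beam_width: int = 2,
) -> List[Hypothesis]:
    """Extend every hypothesis with each proposed displacement; keep the best beam_width."""
    beam: List[Hypothesis] = [(0.0, [start])]
    for _ in range(steps):
        candidates: List[Hypothesis] = []
        for score, traj in beam:
            x, y = traj[-1]
            for (dx, dy), logp in propose(traj):
                candidates.append((score + logp, traj + [(x + dx, y + dy)]))
        # nlargest behaves like a stable sort, so ties keep their insertion order
        beam = heapq.nlargest(beam_width, candidates, key=lambda h: h[0])
    return beam
```

A beam width of 1 collapses to greedy prediction; wider beams let a tracker recover when the most likely displacement at one step turns out to be wrong later.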
  • 46. Outline Learning Representations for... ● Instance Search ● Videos ○ Unimodal ○ Multimodal ● Sign Language Videos
  • 48. Object Segmentation with Language. #RefVOS Bellver, Miriam, Carles Ventura, Carina Silberer, Ioannis Kazakos, Jordi Torres, and Xavier Giro-i-Nieto. "RefVOS: A closer look at referring expressions for video object segmentation." arXiv preprint arXiv:2010.00263 (2020).
  • 49. Object Segmentation with Language #SynthRef Kazakos, Ioannis, Carles Ventura, Miriam Bellver, Carina Silberer, and Xavier Giro-i-Nieto. "SynthRef: Generation of Synthetic Referring Expressions for Object Segmentation." NAACL ViGIL Workshop 2021. Accuracy on DAVIS 2017 train+val
  • 51. Cross-modal Video Retrieval Amanda Duarte, Dídac Surís, Amaia Salvador, Jordi Torres, and Xavier Giró-i-Nieto. "Cross-modal Embeddings for Video and Audio Retrieval." ECCV Women in Computer Vision Workshop 2018.
  • 52. Cross-modal Video Retrieval. Amanda Duarte, Dídac Surís, Amaia Salvador, Jordi Torres, and Xavier Giró-i-Nieto. "Cross-modal Embeddings for Video and Audio Retrieval." ECCV Women in Computer Vision Workshop 2018. [Figure: best match between visual and audio features.]
  • 53. Cross-modal Video Retrieval. Amanda Duarte, Dídac Surís, Amaia Salvador, Jordi Torres, and Xavier Giró-i-Nieto. "Cross-modal Embeddings for Video and Audio Retrieval." ECCV Women in Computer Vision Workshop 2018. [Figure: best match between visual and audio features.]
  • 55. Multimodal Steganography. Geleta, Margarita, Cristina Punti, Kevin McGuinness, Jordi Pons, Cristian Canton, and Xavier Giro-i-Nieto. "PixInWav: Residual Steganography for Hiding Pixels in Audio." CVPR Women in Computer Vision Workshop 2021.
  • 56. Multimodal Steganography. Geleta, Margarita, Cristina Punti, Kevin McGuinness, Jordi Pons, Cristian Canton, and Xavier Giro-i-Nieto. "PixInWav: Residual Steganography for Hiding Pixels in Audio." CVPR Women in Computer Vision Workshop 2021.
  • 57. Multimodal Steganography. Results produced by Teresa Domènech. Geleta, Margarita, Cristina Punti, Kevin McGuinness, Jordi Pons, Cristian Canton, and Xavier Giro-i-Nieto. "PixInWav: Residual Steganography for Hiding Pixels in Audio." CVPR Women in Computer Vision Workshop 2021. [Figure: hidden image vs. revealed image.]
  • 59. Cross-modal Video Retrieval. Oriol, B., Luque, J., Diego, F., & Giro-i-Nieto, X. (2020). Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and Videos. CVPR 2020 EPIC Workshop. [Figure: an image model and a speech model embedding the spoken word “Dog”.]
  • 60. Cross-modal Video Retrieval. Oriol, B., Luque, J., Diego, F., & Giro-i-Nieto, X. (2020). Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and Videos. CVPR 2020 EPIC Workshop. [Figure: image, speech, and text models for the words “Dog” and “Cat”.]
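Retrieving the "best match" across modalities can be sketched as nearest-neighbour search in a shared embedding space. This assumes the visual and audio features have already been projected into that joint space by trained encoders (not shown), and uses cosine similarity as the ranking score; both the function names and the choice of similarity are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def best_matches(query: Sequence[float], gallery: List[Sequence[float]], top_k: int = 1) -> List[int]:
    """Rank embeddings from the other modality by similarity to the query."""
    order = sorted(range(len(gallery)), key=lambda i: cosine(query, gallery[i]), reverse=True)
    return order[:top_k]
```

For example, a visual embedding can be used as the query against a gallery of audio embeddings, and the top-ranked index is the retrieved audio track.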
  • 61. Outline Learning Representations for... ● Instance Search ● Videos ● Sign Language Videos
  • 62. A crash course on Sign Language. Sign languages are NOT a one-to-one mapping from spoken languages, so a look-up table cannot translate between them. [Figure: Sign Language (video) vs. Spoken Language (transcription): “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”]
  • 63. Sign-to-Spoken Language Tasks. SL Translation: “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.” Continuous SL Recognition: HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR. Isolated SL Recognition: “I”. Finger-spelling: A, B, C, D... (Source: GIPHY/SIGNN WITH ROBERT.)
  • 64. Sign-to-Spoken Language Tasks. SL Translation: “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”
  • 65. Sign-Spoken Language Tasks. SL Production (spoken to sign) and SL Translation (sign to spoken) between Sign Language (video) and Spoken Language (transcription): “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”
  • 66. Sign Language Translation & Production. [Figure: intermediate representations used for translation and production, which can also be done end-to-end: speech; spoken transcription (“Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”); gloss transcription (HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR); sign transcription; segments; 3D poses; 2D poses; video.]
  • 68. Challenges in Computer Vision. Off-the-shelf pose detectors and generators struggle with hands.
  • 69. Challenges in Computer Vision. Zhou, Yuxiao, Marc Habermann, Weipeng Xu, Ikhsanul Habibie, Christian Theobalt, and Feng Xu. "Monocular real-time hand shape and motion capture using multi-modal data." CVPR 2020.
  • 70. Challenges in Computer Vision. Weinzaepfel, Philippe, Romain Brégier, Hadrien Combaluzier, Vincent Leroy, and Grégory Rogez. "Dope: Distillation of part experts for whole-body 3d pose estimation in the wild." ECCV 2020.
  • 71. Challenges in Computer Vision. Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Progressive transformers for end-to-end sign language production." ECCV 2020.
  • 72. Challenges in Computer Vision. Ng, Evonne, Shiry Ginosar, Trevor Darrell, and Hanbyul Joo. "Body2hands: Learning to infer 3d hands from conversational gesture body dynamics." CVPR 2021.
  • 74. Challenges in NLP. Sign languages are (very) low-resource languages… ...in a (very) high-dimensional space (video). 🤔
  • 75. Challenges in NLP. 🤔 What are “language models” in sign language? Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A neural probabilistic language model." Journal of machine learning research 3, no. Feb (2003): 1137-1155. Figure: TensorFlow tutorial.
  • 76. Challenges in NLP. How to transfer from large pre-trained (“foundation”) models? 🤔 #GPT-3 Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Agarwal, S. Language models are few-shot learners. NeurIPS 2020 (best paper award). Example (source: OpenAI API): English: My name is Barbara. → ASL: ME NAME fs-B-A-R-B-A-R-A. English: Is he a teacher? → ASL: HE TEACHER HE. English: Amir is tall. → ASL: fs-A-M-I-R, HE TALL HE. English: I’m not sad. → ASL: ME SAD ME.
  • 78. Challenges in Speech Translation. Jia, Ye, Michelle Tadmor Ramanovich, Tal Remez, and Roi Pomerantz. "Translatotron 2: Robust direct speech-to-speech translation." arXiv preprint arXiv:2107.08661 (2021). [Figure: end-to-end speech-to-speech translation vs. end-to-end translation from sign language video to speech 🤔.]
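The English/ASL-gloss pairs on this slide follow the usual few-shot pattern: a handful of solved examples, then an unsolved query for the language model to complete. A minimal sketch of building such a prompt string is shown below; the exact prompt used on the slide and the API call itself are not shown, and the function name is hypothetical.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format English/ASL-gloss pairs as a few-shot prompt, leaving the last gloss blank."""
    lines: List[str] = []
    for english, gloss in examples:
        lines.append(f"English: {english}")
        lines.append(f"ASL: {gloss}")
    lines.append(f"English: {query}")
    lines.append("ASL:")  # a language model would be asked to continue from here
    return "\n".join(lines)


# Example pairs taken from the slide; "fs-" marks finger-spelled names.
EXAMPLES = [
    ("My name is Barbara.", "ME NAME fs-B-A-R-B-A-R-A"),
    ("Is he a teacher?", "HE TEACHER HE"),
    ("Amir is tall.", "fs-A-M-I-R, HE TALL HE"),
]
```

Calling `build_few_shot_prompt(EXAMPLES, "I'm not sad.")` yields a prompt ending in `ASL:`; note that this only works on gloss text, not on the sign language video itself, which is exactly the gap the slide questions.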
  • 80. Challenge in Sign Language Analytics. [Figure: Computer Vision, Speech, NLP, and Training Data.] Giro-i-Nieto, X. “Open Challenges in Sign Language Translation & Production”. CMU VASC Seminar 2021.
  • 81. Parallel Corpus. Fully supervised learning requires a large dataset of sentence pairs in the two languages to be translated. Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014.
  • 82. Sign Language Translation & Production. Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021. Modalities: instructional videos; speech signal; English transcription (“Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”); gloss annotation (HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR); body-face-hands 2D keypoints estimated with OpenPose [2]; multi-view VGA and HD videos [3] with 3D keypoint estimation (multi-view recordings only for a subset).
  • 83. Continuous Sign Language Datasets Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
  • 84. Continuous Sign Language Datasets. [Figure: Green Studio (multi-view RGB videos, RGB-D videos) and Panoptic Studio (multi-view VGA and HD videos; multi-view recordings only for a subset).] Joo, H., Liu, H., Tan, L., Gui, L., Nabbe, B., Matthews, I., Kanade, T., Nobuhara, S., Sheikh, Y.: Panoptic studio: A massively multiview system for social motion capture. In: ICCV, 2015.
  • 85. Application: Human motion transfer. [Figure: 2D pose estimation (OpenPose) and GAN-generated video (Everybody Dance Now).]
  • 86. Application: Human motion transfer. Ventura, Lucas, Amanda Duarte, and Xavier Giró-i-Nieto. "Can everybody sign now? Exploring sign language video generation from 2D poses." ECCV 2020 SLRTP Workshop.
  • 87. Application: Human motion transfer. Ventura, Lucas, Amanda Duarte, and Xavier Giró-i-Nieto. "Can everybody sign now? Exploring sign language video generation from 2D poses." ECCV 2020 SLRTP Workshop. User study “Choose one category”: classification accuracy for skeleton vs. GAN-generated videos.
  • 88. Application: Human motion transfer. Ventura, Lucas, Amanda Duarte, and Xavier Giró-i-Nieto. "Can everybody sign now? Exploring sign language video generation from 2D poses." ECCV 2020 SLRTP Workshop. Mean Opinion Score for “How well could you understand the video?”: skeleton vs. GAN-generated videos.
  • 89. Application: Human motion transfer. Ventura, Lucas, Amanda Duarte, and Xavier Giró-i-Nieto. "Can everybody sign now? Exploring sign language video generation from 2D poses." ECCV 2020 SLRTP Workshop. Task “Translate the ASL signs into written English.”: skeleton vs. GAN-generated videos.
  • 90. Challenge in Sign Language Analytics. [Figure: Computer Vision, Speech, NLP, and Training Data.] Giro-i-Nieto, X. “Open Challenges in Sign Language Translation & Production”. CMU VASC Seminar 2021.
  • 91. Outline Learning Representations for... ● Instance Search ● Videos ● Sign Language Videos
  • 92. Thank you ● @DocXavi ● xavier.giro@upc.edu Eva Mohedano Victor Campos Miriam Bellver Amaia Salvador Andreu Girbau Amanda Duarte Carles Ventura Laia Tarrés