SlideShare a Scribd company logo
1 of 67
Download to read offline
[http://pagines.uab.cat/mcv/]
Module 6 - Day 2 - Lecture 1
Neural Architectures for
Video Encoding
27th February 2020
Xavier Giro-i-Nieto
@DocXavi
xavier.giro@upc.edu
Associate Professor
Universitat Politècnica de Catalunya
Barcelona Supercomputing Center
Acknowledgements
2
Víctor Campos Alberto MontesAmaia Salvador Santiago Pascual
Video lectures
3
Víctor Campos
Deep Learning for Computer Vision 2018
UPC TelecomBCN
Xavier Giro-i-Nieto
Deep Learning for Computer Vision 2019
UPC TelecomBCN
4
#DeepVideo Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale
video classification with convolutional neural networks. CVPR 2014
Motivation
5
#DeepVideo Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. Large-scale video classification
with convolutional neural networks. CVPR 2014.
What is a video?
6
● Formally, a video is a 3D signal
○ Spatial coordinates: x, y
○ Temporal coordinate: t
● If we fix t, we obtain an image. We can understand
videos as sequences of images (a.k.a. frames)
Convolutional Neural Networks (CNN) provide state of the
art performance on still image analysis tasks
How do we work with images?
7
How can we extend CNNs to image sequences?
How do we work with videos ?
8
f ŷ
9
Basic deep architectures for video:
1. Single frame models
2. Spatio-temporal convolutions
3. CNN + RNN
4. RGB + Optical Flow
5. Dual slow and fast frame rates
6. Capsule Networks
Deep Video Architectures
10
Basic deep architectures for video:
1. Single frame models
2. Spatio-temporal convolutions
3. CNN + RNN
4. RGB + Optical Flow
5. Dual slow and fast frame rates
6. Capsule Networks
Deep Video Architectures
Single frame models
11
CNN CNN CNN...
Combination method
Combination is commonly
implemented as a small NN on
top of a temporal pooling
operation (e.g. max, average).
Problem: pooling is not aware
of the temporal order!
Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and
George Toderici. "Beyond short snippets: Deep networks for video classification." CVPR 2015
ŷ
12
#DeepVideo Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale
video classification with convolutional neural networks. CVPR 2014
Single frame models
13
Single frame models
#DeepVideo Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale
video classification with convolutional neural networks. CVPR 2014
14
#DeepVideo Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale
video classification with convolutional neural networks. CVPR 2014
Single frame models
15
Basic deep architectures for video:
1. Single frame models
2. Spatio-temporal convolutions
3. CNN + RNN
4. RGB + Optical Flow Two-stream CNN
Deep Video Architectures
3D CNN (C3D)
16
We can add an extra dimension to standard 2D CNN filters:
● A grayscale video clip may be represented by a tensor of size H x W x L.
● A 3D conv filter for the 1st conv layer may be of size k x k x d
#C3D Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features
with 3D convolutional networks" ICCV 2015
time
17
The video needs to be split into chunks (also known as clips) with a number of
frames that fits the receptive field of the C3D (eg. 16 frames).
#C3D Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features
with 3D convolutional networks" ICCV 2015
16-frame clip
16-frame clip
16-frame clip
...
Average
pooling
4096-dimvideodescriptor
4096-dimvideodescriptor
L2 norm
3D CNN (C3D) + temporal pooling
18
3D CNN (C3D)
#3D Resnet Hara, Kensho, Hirokatsu Kataoka, and Yutaka Satoh. "Can spatiotemporal 3d cnns retrace the history of 2d
cnns and imagenet?." CVPR 2018. [code]
Much larger datasets (eg. Kinetics) allow training much deeper C3D CNNs.
19
Limitation:
● How can we use pre-trained 2D networks to initialize C3D training ?
3D CNN
20
Inflated 3D weights (I3D)
We can adapt C2D filters to C3D by replicating them along the temporal axis
(and scaling its values by 1/d).
inflation
#I3D Carreira, J., & Zisserman, A. . Quo vadis, action recognition? A new model and the kinetics dataset. CVPR 2017. [code]
k x k
C2D filter I3D filter
k x k x d
Filter inflation allows reusing C2D pre-trained filters (eg. from ImageNet) to
initialize C3D filters.
21
Residual Learning
Residual learning: reformulate the layers as learning residual functions with
respect to the identity F(x), instead of learning unreferenced functions H(x)
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." CVPR 2016. [slides]
Plain Net Residual Net
22
#Res3D Tran, Du, Jamie Ray, Zheng Shou, Shih-Fu Chang, and Manohar Paluri. "Convnet architecture
search for spatiotemporal feature learning." arXiv preprint arXiv:1708.05038 (2017). [code]
3D CNN + Residual (Res3D)
2D ResNet Res3D
23
#Res3D Tran, Du, Jamie Ray, Zheng Shou, Shih-Fu Chang, and Manohar Paluri. "Convnet architecture
search for spatiotemporal feature learning." arXiv preprint arXiv:1708.05038 (2017). [code]
3D CNN + Residual (Res3D)
24
#Res3D Tran, Du, Jamie Ray, Zheng Shou, Shih-Fu Chang, and Manohar Paluri. "Convnet architecture
search for spatiotemporal feature learning." arXiv preprint arXiv:1708.05038 (2017). [code]
3D CNN + Residual (Res3D)
Gain in performance comes with the cost of much more computation and larger
memory footprint.
25
#MC Tran, Du, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. "A closer
look at spatiotemporal convolutions for action recognition." CVPR 2018. [code]
Mixed Convolutions (MC & rMC)
Mixed Convolutions (MC)
● Early layers: 3D convolutions
● Top layers: 2D convolutions
Reversed Mixed Convolutions (rMC)
● Early layers: 2D convolutions
● Top layers: 3D convolutions
26
#MC Tran, Du, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. "A closer
look at spatiotemporal convolutions for action recognition." CVPR 2018. [code]
MC ResNets:
● 3-4% gain in clip-level accuracy over 2D ResNets of comparable capacity
● match the performance of 3D ResNets, which have 3 times as many parameters.
Mixed Convolutions (MC & rMC)
27
#R(2+1)D Tran, Du, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. "A
closer look at spatiotemporal convolutions for action recognition." CVPR 2018. [code]
C(2+1)D
Splits the computation into a spatial 2D convolution followed by a temporal 1D
convolution:. In the case of an input of a single feature channel:
C3D C(2+1)D
Number of 2D filters
28
#R(2+1)D Tran, Du, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. "A
closer look at spatiotemporal convolutions for action recognition." CVPR 2018. [code]
C(2+1D) + Residual
29
#R(2+1)D Tran, Du, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. "A
closer look at spatiotemporal convolutions for action recognition." CVPR 2018. [code]
2+1D CNN + Residual
30
Separable 2D (group) convolutions
Depth-wise (in channels) separable convolutions.
#XCeption Chollet, François. "Xception: Deep learning with depthwise separable convolutions." CVPR 2017.
Figure: Sik-Ho Tsang
31
Res3D + Separable (group) convolutions
#CSN D. Tran, H. Wang, L. Torresani and M. Feiszli. Video Classification with Channel-Separated
Convolutional Networks. ICCV 2019. [Caffe2]
Channel interactions and spatiotemporal interactions are separated.
1 group 2 groups
# groups =
# input channels
32
Basic deep architectures for video:
1. Single frame models
2. Spatio-temporal convolutions
3. CNN + RNN
4. RGB + Optical Flow
5. Dual slow and fast frame rates
6. Capsule Networks
Deep Video Architectures
33
The hidden layers and the
output depend from previous
states of the hidden layers
Recurrent Neural Network (RNN)
2D CNN + RNN
34
CNN CNN CNN...
Recurrent Neural Networks
are well suited for processing
sequences.
Problem: RNNs are sequential
and cannot be parallelized.
RNN RNN RNN...
Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko,
Trevor Darrel. Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR 2015. code
Videolectures on RNNs:
DLSL 2017, “RNN (I)”
“RNN (II)”
DLAI 2018, “RNN”
35
#Res3D Tran, Du, Jamie Ray, Zheng Shou, Shih-Fu Chang, and Manohar Paluri. "Convnet architecture
search for spatiotemporal feature learning." arXiv preprint arXiv:1708.05038 (2017). [code]
2D CNN + RNN vs Spatio-temporal Convs
36
Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko,
Trevor Darrel. Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR 2015. [code]
2D CNN + RNN
37
Limitations:
● How can we handle longer videos?
● How can we capture longer temporal dependencies?
3D CNN
38
3D CNN + RNN
A. Montes, Salvador, A., Pascual-deLaPuente, S., and Giró-i-Nieto, X., “Temporal Activity Detection in Untrimmed
Videos with Recurrent Neural Networks”, NIPS Workshop 2016 (best poster award)
39
#SkipRNN Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip
RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018.
2D CNN + RNN: skip redundancy
CNN CNN CNN...
RNN RNN RNN... After processing a frame, let
the RNN decide how many
future frames can be skipped
In skipped frames, simply
copy the output and state
from the previous time step
There is no ground truth for
which frames can be skipped.
The RNN learns it by itself
during training!
40
#SkipRNN Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip
RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018.
2D CNN + RNN
Epochs for Skip LSTM (λ = 10-4
)
~30% acc ~50% acc ~70% acc ~95% acc
11 epochs 51 epochs 101 epochs 400 epochs
Used
Unused
41
Used
Unused
2D CNN + RNN: skip redundancy
#SkipRNN Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip
RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018.
42
Basic deep architectures for video:
1. Single frame models
2. CNN + RNN
3. 3D convolutions
4. RGB + Optical Flow
5. Dual slow and fast frame rates
6. Capsule Networks
Deep Video Architectures
43
Ilg, Eddy, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, and Thomas Brox. "Flownet 2.0: Evolution of
optical flow estimation with deep networks." CVPR 2017. [code]
Optical flow
44
Popular implementations for optical flow:
#Coarse2Fine Brox, Thomas, Andrés Bruhn, Nils Papenberg, and Joachim Weickert. "High accuracy optical flow estimation
based on a theory for warping." ECCV 2004.
● PyFlow
● FlowNet2
● Improved Dense Trajectories - IDT
● Lucas-Kanade (OpenCV)
● Farneback 2003 (OpenCV)
● Middlebury
● Horn and schunk
● Tikhonov regularized and vectorized
● DeepFlow (2013)
● (...)
Optical flow
45
Deep Neural Networks have actually also been trained to predict optical flow:
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., FlowNet:
Learning Optical Flow With Convolutional Networks. ICCV 2015
Two-streams 2D CNNs
46
Problem: Single frame models do not take into account motion in videos.
Solution: extract optical flow for a stack of frames and use it as an input to a
CNN.
Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in
videos." NIPS 2014.
Fusion
Two-streams 2D CNNs
47Feichtenhofer, Christoph, Axel Pinz, and Andrew Zisserman. "Convolutional two-stream network fusion for video action
recognition." CVPR 2016. [code]
Fusion
Problem: Single frame models do not take into account motion in videos.
Solution: extract optical flow for a stack of frames and use it as an input to a
CNN.
Two-streams 2D CNNs + Residual
48
Feichtenhofer, Christoph, Axel Pinz, and Richard Wildes. "Spatiotemporal residual networks for video action recognition."
NIPS 2016. [code]
Problem: Single frame models do not take into account motion in videos.
Solution: Extract optical flow for a stack of frames and feed it features in the
appearance stream.
Two-streams + Inflated 3D
49#I3D Carreira, J., & Zisserman, A. . Quo vadis, action recognition? A new model and the kinetics dataset. CVPR 2017. [code]
Two-streams 3D CNNs
50#I3D Carreira, J., & Zisserman, A. . Quo vadis, action recognition? A new model and the kinetics dataset. CVPR 2017. [code]
Two-streams 2D CNNs + RNN
51
Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici.
"Beyond short snippets: Deep networks for video classification." CVPR 2015
One stream RGB + Optical Flow
52
#2in1 Zhao, Jiaojiao, and Cees GM Snoek. "Dance with Flow: Two-in-One Stream Action Detection." CVPR 2019.
Leveraging the motion condition to modulate RGB features improves
detection accuracy.
One stream RGB + Optical Flow
53
#2in1 Zhao, Jiaojiao, and Cees GM Snoek. "Dance with Flow: Two-in-One Stream Action Detection." CVPR 2019.
One stream RGB + Optical Flow
54
#2in1 Zhao, Jiaojiao, and Cees GM Snoek. "Dance with Flow: Two-in-One Stream Action Detection." CVPR 2019.
2in1 has:
● half the computation and parameters of a two-stream equivalent
● better action detection accuracy (but worse for action classification).
Two-streams (feature visualization)
55
Feichtenhofer, Christoph, Axel Pinz, Richard P. Wildes, and Andrew Zisserman. "What have we learned from deep
representations for action recognition?." CVPR 2018.
Feichtenhofer, Christoph, Axel Pinz, Richard P. Wildes, and Andrew Zisserman. "Deep Insights into Convolutional Networks
for Video Recognition." IJCV 2019.
Replacing C3D for C2D
56#S3D Xie, S., Sun, C., Huang, J., Tu, Z., & Murphy, K.. Rethinking spatiotemporal feature learning: Speed-accuracy
trade-offs in video classification. ECCV 2018
57
Basic deep architectures for video:
1. Single frame models
2. CNN + RNN
3. 3D convolutions
4. RGB + Optical Flow
5. Dual slow and fast frame rates
6. Capsule Networks
Deep Video Architectures
58
Dual slow and fast frame rates
#SlowFast Feichtenhofer, Christoph, Haoqi Fan, Jitendra Malik, and Kaiming He. "Slowfast networks for video
recognition." ICCV 2019. [blog] [code]
2 fps
16 fps
High capacity model
Low capacity model
59
Dual slow and fast frame rates
#SlowFast Feichtenhofer, Christoph, Haoqi Fan, Jitendra Malik, and Kaiming He. "Slowfast networks for video recognition."
ICCV 2019. [blog] [code]
Comparison with the state of the art on Kinetics-400
60
Dual slow and fast frame rates
#AVSlowFast Xiao, Fanyi, Yong Jae Lee, Kristen Grauman, Jitendra Malik, and Christoph Feichtenhofer. "Audiovisual
SlowFast Networks for Video Recognition." arXiv preprint arXiv:2001.08740 (2020).
61
Basic deep architectures for video:
1. Single frame models
2. CNN + RNN
3. 3D convolutions
4. RGB + Optical Flow Two-stream CNN
5. Dual slow and fast frame rates
6. Capsule Networks
Deep Video Architectures
62
Capsule Networks
#Videocapsulenet Duarte, Kevin, Yogesh Rawat, and Mubarak Shah. "Videocapsulenet: A simplified network for action
detection." NeurIPS 2018. [code]
63
Basic deep architectures for video:
1. Single frame models
2. CNN + RNN
3. 3D convolutions
4. RGB + Optical Flow
5. Dual slow and fast frame rates
6. Capsule Networks
Deep Video Architectures
64
Implementations
PyTorch
● Facebook FAIR
○ SlowFast
○ SlowOnly
○ C2D
○ I3D
○ Non-local Network
● Torchvision
○ ResNet 3D 18
○ ResNet MC 18
○ ResNet(2+1)D
65
Learn more
Rohit Ghosh, “Deep Learning for Videos: A 2018 Guide for Action recognition” (2018)
66
Related lectures [index]
Object tracking
[slides] [video]
Video Object segmentation
[slides] [video]
Self-supervised Learning from
Video Sequences [slides] [video]
Self-supervised Audiovisual
Learning [slides] [video]
67
Questions ?

More Related Content

What's hot

What's hot (20)

Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
 
Deep Learning from Videos (UPC 2018)
Deep Learning from Videos (UPC 2018)Deep Learning from Videos (UPC 2018)
Deep Learning from Videos (UPC 2018)
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
 
Deep Learning for Video: Language (UPC 2018)
Deep Learning for Video: Language (UPC 2018)Deep Learning for Video: Language (UPC 2018)
Deep Learning for Video: Language (UPC 2018)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
 
One Perceptron to Rule Them All: Language and Vision
One Perceptron to Rule Them All: Language and VisionOne Perceptron to Rule Them All: Language and Vision
One Perceptron to Rule Them All: Language and Vision
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning for Video: Object Tracking (UPC 2018)
Deep Learning for Video: Object Tracking (UPC 2018)Deep Learning for Video: Object Tracking (UPC 2018)
Deep Learning for Video: Object Tracking (UPC 2018)
 
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018
 
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
 
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
 
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Learning spatiotemporal features with 3 d convolutional networks
Learning spatiotemporal features with 3 d convolutional networksLearning spatiotemporal features with 3 d convolutional networks
Learning spatiotemporal features with 3 d convolutional networks
 

Similar to Neural Architectures for Video Encoding

Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
CSCJournals
 
Image Compression Using Hybrid Svd Wdr And Svd Aswdr
Image Compression Using Hybrid Svd Wdr And Svd AswdrImage Compression Using Hybrid Svd Wdr And Svd Aswdr
Image Compression Using Hybrid Svd Wdr And Svd Aswdr
Melanie Smith
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
ijceronline
 

Similar to Neural Architectures for Video Encoding (20)

Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
 
Deep Learning for Computer Vision: Video Analytics (UPC 2016)
Deep Learning for Computer Vision: Video Analytics (UPC 2016)Deep Learning for Computer Vision: Video Analytics (UPC 2016)
Deep Learning for Computer Vision: Video Analytics (UPC 2016)
 
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016
 
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
 
Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)
Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)
Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problems
 
IRJET- Identification of Missing Person in the Crowd using Pretrained Neu...
IRJET-  	  Identification of Missing Person in the Crowd using Pretrained Neu...IRJET-  	  Identification of Missing Person in the Crowd using Pretrained Neu...
IRJET- Identification of Missing Person in the Crowd using Pretrained Neu...
 
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
 
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...
 
1918 1923
1918 19231918 1923
1918 1923
 
1918 1923
1918 19231918 1923
1918 1923
 
Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resoluti...
Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resoluti...Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resoluti...
Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resoluti...
 
A0540106
A0540106A0540106
A0540106
 
U_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in DepthU_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in Depth
 
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
 
Hierarchical structure adaptive
Hierarchical structure adaptiveHierarchical structure adaptive
Hierarchical structure adaptive
 
VIDEO BASED SIGN LANGUAGE RECOGNITION USING CNN-LSTM
VIDEO BASED SIGN LANGUAGE RECOGNITION USING CNN-LSTMVIDEO BASED SIGN LANGUAGE RECOGNITION USING CNN-LSTM
VIDEO BASED SIGN LANGUAGE RECOGNITION USING CNN-LSTM
 
Steganography using Coefficient Replacement and Adaptive Scaling based on DTCWT
Steganography using Coefficient Replacement and Adaptive Scaling based on DTCWTSteganography using Coefficient Replacement and Adaptive Scaling based on DTCWT
Steganography using Coefficient Replacement and Adaptive Scaling based on DTCWT
 
Image Compression Using Hybrid Svd Wdr And Svd Aswdr
Image Compression Using Hybrid Svd Wdr And Svd AswdrImage Compression Using Hybrid Svd Wdr And Svd Aswdr
Image Compression Using Hybrid Svd Wdr And Svd Aswdr
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 

More from Universitat Politècnica de Catalunya

Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Universitat Politècnica de Catalunya
 

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
 
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Backpropagation for Deep Learning
Backpropagation for Deep LearningBackpropagation for Deep Learning
Backpropagation for Deep Learning
 
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic ModerationHate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
 

Recently uploaded

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 

Recently uploaded (20)

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 

Neural Architectures for Video Encoding

  • 1. [http://pagines.uab.cat/mcv/] Module 6 - Day 2 - Lecture 1 Neural Architectures for Video Encoding 27th February 2020 Xavier Giro-i-Nieto @DocXavi xavier.giro@upc.edu Associate Professor Universitat Politècnica de Catalunya Barcelona Supercomputing Center
  • 2. Acknowledgements 2 Víctor Campos Alberto MontesAmaia Salvador Santiago Pascual
  • 3. Video lectures 3 Víctor Campos Deep Learning for Computer Vision 2018 UPC TelecomBCN Xavier Giro-i-Nieto Deep Learning for Computer Vision 2019 UPC TelecomBCN
  • 4. 4 #DeepVideo Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale video classification with convolutional neural networks. CVPR 2014
  • 5. Motivation 5 #DeepVideo Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. Large-scale video classification with convolutional neural networks. CVPR 2014.
  • 6. What is a video? 6 ● Formally, a video is a 3D signal ○ Spatial coordinates: x, y ○ Temporal coordinate: t ● If we fix t, we obtain an image. We can understand videos as sequences of images (a.k.a. frames)
  • 7. Convolutional Neural Networks (CNN) provide state of the art performance on still image analysis tasks How do we work with images? 7
  • 8. How can we extend CNNs to image sequences? How do we work with videos ? 8 f ŷ
  • 9. 9 Basic deep architectures for video: 1. Single frame models 2. Spatio-temporal convolutions 3. CNN + RNN 4. RGB + Optical Flow 5. Dual slow and fast frame rates 6. Capsule Networks Deep Video Architectures
  • 10. 10 Basic deep architectures for video: 1. Single frame models 2. Spatio-temporal convolutions 3. CNN + RNN 4. RGB + Optical Flow 5. Dual slow and fast frame rates 6. Capsule Networks Deep Video Architectures
  • 11. Single frame models 11 CNN CNN CNN... Combination method Combination is commonly implemented as a small NN on top of a temporal pooling operation (e.g. max, average). Problem: pooling is not aware of the temporal order! Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. "Beyond short snippets: Deep networks for video classification." CVPR 2015 ŷ
  • 12. 12 #DeepVideo Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale video classification with convolutional neural networks. CVPR 2014 Single frame models
  • 13. 13 Single frame models #DeepVideo Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale video classification with convolutional neural networks. CVPR 2014
  • 14. 14 #DeepVideo Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. . Large-scale video classification with convolutional neural networks. CVPR 2014 Single frame models
  • 15. 15 Basic deep architectures for video: 1. Single frame models 2. Spatio-temporal convolutions 3. CNN + RNN 4. RGB + Optical Flow Two-stream CNN Deep Video Architectures
  • 16. 3D CNN (C3D) 16 We can add an extra dimension to standard 2D CNN filters: ● A grayscale video clip may be represented by a tensor of size H x W x L. ● A 3D conv filter for the 1st conv layer may be of size k x k x d #C3D Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks" ICCV 2015 time
  • 17. 17 The video needs to be split into chunks (also known as clips) with a number of frames that fits the receptive field of the C3D (eg. 16 frames). #C3D Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks" ICCV 2015 16-frame clip 16-frame clip 16-frame clip ... Average pooling 4096-dimvideodescriptor 4096-dimvideodescriptor L2 norm 3D CNN (C3D) + temporal pooling
  • 18. 18 3D CNN (C3D) #3D Resnet Hara, Kensho, Hirokatsu Kataoka, and Yutaka Satoh. "Can spatiotemporal 3d cnns retrace the history of 2d cnns and imagenet?." CVPR 2018. [code] Much larger datasets (eg. Kinetics) allow training much deeper C3D CNNs.
  • 19. 19 Limitation: ● How can we use pre-trained 2D networks to initialize C3D training ? 3D CNN
  • 20. 20 Inflated 3D weights (I3D) We can adapt C2D filters to C3D by replicating them along the temporal axis (and scaling its values by 1/d). inflation #I3D Carreira, J., & Zisserman, A. . Quo vadis, action recognition? A new model and the kinetics dataset. CVPR 2017. [code] k x k C2D filter I3D filter k x k x d Filter inflation allows reusing C2D pre-trained filters (eg. from ImageNet) to initialize C3D filters.
  • 21. 21 Residual Learning Residual learning: reformulate the layers as learning residual functions with respect to the identity F(x), instead of learning unreferenced functions H(x) He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." CVPR 2016. [slides] Plain Net Residual Net
  • 22. 22 #Res3D Tran, Du, Jamie Ray, Zheng Shou, Shih-Fu Chang, and Manohar Paluri. "Convnet architecture search for spatiotemporal feature learning." arXiv preprint arXiv:1708.05038 (2017). [code] 3D CNN + Residual (Res3D) 2D ResNet Res3D
  • 23. 23 #Res3D Tran, Du, Jamie Ray, Zheng Shou, Shih-Fu Chang, and Manohar Paluri. "Convnet architecture search for spatiotemporal feature learning." arXiv preprint arXiv:1708.05038 (2017). [code] 3D CNN + Residual (Res3D)
  • 24. 24 #Res3D Tran, Du, Jamie Ray, Zheng Shou, Shih-Fu Chang, and Manohar Paluri. "Convnet architecture search for spatiotemporal feature learning." arXiv preprint arXiv:1708.05038 (2017). [code] 3D CNN + Residual (Res3D) Gain in performance comes with the cost of much more computation and larger memory footprint.
  • 25. 25 #MC Tran, Du, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. "A closer look at spatiotemporal convolutions for action recognition." CVPR 2018. [code] Mixed Convolutions (MC & rMC) Mixed Convolutions (MC) ● Early layers: 3D convolutions ● Top layers: 2D convolutions Reversed Mixed Convolutions (rMC) ● Early layers: 2D convolutions ● Top layers: 3D convolutions
  • 26. 26 #MC Tran, Du, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. "A closer look at spatiotemporal convolutions for action recognition." CVPR 2018. [code] MC ResNets: ● 3-4% gain in clip-level accuracy over 2D ResNets of comparable capacity ● match the performance of 3D ResNets, which have 3 times as many parameters. Mixed Convolutions (MC & rMC)
  • 27. 27 #R(2+1)D Tran, Du, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. "A closer look at spatiotemporal convolutions for action recognition." CVPR 2018. [code] C(2+1)D Splits the computation into a spatial 2D convolution followed by a temporal 1D convolution:. In the case of an input of a single feature channel: C3D C(2+1)D Number of 2D filters
  • 28. 28 #R(2+1)D Tran, Du, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. "A closer look at spatiotemporal convolutions for action recognition." CVPR 2018. [code] C(2+1D) + Residual
  • 29. 29 #R(2+1)D Tran, Du, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. "A closer look at spatiotemporal convolutions for action recognition." CVPR 2018. [code] 2+1D CNN + Residual
  • 30. 30 Separable 2D (group) convolutions Depth-wise (in channels) separable convolutions. #XCeption Chollet, François. "Xception: Deep learning with depthwise separable convolutions." CVPR 2017. Figure: Sik-Ho Tsang
  • 31. 31 Res3D + Separable (group) convolutions #CSN D. Tran, H. Wang, L. Torresani and M. Feiszli. Video Classification with Channel-Separated Convolutional Networks. ICCV 2019. [Caffe2] Channel interactions and spatiotemporal interactions are separated. 1 group 2 groups # groups = # input channels
  • 32. 32 Basic deep architectures for video: 1. Single frame models 2. Spatio-temporal convolutions 3. CNN + RNN 4. RGB + Optical Flow 5. Dual slow and fast frame rates 6. Capsule Networks Deep Video Architectures
  • 33. 33 The hidden layers and the output depend from previous states of the hidden layers Recurrent Neural Network (RNN)
  • 34. 2D CNN + RNN 34 CNN CNN CNN... Recurrent Neural Networks are well suited for processing sequences. Problem: RNNs are sequential and cannot be parallelized. RNN RNN RNN... Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrel. Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR 2015. code Videolectures on RNNs: DLSL 2017, “RNN (I)” “RNN (II)” DLAI 2018, “RNN”
  • 35. 35 #Res3D Tran, Du, Jamie Ray, Zheng Shou, Shih-Fu Chang, and Manohar Paluri. "Convnet architecture search for spatiotemporal feature learning." arXiv preprint arXiv:1708.05038 (2017). [code] 2D CNN + RNN vs Spatio-temporal Convs
  • 36. 36 Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrel. Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR 2015. [code] 2D CNN + RNN
  • 37. 37 Limitations: ● How can we handle longer videos? ● How can we capture longer temporal dependencies? 3D CNN
  • 38. 38 3D CNN + RNN A. Montes, Salvador, A., Pascual-deLaPuente, S., and Giró-i-Nieto, X., “Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks”, NIPS Workshop 2016 (best poster award)
  • 39. 39 #SkipRNN Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018. 2D CNN + RNN: skip redundancy CNN CNN CNN... RNN RNN RNN... After processing a frame, let the RNN decide how many future frames can be skipped In skipped frames, simply copy the output and state from the previous time step There is no ground truth for which frames can be skipped. The RNN learns it by itself during training!
  • 40. 40 #SkipRNN Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018. 2D CNN + RNN Epochs for Skip LSTM (λ = 10-4 ) ~30% acc ~50% acc ~70% acc ~95% acc 11 epochs 51 epochs 101 epochs 400 epochs Used Unused
  • 41. 41 Used Unused 2D CNN + RNN: skip redundancy #SkipRNN Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, ICLR 2018.
  • 42. 42 Basic deep architectures for video: 1. Single frame models 2. CNN + RNN 3. 3D convolutions 4. RGB + Optical Flow 5. Dual slow and fast frame rates 6. Capsule Networks Deep Video Architectures
  • 43. 43 Ilg, Eddy, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, and Thomas Brox. "Flownet 2.0: Evolution of optical flow estimation with deep networks." CVPR 2017. [code]
  • 44. Optical flow 44 Popular implementations for optical flow: #Coarse2Fine Brox, Thomas, Andrés Bruhn, Nils Papenberg, and Joachim Weickert. "High accuracy optical flow estimation based on a theory for warping." ECCV 2004. ● PyFlow ● FlowNet2 ● Improved Dense Trajectories - IDT ● Lucas-Kanade (OpenCV) ● Farneback 2003 (OpenCV) ● Middlebury ● Horn and schunk ● Tikhonov regularized and vectorized ● DeepFlow (2013) ● (...)
  • 45. Optical flow 45 Deep Neural Networks have actually also been trained to predict optical flow: Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., FlowNet: Learning Optical Flow With Convolutional Networks. ICCV 2015
  • 46. Two-streams 2D CNNs 46 Problem: Single frame models do not take into account motion in videos. Solution: extract optical flow for a stack of frames and use it as an input to a CNN. Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos." NIPS 2014. Fusion
  • 47. Two-streams 2D CNNs 47Feichtenhofer, Christoph, Axel Pinz, and Andrew Zisserman. "Convolutional two-stream network fusion for video action recognition." CVPR 2016. [code] Fusion Problem: Single frame models do not take into account motion in videos. Solution: extract optical flow for a stack of frames and use it as an input to a CNN.
  • 48. Two-streams 2D CNNs + Residual 48 Feichtenhofer, Christoph, Axel Pinz, and Richard Wildes. "Spatiotemporal residual networks for video action recognition." NIPS 2016. [code] Problem: Single frame models do not take into account motion in videos. Solution: Extract optical flow for a stack of frames and feed it features in the appearance stream.
  • 49. Two-streams + Inflated 3D 49#I3D Carreira, J., & Zisserman, A. . Quo vadis, action recognition? A new model and the kinetics dataset. CVPR 2017. [code]
  • 50. Two-streams 3D CNNs 50#I3D Carreira, J., & Zisserman, A. . Quo vadis, action recognition? A new model and the kinetics dataset. CVPR 2017. [code]
  • 51. Two-streams 2D CNNs + RNN 51 Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. "Beyond short snippets: Deep networks for video classification." CVPR 2015
  • 52. One stream RGB + Optical Flow 52 #2in1 Zhao, Jiaojiao, and Cees GM Snoek. "Dance with Flow: Two-in-One Stream Action Detection." CVPR 2019. Leveraging the motion condition to modulate RGB features improves detection accuracy.
  • 53. One stream RGB + Optical Flow 53 #2in1 Zhao, Jiaojiao, and Cees GM Snoek. "Dance with Flow: Two-in-One Stream Action Detection." CVPR 2019.
  • 54. One stream RGB + Optical Flow 54 #2in1 Zhao, Jiaojiao, and Cees GM Snoek. "Dance with Flow: Two-in-One Stream Action Detection." CVPR 2019. 2in1 has: ● half the computation and parameters of a two-stream equivalent ● better action detection accuracy (but worse for action classification).
  • 55. Two-streams (feature visualization) 55 Feichtenhofer, Christoph, Axel Pinz, Richard P. Wildes, and Andrew Zisserman. "What have we learned from deep representations for action recognition?." CVPR 2018. Feichtenhofer, Christoph, Axel Pinz, Richard P. Wildes, and Andrew Zisserman. "Deep Insights into Convolutional Networks for Video Recognition." IJCV 2019.
  • 56. Replacing C3D for C2D 56#S3D Xie, S., Sun, C., Huang, J., Tu, Z., & Murphy, K.. Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. ECCV 2018
  • 57. 57 Basic deep architectures for video: 1. Single frame models 2. CNN + RNN 3. 3D convolutions 4. RGB + Optical Flow 5. Dual slow and fast frame rates 6. Capsule Networks Deep Video Architectures
  • 58. 58 Dual slow and fast frame rates #SlowFast Feichtenhofer, Christoph, Haoqi Fan, Jitendra Malik, and Kaiming He. "Slowfast networks for video recognition." ICCV 2019. [blog] [code] 2 fps 16 fps High capacity model Low capacity model
  • 59. 59 Dual slow and fast frame rates #SlowFast Feichtenhofer, Christoph, Haoqi Fan, Jitendra Malik, and Kaiming He. "Slowfast networks for video recognition." ICCV 2019. [blog] [code] Comparison with the state of the art on Kinetics-400
  • 60. 60 Dual slow and fast frame rates #AVSlowFast Xiao, Fanyi, Yong Jae Lee, Kristen Grauman, Jitendra Malik, and Christoph Feichtenhofer. "Audiovisual SlowFast Networks for Video Recognition." arXiv preprint arXiv:2001.08740 (2020).
  • 61. 61 Basic deep architectures for video: 1. Single frame models 2. CNN + RNN 3. 3D convolutions 4. RGB + Optical Flow Two-stream CNN 5. Dual slow and fast frame rates 6. Capsule Networks Deep Video Architectures
  • 62. 62 Capsule Networks #Videocapsulenet Duarte, Kevin, Yogesh Rawat, and Mubarak Shah. "Videocapsulenet: A simplified network for action detection." NeurIPS 2018. [code]
  • 63. 63 Basic deep architectures for video: 1. Single frame models 2. CNN + RNN 3. 3D convolutions 4. RGB + Optical Flow 5. Dual slow and fast frame rates 6. Capsule Networks Deep Video Architectures
  • 64. 64 Implementations PyTorch ● Facebook FAIR ○ SlowFast ○ SlowOnly ○ C2D ○ I3D ○ Non-local Network ● Torchvision ○ ResNet 3D 18 ○ ResNet MC 18 ○ ResNet(2+1)D
  • 65. 65 Learn more Rohit Ghosh, “Deep Learning for Videos: A 2018 Guide for Action recognition” (2018)
  • 66. 66 Related lectures [index] Object tracking [slides] [video] Video Object segmentation [slides] [video] Self-supervised Learning from Video Sequences [slides] [video] Self-supervised Audiovisual Learning [slides] [video]