Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved.
Action Recognition: History and Latest Trends
September 3, 2018
Katsunori Ohnishi
DeNA Co., Ltd.

■ Action recognition
Before deep learning
After deep learning
Temporal Aggregation
■ Tips

Twitter: @ohnishi_ka
April 2014 - September 2017: B4~M2.5, Computer Vision
• http://katsunoriohnishi.github.io/
CVPR2016 (spotlight oral, acceptance rate=9.7%): egocentric vision (wrist-mounted camera)
ACMMM2016 (poster, acceptance rate=30%): action recognition (state-of-the-art)
AAAI2018 (oral, acceptance rate=10.9%): video generation (FTGAN)
October 2017 - : DeNA AI
• DeNA
→ https://www.wantedly.com/projects/209980

Action Recognition
Image classification
action recognition = human action recognition
• fine-grained, egocentric
Figure: example settings (Fine-grained, egocentric, Dog-centric, Action recognition, RGBD)
Evaluation of video activity localizations integrating quality and quantity measurements [C. Wolf+, CVIU14]
Recognizing Activities of Daily Living with a Wrist-mounted Camera [K. Ohnishi+, CVPR16]
A Database for Fine Grained Activity Detection of Cooking Activities [M. Rohrbach+, CVPR12]
First-Person Animal Activity Recognition from Egocentric Videos [Y. Iwashita+, ICPR14]
Recognizing Human Actions: A Local SVM Approach [C. Schuldt+, ICPR04]
HMDB: A Large Video Database for Human Motion Recognition [H. Kuehne+, ICCV11]
UCF101: A dataset of 101 human actions classes from videos in the wild [K. Soomro+, arXiv2012]

KTH, UCF101, HMDB51
• UCF101: 101 classes, 13,320 videos …
ActivityNet, Kinetics, YouTube-8M
AVA, Moments in Time, SLAC
Figure: UCF101

■ YouTube-8M Video Understanding Challenge
https://www.kaggle.com/c/youtube8m
CVPR17 and ECCV18 workshop, Kaggle
frame-level
test
• Kaggle, action recognition
■ ActivityNet Challenge
http://activity-net.org/challenges/2018/
ActivityNet 3
• Temporal Proposal (T)
• Temporal localization (T)
• Video Captioning
• Kinetics: classification (human action)
• AVA: Spatio-temporal localization (XYT)
• Moments-in-time: classification (event)

Before CNN
2000
SIFT
local descriptor → coding → global feature → classifier
STIP [I. Laptev, IJCV04]
Dense Trajectory [H. Wang+, CVPR11]
Improved Dense Trajectory [H. Wang+, ICCV13]
• http://hirokatsukataoka.net/temp/presen/170121STAIRLab_slideshare.pdf
• https://arxiv.org/pdf/1605.04988.pdf
On space-time interest points [I. Laptev, IJCV04]
Action Recognition by Dense Trajectories [H. Wang+, CVPR11]
Action Recognition with Improved Trajectories [H. Wang+, ICCV13]

Before CNN
■ Improved Dense Trajectories (iDT) [H. Wang+, ICCV13]
Dense Trajectories [H. Wang+, CVPR11]
2
optical flow
foreground
optical flow
Figure: Improved dense trajectories (green), background dense trajectories (white)

Before CNN
SIFT Fisher Vector
Fisher vector
http://www.isi.imi.i.u-tokyo.ac.jp/~harada/pdf/SSII_harada20120608.pdf
https://www.slideshare.net/takao-y/fisher-vector
Pipeline: input → Local descriptor (iDT) → Video descriptor (Fisher Vector [F. Perronnin+, CVPR07]) → Classifier (SVM [F. Pedregosa+, JMLR11])
Fisher kernels on visual vocabularies for image categorization [F. Perronnin+, CVPR07]

CNN action recognition
CNN
Two-stream
• Hand-crafted feature
3D Convolution
• C3D
• C3D Two-stream
• 3D conv
Optical flow

CNN action recognition: CNN
■ Spatio-temporal ConvNet [A. Karpathy+, CVPR14]
CNN
AlexNet: RGB channels → 10 frames as channels (gray)
Multi-scale fusion
Sports1M pre-training: UCF101 65.4% (iDT: 85.9%)
• 10 frames conv1 ch
• RGB gray frame-by-frame
score
Large-scale video classification with convolutional neural networks [A. Karpathy+, CVPR14]
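
The slide describes replacing the RGB input channels with 10 grayscale frames. A minimal sketch of preparing such an input is shown below; the function name, the toy frame size and the BT.601 luma weights are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def stack_gray_frames(clip_rgb):
    """Turn an RGB clip (T, H, W, 3) into one multi-channel image (H, W, T).

    Each frame becomes one grayscale channel, so a 2D CNN whose first
    layer accepts T input channels can consume the whole clip at once.
    """
    weights = np.array([0.299, 0.587, 0.114])  # assumed BT.601 luma weights
    gray = clip_rgb.astype(np.float64) @ weights  # (T, H, W)
    return np.transpose(gray, (1, 2, 0))          # (H, W, T)

# Toy clip: 10 frames of 8x8 RGB pixels (hypothetical sizes).
clip = np.random.default_rng(0).integers(0, 256, size=(10, 8, 8, 3))
x = stack_gray_frames(clip)
print(x.shape)
```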

CNN action recognition: Two-stream
■ Two-stream [K. Simonyan+, NIPS14]
2D CNN*
• Spatial-stream: RGB (input: RGB)
• Temporal-stream: Optical flow (input: optical flow 10 frames)
• Frame-by-frame
Hand-crafted feature CNN
• 2DCNN
*ImageNet pre-trained

                          UCF101   HMDB51
iDT                       85.9%    57.2%
Spatio-temporal ConvNet   65.4%    -
RGB-stream                73.0%    40.5%
Flow-stream               83.7%    54.6%
Two-stream                88.0%    59.4%

Two-stream convolutional networks for action recognition in videos [K. Simonyan+, NIPS14]
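
Two-stream scores are computed frame-by-frame in each stream and then combined. A minimal sketch of late fusion by plain averaging follows; the function names and the toy logits are hypothetical, and averaging is only one possible fusion rule.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def two_stream_predict(rgb_logits, flow_logits):
    """Fuse per-frame outputs of the two streams.

    rgb_logits, flow_logits: arrays of shape (num_frames, num_classes);
    the two streams may have different numbers of frames.
    Returns (predicted class index, fused class scores).
    """
    rgb_score = softmax(rgb_logits).mean(axis=0)    # average over frames
    flow_score = softmax(flow_logits).mean(axis=0)
    fused = (rgb_score + flow_score) / 2.0          # equal-weight late fusion
    return int(fused.argmax()), fused

# Toy example with 3 classes (values are made up).
rgb = np.array([[2.0, 1.0, 0.0], [1.5, 1.0, 0.2]])
flow = np.array([[0.0, 3.0, 0.0], [0.1, 2.5, 0.3], [0.0, 2.0, 0.0]])
label, scores = two_stream_predict(rgb, flow)
print(label, scores)
```

In this toy case the RGB stream alone prefers class 0, while the fused score follows the more confident flow stream.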

CNN action recognition: 3D convolution
■ C3D [D. Tran+, ICCV15]
16 frames, 3D convolution CNN
• XYT 3D convolution
UCF101 pre-training
ICCV15 arxiv 2 reject

              UCF101   HMDB51
iDT           85.9%    57.2%
Two-stream    88.0%    59.4%
C3D (1net)    82.3%    -

Figure: 3D conv
Learning Spatiotemporal Features with 3D Convolutional Networks [D. Tran+, ICCV15]
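
To make the XYT 3D convolution concrete, here is a naive single-channel sketch. It assumes no padding, stride 1 and a 3x3x3 kernel, and is written with loops for readability; it is not how C3D is implemented.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 3D cross-correlation over (T, H, W) with no padding, stride 1."""
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # The kernel slides along time as well as along X and Y.
                out[t, y, x] = np.sum(clip[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

# A 16-frame clip of 8x8 frames (toy size) and a 3x3x3 kernel of ones.
clip = np.ones((16, 8, 8))
kernel = np.ones((3, 3, 3))
out = conv3d_valid(clip, kernel)
print(out.shape)
```

Unlike a 2D convolution applied frame-by-frame, the output keeps a time axis, so later layers can still mix information across frames.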

CNN action recognition: 3D convolution
■ P3D [Z. Qiu+, ICCV17]
C3D
3D conv → 2D conv (XY) + 1D conv (T)
pre-training

                          UCF101   HMDB51
iDT                       85.9%    57.2%
Two-stream (AlexNet)      88.0%    59.4%
P3D (ResNet)              88.6%    -
Two-stream (ResNet152)    91.8%

Figure: 3D conv = Spatial 2D conv + Temporal 1D conv
again
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks [Z. Qiu+, ICCV17]
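
The factorization 3D conv → 2D conv (XY) + 1D conv (T) can be illustrated by counting weights. The sketch below assumes 3x3x3 kernels, a serial arrangement (spatial conv followed by temporal conv), 64 channels and no biases; all of these are illustrative choices rather than the exact P3D block design.

```python
def conv3d_params(c_in, c_out, k=3):
    """Weights of a full k x k x k 3D convolution (biases ignored)."""
    return c_in * c_out * k * k * k

def pseudo3d_params(c_in, c_out, k=3):
    """Weights of a 1 x k x k spatial conv followed by a k x 1 x 1 temporal conv."""
    spatial = c_in * c_out * k * k   # 2D conv over XY
    temporal = c_out * c_out * k     # 1D conv over T
    return spatial + temporal

full = conv3d_params(64, 64)
factored = pseudo3d_params(64, 64)
print(full, factored, full / factored)
```

Under these assumptions the factored block needs 12 weights per channel pair instead of 27.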

CNN action recognition: 3D convolution
■ C3D, P3D
3D conv
3D conv [K. Hara+, CVPR18]
Figure (timeline): 2012, 2011, 2015, 2017, 2017
Kinetics!
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? [K. Hara+, CVPR18]

CNN action recognition: 3D convolution
■ Kinetics
human action dataset!
3D conv
• Pre-train UCF101
• YouTube-8M
The Kinetics human action video dataset [W. Kay+, arXiv17]

CNN action recognition: 3D convolution
■ I3D [J. Carreira+, CVPR17]
Kinetics dataset, DeepMind
3D conv Inception
64 GPUs for training, 16 GPUs for prediction
state-of-the-art
• RGB
• Two-stream optical flow
score

                  UCF101   HMDB51
RGB-I3D           95.6%    74.8%
Flow-I3D          96.7%    77.1%
Two-stream I3D    98.0%    80.7%

…
?
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J. Carreira+, CVPR17]

CNN action recognition: 3D convolution
■ I3D Two-stream
3D convolution
3D conv XY T
• XY T
3D conv
Figure axis: time

CNN action recognition: 3D convolution
■ 3D convolution [D.A. Huang+, CVPR18]
• 3D CNN
• Two-stream I3D Optical flow 3D conv
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets [D.A. Huang+, CVPR18]

CNN action recognition: 3D convolution
■ 3D conv
CVPR18
CVPR/ICCV/ECCV
3D conv 3D conv
• GPU

CNN action recognition: Optical flow
■ Optical flow [L. Sevilla-Lara+, CVPR18]
• Optical flow
• Optical flow (EPE) action recognition
• flow action recognition
• Optical flow appearance
• Optical flow
On the Integration of Optical Flow and Action Recognition [L. Sevilla-Lara+, CVPR18]
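
The slide refers to the EPE (end-point error) of optical flow. A minimal sketch of how mean EPE is commonly computed is given below; the function name and the toy flow fields are hypothetical.

```python
import numpy as np

def endpoint_error(flow_pred, flow_gt):
    """Mean end-point error between two flow fields of shape (H, W, 2).

    The last axis holds the (u, v) displacement of each pixel; EPE is the
    Euclidean distance between predicted and ground-truth displacements.
    """
    diff = flow_pred - flow_gt
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())

# Toy example: ground truth is zero motion, prediction is (3, 4) everywhere.
gt = np.zeros((4, 4, 2))
pred = np.zeros((4, 4, 2))
pred[..., 0] = 3.0
pred[..., 1] = 4.0
epe = endpoint_error(pred, gt)
print(epe)
```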

AVA [C. Gu+, CVPR18]
XYZT bounding box
human action localization
Moments-in-time [M. Monfort+, arXiv2018]
3-second videos
Kinetics-600 [W. Kay+, arXiv2017]
Kinetics: 400 → 600 classes

Temporal Aggregation
2D conv frame-by-frame 3D conv
(100 frames, 232 frames, 50 frames)
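
Videos differ in length (100, 232 and 50 frames in the example above), while a 3D conv model takes a fixed-length clip. One simple aggregation, sketched here under assumptions, is to cut the video into fixed-length clips and average the per-clip scores. The clip length of 16, the non-overlapping stride and the function names are illustrative choices.

```python
def split_into_clips(num_frames, clip_len=16, stride=16):
    """Return (start, end) frame index pairs of fixed-length clips."""
    if num_frames < clip_len:
        # Too short for one full clip; a caller would have to pad or loop it.
        return [(0, num_frames)]
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, stride)]

def average_scores(per_clip_scores):
    """Average a list of per-clip class-score lists into one video-level score."""
    n = len(per_clip_scores)
    return [sum(col) / n for col in zip(*per_clip_scores)]

for n in (100, 232, 50):
    print(n, "frames ->", len(split_into_clips(n)), "clips")

# Two clips, two classes (made-up scores).
video_score = average_scores([[0.2, 0.8], [0.6, 0.4]])
print(video_score)
```

Averaging ignores the order of clips, which is the limitation the aggregation methods in the following slides try to address.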

Temporal Aggregation
Score
→
LSTM
→
• FC
?
• fencing → fencing
→…
Figure: per-frame CNN → LSTM → FC, repeated along time
CVPR ACMMM AAAI

Temporal Aggregation
Pipeline: input → Local descriptor (iDT) → Video descriptor (Fisher Vector [F. Perronnin+, CVPR07]) → Classifier (SVM [F. Pedregosa+, JMLR11])
→ …!
Fisher Vector
• CNN SIFT GMM
• FV VLAD [H. Jegou+, CVPR10]
Aggregating local descriptors into a compact image representation [H. Jegou+, CVPR10]
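
VLAD aggregates any number of local descriptors into one fixed-length vector, which is why it suits videos of varying length. A minimal sketch with hard assignment follows; the function name and toy data are hypothetical, and the cluster centers are assumed to come from k-means run beforehand.

```python
import numpy as np

def vlad(descriptors, centers):
    """Encode descriptors (N, D) against centers (K, D) as a K*D vector.

    Each descriptor is assigned to its nearest center, and the residuals
    (descriptor minus center) are summed per center, then L2-normalized.
    """
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    assign = d2.argmin(axis=1)
    v = np.zeros_like(centers, dtype=np.float64)
    for k in range(centers.shape[0]):
        members = descriptors[assign == k]
        if len(members):
            v[k] = (members - centers[k]).sum(axis=0)
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Toy data: 2 centers, 3 two-dimensional descriptors.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]])
code = vlad(desc, centers)
print(code)
```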

Temporal Aggregation
■ LCD [Z. Xu+, CVPR15]
VGG16 pool5 XY 512dim feature
• 224x224 input: 7x7=49 features
• VLAD global feature
Pipeline: input → CNN → Pool5 (e.g. 2x2x512) → Local descriptors → VLAD → global feature → SVM
A discriminative CNN video representation for event detection [Z. Xu+, CVPR15]
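
The idea of treating each XY position of pool5 as a 512-dim local descriptor amounts to a reshape. A minimal sketch, assuming a channels-last layout and a hypothetical 5-frame video:

```python
import numpy as np

def frames_to_local_descriptors(pool5_maps):
    """Flatten pool5 maps (T, H, W, C) into local descriptors (T*H*W, C).

    Every spatial position of every frame yields one C-dimensional
    descriptor, ready to be aggregated (for example by VLAD).
    """
    T, H, W, C = pool5_maps.shape
    return pool5_maps.reshape(T * H * W, C)

# 5 frames, each with a 7x7x512 pool5 map (224x224 input to VGG16).
maps = np.zeros((5, 7, 7, 512))
local = frames_to_local_descriptors(maps)
print(local.shape)
```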

Temporal Aggregation
■ ActionVLAD [R. Girdhar+, CVPR17]
NetVLAD [R. Arandjelović+, CVPR16]
• NetVLAD VLAD NN Cluster assign softmax assign
• VLAD LCD VLAD
• End2end CNN!
ActionVLAD: Learning spatio-temporal aggregation for action classification [R. Girdhar+, CVPR17]
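
The slide notes that NetVLAD makes the cluster assignment a softmax, which makes the VLAD layer differentiable. A minimal sketch of such a soft assignment is below. It uses distance-based logits with a fixed sharpness `alpha` instead of learned weights, and skips the intra-normalization step, so it is a simplification rather than the layer from the paper.

```python
import numpy as np

def soft_assign_vlad(descriptors, centers, alpha=10.0):
    """VLAD with softmax (soft) assignment instead of nearest-center assignment.

    descriptors: (N, D), centers: (K, D).
    Returns (L2-normalized K*D vector, assignment weights of shape (N, K)).
    """
    residuals = descriptors[:, None, :] - centers[None, :, :]   # (N, K, D)
    d2 = (residuals ** 2).sum(axis=-1)                          # (N, K)
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)                 # for stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)                           # softmax over clusters
    v = (a[:, :, None] * residuals).sum(axis=0).ravel()         # weighted residual sum
    return v / (np.linalg.norm(v) + 1e-12), a

centers = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]])
code, weights = soft_assign_vlad(desc, centers)
print(code)
```

With a large `alpha` the soft assignment approaches hard assignment, so this toy example gives nearly the same code as plain VLAD.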

Temporal Aggregation
■ TLE [A. Diba+, CVPR17]
VLAD Compact Bilinear Pooling [Y. Gao+, CVPR16]
Temporal Aggregation
VLAD
• SVM VLAD NN
Deep Temporal Linear Encoding Networks [A. Diba+, CVPR17]

Tips
Two-stream (ResNet) 2D conv Optical flow
■ Single model State-of-the-art
I3D + TLE BA
64 GPUs
Two-stream optical flow GPU
• optical flow stream
• RGB-stream
Optical flow

Tips
CNN TLE coding
• TLE ActionVLAD
iDT
• CNN
• Fisher Vector iDT
Tips: PCA (dim=64), K=256, FV power norm
• CPU
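
The tips above mention PCA (dim=64), K=256 and FV power norm. A rough sketch of Fisher Vector encoding with a diagonal GMM, followed by power and L2 normalization, could look like this. The function name, the toy sizes (K=4, D=8), the exponent 0.5 and the hand-set GMM parameters are assumptions for illustration; PCA and GMM fitting are omitted.

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Fisher Vector of descriptors X (N, D) under a diagonal GMM.

    weights: (K,), means: (K, D), sigmas: (K, D) standard deviations.
    Returns a power- and L2-normalized vector of length 2*K*D
    (gradients with respect to means and standard deviations).
    """
    N = X.shape[0]
    z = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]   # (N, K, D)
    # Log of weight * Gaussian density, up to a constant shared by all components.
    log_p = (-0.5 * (z ** 2).sum(axis=-1)
             - np.log(sigmas).sum(axis=-1)[None, :]
             + np.log(weights)[None, :])
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                      # posteriors (N, K)
    g_mu = (gamma[:, :, None] * z).sum(axis=0) / (N * np.sqrt(weights)[:, None])
    g_sig = (gamma[:, :, None] * (z ** 2 - 1.0)).sum(axis=0) / (N * np.sqrt(2.0 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sig.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                         # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                       # L2 normalization

# Toy setting; with the sizes in the tips (D=64, K=256) the length is 2*256*64 = 32768.
rng = np.random.default_rng(0)
K, D = 4, 8
X = rng.normal(size=(200, D))
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
fv = fisher_vector(X, weights, means, sigmas)
print(fv.shape)
```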

Temporal Aggregation
Score
→
LSTM
→
• FC
?
• fencing → fencing
→…
Figure: per-frame CNN → LSTM → FC, repeated along time
CVPR ACMMM AAAI
input
↓
Two-stream

LSTM
3D conv
Optical flow
• [L. Sevilla-Lara+, CVPR18]
Figure: per-frame CNN → LSTM → FC, repeated along time

2D conv + LSTM 3D conv 3D conv
Two-stream
Optical flow
MoCoGAN [S. Tulyakov+, CVPR18]
VGAN [C. Vondrick+, NIPS16]
TGAN [M. Saito+, ICCV17]
FTGAN [K. Ohnishi+, AAAI18]
LRCN [J. Donahue+, CVPR15]
C3D [D. Tran+, ICCV15]
P3D [Z. Qiu+, ICCV17]
Two-stream [K. Simonyan+, NIPS14]
I3D [J. Carreira+, CVPR17]
VGAN

Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture
K. Ohnishi+, AAAI 2018 (oral presentation)
https://arxiv.org/abs/1711.09618
Optical flow

Action classification
• Temporal action localization, Spatio-temporal localization
3D conv
Augmentation
■ Pose
Pose
• pose
• data distillation
■ Tips
& optical flow
Kinetics, YouTube

XY → XYT: O(n²) → O(n³)
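
The jump from O(n²) to O(n³) when going from XY to XYT can be made concrete by counting multiply-accumulate operations for one filter. The sketch assumes a cubic input of side n, a kernel of side 3, no padding and stride 1; the function names are hypothetical.

```python
def macs_2d(n, k=3):
    """Multiply-accumulates of one k x k filter over an n x n input (no padding)."""
    return (n - k + 1) ** 2 * k ** 2

def macs_3d(n, k=3):
    """Multiply-accumulates of one k x k x k filter over an n x n x n input (no padding)."""
    return (n - k + 1) ** 3 * k ** 3

n = 16
print(macs_2d(n), macs_3d(n), macs_3d(n) // macs_2d(n))
```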
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
100+ ChatGPT Prompts for SEO Optimization
100+ ChatGPT Prompts for SEO Optimization100+ ChatGPT Prompts for SEO Optimization
100+ ChatGPT Prompts for SEO Optimization
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
 

Action Recognition: History and Latest Trends

  • 1. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. Action Recognition. September 3, 2018. Katsunori Ohnishi, DeNA Co., Ltd.
  • 2. Action recognition; Deep; Deep; Temporal Aggregation; Tips
  • 3. Twitter: @ohnishi_ka. April 2014 to September 2017: B4~M2.5, Computer Vision; http://katsunoriohnishi.github.io/. CVPR2016 (spotlight oral, acceptance rate=9.7%): egocentric vision (wrist-mounted camera). ACMMM2016 (poster, acceptance rate=30%): action recognition (state-of-the-art). AAAI2018 (oral, acceptance rate=10.9%): video generation (FTGAN). October 2017 onward: DeNA AI. DeNA → https://www.wantedly.com/projects/209980
  • 4. Action Recognition. Image classification; action recognition = human action recognition; fine-grained, egocentric. Figure labels: Fine-grained, egocentric, Dog-centric, Action recognition, RGBD. References: Evaluation of video activity localizations integrating quality and quantity measurements [C. Wolf+, CVIU14]; Recognizing Activities of Daily Living with a Wrist-mounted Camera [K. Ohnishi+, CVPR16]; A Database for Fine Grained Activity Detection of Cooking Activities [M. Rohrbach+, CVPR12]; First-Person Animal Activity Recognition from Egocentric Videos [Y. Iwashita+, ICPR14]; Recognizing Human Actions: A Local SVM Approach [C. Schuldt+, ICPR04]; HMDB: A Large Video Database for Human Motion Recognition [H. Kuehne+, ICCV11]; UCF101: A dataset of 101 human actions classes from videos in the wild [K. Soomro+, arXiv2012]
  • 5. KTH, UCF101, HMDB51 (UCF101: 101 classes, 13320 videos, …). ActivityNet, Kinetics, YouTube-8M. AVA, Moments in Time, SLAC. Figure: UCF101
  • 6. YouTube-8M Video Understanding Challenge (https://www.kaggle.com/c/youtube8m): CVPR17 and ECCV18 workshops; Kaggle; frame-level; test; kaggle, action recognition. ActivityNet Challenge (http://activity-net.org/challenges/2018/): ActivityNet, 3; Temporal Proposal (T); Temporal localization (T); Video Captioning; Kinetics: classification (human action); AVA: Spatio-temporal localization (XYT); Moments-in-time: classification (event)
  • 7. Before CNNs. 2000s: SIFT; local descriptor → coding, global feature →. STIP [I. Laptev, IJCV05]; Dense Trajectory [H. Wang+, CVPR11]; Improved Dense Trajectory [H. Wang+, ICCV13]. See also: http://hirokatsukataoka.net/temp/presen/170121STAIRLab_slideshare.pdf and https://arxiv.org/pdf/1605.04988.pdf. References: On space-time interest points [I. Laptev, IJCV05]; Action Recognition by Dense Trajectories [H. Wang+, CVPR11]; Action Recognition with Improved Trajectories [H. Wang+, ICCV13]
  • 8. Before CNNs. Improved Dense Trajectories (iDT) [H. Wang+, ICCV13]; Dense Trajectories [H. Wang+, CVPR11]. 2; optical flow; foreground optical flow. Figure: Improved dense trajectories (green), background dense trajectories (white)
  • 9. Before CNNs. SIFT; Fisher Vector. Fisher vector: http://www.isi.imi.i.u-tokyo.ac.jp/~harada/pdf/SSII_harada20120608.pdf and https://www.slideshare.net/takao-y/fisher-vector. Pipeline: input → Local descriptor: iDT → Video descriptor: Fisher Vector [F. Perronnin+, CVPR07] → Classifier: SVM [F. Pedregosa+, JMLR11]. Reference: Fisher kernels on visual vocabularies for image categorization [F. Perronnin+, CVPR07]
  • 10. CNN action recognition. CNN. Two-stream: Hand-crafted feature. 3D Convolution: C3D; C3D, Two-stream; 3D conv, Optical flow
  • 11. CNN action recognition: CNN. Spatio-temporal ConvNet [A. Karpathy+, CVPR14]: CNN, AlexNet; RGB ch → 10 frames ch (gray); multi scale; Fusion; Sports1M pre-training; UCF101 65.4% (iDT 85.9%). 10 frames, conv1 ch; RGB, gray; frame-by-frame score. Reference: Large-scale video classification with convolutional neural networks [A. Karpathy+, CVPR14]
  • 12. CNN action recognition: Two-stream. Two-stream [K. Simonyan+, NIPS14]: 2D CNN*; Spatial-stream: RGB (input: RGB); Temporal-stream: Optical flow (input: optical flow, 10 frames); Frame-by-frame; Hand-crafted feature; CNN; 2DCNN. Accuracy (UCF101 / HMDB51): iDT 85.9% / 57.2%; Spatio-temporal ConvNet 65.4% / -; RGB-stream 73.0% / 40.5%; Flow-stream 83.7% / 54.6%; Two-stream 88.0% / 59.4%. *ImageNet pre-trained. Reference: Two-stream convolutional networks for action recognition in videos [K. Simonyan+, NIPS14]
  • 13. CNN action recognition: 3D convolution. C3D [D. Tran+, ICCV15]: 16 frames, 3D convolution CNN; XYT 3D convolution; UCF101; pre-training; ICCV15; arxiv; 2; reject. Accuracy (UCF101 / HMDB51): iDT 85.9% / 57.2%; Two-stream 88.0% / 59.4%; C3D (1net) 82.3% / -. Figure: 3D conv. Reference: Learning Spatiotemporal Features with 3D Convolutional Networks [D. Tran+, ICCV15]
  • 14. CNN action recognition: 3D convolution. P3D [Z. Qiu+, ICCV17]: C3D; 3D conv → 2D conv (XY) + 1D conv (T); pre-training. Accuracy (UCF101 / HMDB51): iDT 85.9% / 57.2%; Two-stream (AlexNet) 88.0% / 59.4%; P3D (ResNet) 88.6% / -. Figure: Spatial 2D conv, Temporal 1D conv. Reference: Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks [Z. Qiu+, ICCV17]
  • 15. CNN action recognition: 3D convolution. P3D [Z. Qiu+, ICCV17]: C3D; 3D conv → 2D conv (XY) + 1D conv (T); pre-training. Accuracy (UCF101 / HMDB51): iDT 85.9% / 57.2%; Two-stream (AlexNet) 88.0% / 59.4%; P3D (ResNet) 88.6% / -; Two-stream (ResNet152) 91.8% (UCF101). Figure: Spatial 2D conv, Temporal 1D conv. 3D conv again. Reference: Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks [Z. Qiu+, ICCV17]
  • 16. CNN action recognition: 3D convolution. C3D, P3D; 3D conv. 3D conv [K. Hara+, CVPR18]. Figure labels: 2012, 2011, 2015, 2017. Reference: Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? [K. Hara+, CVPR18]
  • 17. CNN action recognition: 3D convolution. C3D, P3D; 3D conv. 3D conv [K. Hara+, CVPR18]. Figure labels: 2012, 2011, 2015, 2017; 2017: Kinetics! Reference: Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? [K. Hara+, CVPR18]
  • 18. CNN action recognition: 3D convolution. Kinetics human action dataset! 3D conv; Pre-train; UCF101; Youtube8M. Reference: The Kinetics human action video dataset [W. Kay+, arXiv17]
  • 19. CNN action recognition: 3D convolution. I3D [J. Carreira+, CVPR17]: Kinetics dataset; DeepMind; 3D conv, Inception; 64 GPUs for training, 16 GPUs for prediction; state-of-the-art; RGB; Two-stream, optical flow, score. Accuracy (UCF101 / HMDB51): RGB-I3D 95.6% / 74.8%; Flow-I3D 96.7% / 77.1%; Two-stream I3D 98.0% / 80.7%. … Reference: Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J. Carreira+, CVPR17]
  • 20. CNN action recognition: 3D convolution. I3D [J. Carreira+, CVPR17]: Kinetics dataset; DeepMind; 3D conv, Inception; 64 GPUs for training, 16 GPUs for prediction; state-of-the-art; RGB; Two-stream, optical flow, score. Accuracy (UCF101 / HMDB51): RGB-I3D 95.6% / 74.8%; Flow-I3D 96.7% / 77.1%; Two-stream I3D 98.0% / 80.7%. … ? Reference: Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J. Carreira+, CVPR17]
  • 21. CNN action recognition: 3D convolution. I3D, Two-stream; 3D convolution. 3D conv: XY, T; XY, T, 3D conv. Figure: time
  • 22. CNN action recognition: 3D convolution. 3D convolution [D.A. Huang+, CVPR18]: 3D CNN; →; Two-stream I3D; Optical flow; 3D conv. Reference: What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets [D.A. Huang+, CVPR18]
  • 23. CNN action recognition: 3D convolution. 3D conv; CVPR18; CVPR/ICCV/ECCV; 3D conv; 3D conv; GPU
  • 24. CNN action recognition: Optical flow. Optical flow [L. Sevilla-Lara+, CVPR18]: Optical flow; Optical flow (EPE), action recognition; flow, action recognition; Optical flow, appearance; Optical flow. Reference: On the Integration of Optical Flow and Action Recognition [L. Sevilla-Lara+, CVPR18]
  • 25. AVA [C. Gu+, CVPR18]: XYZT bounding box, human action localization. Moments-in-time [M. Monfort+, arXiv2018]: 3. Kinetics-600 [W. Kay+, arXiv2017]: Kinetics, 400, 600
  • 26. Temporal Aggregation. 2D conv: frame-by-frame. 3D conv (100 frames, 232 frames, 50 frames)
  • 27. Temporal Aggregation. Score → LSTM →; FC ?; fencing → fencing → … Figure: CNN, LSTM, FC (×3); CVPR, ACMMM, AAAI, …
  • 28. Temporal Aggregation. Pipeline: input → Local descriptor: iDT → Video descriptor: Fisher Vector [F. Perronnin+, CVPR07] → Classifier: SVM [F. Pedregosa+, JMLR11]. → …! Fisher Vector: CNN, SIFT, GMM; FV, VLAD [H. Jegou+, CVPR10]. Reference: Aggregating local descriptors into a compact image representation [H. Jegou+, CVPR10]
  • 29. Temporal Aggregation. LCD [Z. Xu+, CVPR15]: VGG16 pool5; XY; 512dim feature; 224x224; feature 7x7=49; VLAD, global feature. Figure: input, CNN, Pool5 (e.g. 2x2x512), Local descriptors, VLAD, global feature, SVM. Reference: A discriminative CNN video representation for event detection [Z. Xu+, CVPR15]
  • 30. Temporal Aggregation. ActionVLAD [R. Girdhar+, CVPR17]; NetVLAD [R. Arandjelović+, CVPR16]. NetVLAD: VLAD, NN; Cluster assign, softmax assign. VLAD, LCD, VLAD. End2end CNN! Reference: ActionVLAD: Learning spatio-temporal aggregation for action classification [R. Girdhar+, CVPR17]
  • 31. Temporal Aggregation. TLE [A. Diba+, CVPR17]: VLAD; Compact Bilinear Pooling [Y. Gao+, CVPR16]; Temporal Aggregation; VLAD; SVM, VLAD, NN. Reference: Deep Temporal Linear Encoding Networks [A. Diba+, CVPR17]
  • 32. Tips. Two-stream (ResNet): 2D conv, Optical flow. Single model, state-of-the-art: I3D + TLE; BA; 64GPU. Two-stream: optical flow, GPU; optical flow stream; RGB-stream, Optical flow
  • 33. Tips. CNN, TLE, coding; TLE, ActionVLAD. iDT; CNN. FisherVector, iDT. Tips: PCA (dim=64), K=256, FV power norm. CPU
  • 34. Temporal Aggregation. Score → LSTM →; FC ?; fencing → fencing → … Figure: CNN, LSTM, FC (×3); CVPR, ACMMM, AAAI, …; input ↓ Two-stream
  • 35. LSTM, 3D conv, Optical flow; [L. Sevilla-Lara+, CVPR18]. Figure: CNN, LSTM, FC (×3)
  • 36. Figure labels: 2D conv + LSTM; 3D conv; 3D conv, Two-stream; Optical flow. MoCoGAN [S. Tulyakov+, CVPR18]; VGAN [C. Vondrick+, NIPS16]; TGAN [M. Saito+, ICCV17]; FTGAN [K. Ohnishi+, AAAI18]; LRCN [J. Donahue+, CVPR15]; C3D [D. Tran+, ICCV15]; P3D [Z. Qiu+, ICCV17]; Two-stream [K. Simonyan+, NIPS14]; I3D [J. Carreira+, CVPR17]; VGAN
  • 37. Figure labels: 2D conv + LSTM; 3D conv; 3D conv, Two-stream; Optical flow. MoCoGAN [S. Tulyakov+, CVPR18]; VGAN [C. Vondrick+, NIPS16]; TGAN [M. Saito+, ICCV17]; FTGAN [K. Ohnishi+, AAAI18]; LRCN [J. Donahue+, CVPR15]; C3D [D. Tran+, ICCV15]; P3D [Z. Qiu+, ICCV17]; Two-stream [K. Simonyan+, NIPS14]; I3D [J. Carreira+, CVPR17]; !; VGAN
  • 38. Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture. K. Ohnishi+, AAAI 2018 (oral presentation). https://arxiv.org/abs/1711.09618. Optical flow
  • 39. Action classification; Temporal action localization, Spatio-temporal localization; 3D conv; Augmentation. Pose; Pose; pose; data distillation. Tips; & optical flow; Kinetics; Youtube
  • 40. XY, XYT: O(n²) → O(n³); !
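
The XY to XYT cost growth on slide 40, and the P3D idea of replacing a 3D conv with a 2D conv (XY) plus a 1D conv (T) on slide 14, can be illustrated by counting kernel weights. This is a minimal sketch under stated assumptions, not the layout of any particular paper's block: kernels are square or cubic with side `k`, there is no bias, the factorized version keeps `c_out` channels between its two convolutions, and all function names are hypothetical.

```python
def conv2d_weights(k, c_in, c_out):
    """Weights in one 2D convolution with a k x k kernel (XY only)."""
    return k * k * c_in * c_out


def conv3d_weights(k, c_in, c_out):
    """Weights in one 3D convolution with a k x k x k kernel (XYT)."""
    return k * k * k * c_in * c_out


def factorized_weights(k, c_in, c_out):
    """Weights when the 3D kernel is split into a spatial k x k conv (XY)
    followed by a temporal conv of length k (T).
    Assumption: the intermediate tensor has c_out channels."""
    spatial = k * k * c_in * c_out
    temporal = k * c_out * c_out
    return spatial + temporal


if __name__ == "__main__":
    k, c = 3, 64
    print("2D (XY):       ", conv2d_weights(k, c, c))
    print("3D (XYT):      ", conv3d_weights(k, c, c))
    print("2D + 1D (XY+T):", factorized_weights(k, c, c))
```

With `k = 3` and 64 channels, the full 3D kernel holds three times the weights of the 2D kernel, while the factorized pair adds only about a third on top of the 2D count.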