Towards Scaling Video Understanding
Serena Yeung
YouTube TV · GoPro · Smart spaces
State-of-the-art in video understanding
• Classification: 4,800 categories, 15.2% Top-5 error (Abu-El-Haija et al. 2016)
• Detection: tens of categories, ~10-20 mAP at 0.5 overlap (Idrees et al. 2017; Sigurdsson et al. 2016); overlap here is temporal IoU, as in the helper below
• Captioning: just getting started, with short clips and niche domains (Yu et al. 2016)
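To make the "0.5 overlap" criterion concrete, here is a small, hedged helper (an assumption about the standard metric, not code from the slides): detection metrics such as "mAP at 0.5 overlap" count a predicted action segment as correct when its temporal intersection-over-union with a ground-truth segment of the same class exceeds 0.5.

```python
# Temporal IoU between a predicted and a ground-truth [start, end] interval.
def temporal_iou(pred, gt):
    # pred, gt: (start, end) in seconds or frame indices, with start <= end
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# Example: temporal_iou((2.0, 6.0), (4.0, 8.0)) == 2.0 / 6.0, about 0.33,
# so this prediction would not count as a match at the 0.5 threshold.
```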
Comparing video with image understanding

Classification
• Videos: 4,800 categories, 15.2% Top-5 error (Abu-El-Haija et al. 2016)
• Images: 1,000 categories*, 3.1% Top-5 error (Krizhevsky 2012; Xie 2016)

Detection
• Videos: tens of categories, ~10-20 mAP at 0.5 overlap (Idrees et al. 2017; Sigurdsson et al. 2016)
• Images: hundreds of categories*, ~60 mAP at 0.5 overlap, plus pixel-level segmentation (He 2017)

Captioning
• Videos: just getting started; short clips, niche domains (Yu et al. 2016)
• Images: dense captioning and coherent paragraphs (Johnson 2016; Krause 2017)

Beyond
• Videos: not yet explored
• Images: significant work on question-answering (Yang 2016)

*Transfer learning widespread
The challenge of scale
• Inference: video processing is computationally expensive
• Training labels: video annotation is labor-intensive
• Models: the temporal dimension adds complexity

Next: Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
Task: Temporal action detection
Input: a video spanning t = 0 to t = T. Output: temporal intervals [start, end] for each action instance (e.g., Running, Talking).
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
Efficient video processing
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

Our model for efficient action detection
• A frame model processes one frame at a time: a convolutional neural network extracts frame information, and a recurrent neural network carries time information from glimpse to glimpse.
• At each glimpse the model emits two outputs: an optional detection instance [start, end], and the next frame to glimpse.
• The model repeats this glimpse loop from t = 0 to t = T, jumping between frames rather than processing every frame.
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
Our model for efficient action detection (training)
• Train differentiable outputs (detection output class and bounds) using standard backpropagation
• Train non-differentiable outputs (where to look next, when to emit a prediction) using reinforcement learning (the REINFORCE algorithm); a minimal sketch of this hybrid training follows below
• Achieves detection performance on par with dense sliding-window approaches, while observing only 2% of frames
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
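To make the hybrid training concrete, here is a minimal, hedged PyTorch sketch (not the authors' released code): a CNN-extracted frame feature feeds an LSTM cell, the detection head gets an ordinary supervised loss, and the categorical next-glimpse policy is trained with REINFORCE using the negative detection loss as reward. All layer sizes, the reward definition, and the feature extractor are illustrative assumptions.

```python
# Hedged sketch of hybrid backprop + REINFORCE training for a glimpse model.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class GlimpseAgent(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_frames=100):
        super().__init__()
        self.rnn = nn.LSTMCell(feat_dim, hidden_dim)        # time information
        self.det_head = nn.Linear(hidden_dim, 3)            # [start, end, confidence]
        self.loc_head = nn.Linear(hidden_dim, num_frames)   # where to glimpse next

    def forward(self, frame_feats, num_glimpses=6):
        # frame_feats: (num_frames, feat_dim) CNN features, one row per frame
        h = torch.zeros(1, self.rnn.hidden_size)
        c = torch.zeros(1, self.rnn.hidden_size)
        t = 0
        log_probs, detections = [], []
        for _ in range(num_glimpses):
            h, c = self.rnn(frame_feats[t].unsqueeze(0), (h, c))
            detections.append(self.det_head(h))             # differentiable output
            policy = Categorical(logits=self.loc_head(h))   # next-glimpse distribution
            choice = policy.sample()                         # non-differentiable decision
            log_probs.append(policy.log_prob(choice))
            t = int(choice)
        return torch.cat(detections), torch.cat(log_probs)

def training_step(agent, frame_feats, targets, optimizer):
    detections, log_probs = agent(frame_feats)
    # Supervised loss on the differentiable detection outputs (backpropagation).
    sup_loss = nn.functional.mse_loss(detections, targets)
    # REINFORCE: reward the glimpse policy when the resulting detections are good.
    reward = (-sup_loss).detach()
    policy_loss = -(log_probs * reward).sum()
    optimizer.zero_grad()
    (sup_loss + policy_loss).backward()
    optimizer.step()
    return sup_loss.item()
```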
Learned policy in action
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
The challenge of scale (continued)
• Inference: video processing is computationally expensive
• Models: the temporal dimension adds complexity
• Training labels: video annotation is labor-intensive
Covered so far: Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
Next: Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.

Dense action labeling
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
MultiTHUMOS
• Extends the THUMOS’14 action detection dataset with dense, multilevel, frame-level action annotations for 30 hours across 400 videos

                           THUMOS    MultiTHUMOS
Annotations                6,365     38,690
Classes                    20        65
Density (labels / frame)   0.3       1.5
Classes per video          1.1       10.5
Max actions per frame      2         9
Max actions per video      3         25

Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
Modeling dense, multilabel actions
• Need to reason about multiple potential actions simultaneously
• High degree of temporal dependency
• In standard recurrent models for action recognition, all state lives in the hidden-layer representation
• At each time step, the model predicts the current frame's labels from the current frame and the previous hidden representation (a minimal baseline sketch follows below)
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
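For reference, a minimal sketch of that standard recurrent baseline (assuming PyTorch and precomputed per-frame CNN features; dimensions are illustrative, not from the paper):

```python
# Hedged sketch of a standard LSTM frame-labeling baseline (Donahue 2014 style):
# one frame feature in, one frame prediction out at each time step, with all
# temporal context carried in the hidden state.
import torch
import torch.nn as nn

class FrameLSTMBaseline(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=65):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, feats):
        # feats: (batch, T, feat_dim) per-frame CNN features
        hidden_states, _ = self.lstm(feats)   # (batch, T, hidden_dim)
        return self.out(hidden_states)        # (batch, T, num_classes) frame logits
```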
MultiLSTM
• An extension of the LSTM that expands the temporal receptive field of its input and output connections
• Key idea: giving the model more freedom in both reading input and writing output reduces the burden placed on the hidden-layer representation
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
MultiLSTM vs. standard LSTM
• Standard LSTM (Donahue 2014): single input, single output; at each time step t, one input video frame yields one frame class prediction.
• MultiLSTM: multiple inputs, multiple outputs; at each time step, the model reads a window of input video frames and emits class predictions for a window of frames.
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
MultiLSTM components
• Multiple inputs, combined with soft attention
• Multiple outputs, combined by a weighted average
• Trained with a multilabel loss
(a sketch of these pieces follows below)
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
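A hedged sketch of how those components might fit together. This is an illustrative PyTorch reconstruction under stated assumptions, not the authors' implementation: the window size, the dimensions, and the plain mean standing in for the learned weighted average are all assumptions.

```python
# Illustrative MultiLSTM-style module: soft attention over a window of input
# frame features, an LSTM cell for temporal state, and per-frame predictions
# averaged over the output windows they appear in.
import torch
import torch.nn as nn

class MultiLSTMSketch(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=65, window=5):
        super().__init__()
        self.window = window
        self.num_classes = num_classes
        self.attn = nn.Linear(feat_dim + hidden_dim, 1)          # soft input attention
        self.cell = nn.LSTMCell(feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_classes * window)   # predictions for a window

    def forward(self, feats):
        # feats: (T, feat_dim) per-frame CNN features
        T = feats.size(0)
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros(1, self.cell.hidden_size)
        per_frame = [[] for _ in range(T)]
        for t in range(T):
            window_feats = feats[max(0, t - self.window + 1): t + 1]   # input window
            scores = self.attn(torch.cat(
                [window_feats, h.expand(window_feats.size(0), -1)], dim=1))
            alpha = torch.softmax(scores, dim=0)                        # attention weights
            pooled = (alpha * window_feats).sum(dim=0, keepdim=True)    # attended input
            h, c = self.cell(pooled, (h, c))
            preds = self.out(h).view(self.window, self.num_classes)     # output window
            for k, frame_idx in enumerate(range(t, t - self.window, -1)):
                if frame_idx >= 0:
                    per_frame[frame_idx].append(preds[k])
        # A plain mean stands in for the paper's learned weighted average.
        logits = torch.stack([torch.stack(p).mean(dim=0) for p in per_frame])
        return logits   # (T, num_classes)

# Multilabel loss against dense frame-level labels (T, num_classes) in {0, 1}:
# loss = nn.functional.binary_cross_entropy_with_logits(logits, frame_labels)
```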
MultiLSTM results
• Retrieving sequential actions
• Retrieving co-occurring actions
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
The challenge of scale (continued)
• Inference: video processing is computationally expensive
• Models: the temporal dimension adds complexity
• Training labels: video annotation is labor-intensive
Covered so far: Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016; Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
Next: Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
Labeling videos is expensive
• It takes significantly longer to label a video than an image
• If spatial or temporal bounds are desired, the cost is even higher
• How can we practically learn about new concepts in video?
Web queries are a source of noisy video labels
Image search is much cleaner!
Can we effectively learn from the noisy web queries?
• Our approach: learn how to select positive training examples from noisy queries in order to train classifiers for new classes
• We use a reinforcement learning-based formulation to learn a data-labeling policy that achieves strong performance on a small, manually labeled dataset of classes
• We then use this policy to automatically label noisy web data for new classes
Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
Balancing diversity vs. semantic drift
• We want diverse training examples to improve the classifier
• But too much diversity can also lead to semantic drift
• Our approach: balance diversity and drift by training labeling policies against an annotated reward set that the resulting classifier must successfully classify
Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
Overview of approach
• Candidate web queries come from YouTube autocomplete (e.g., "Boomerang", "Boomerang on a beach", "Boomerang music video")
• An agent selects a query (e.g., "Boomerang on a beach") and labels its videos as new positives, adding them to the current positive set
• The classifier is updated using the current positive set and a fixed negative set
• The updated classifier state is fed back to the agent, and the loop repeats
• Training reward: the classifier is evaluated on the annotated reward set
(a sketch of this selection loop follows below)
Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
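A hedged, minimal Python sketch of that loop, written only to make the moving parts concrete: fetch_videos, train_classifier, and evaluate are hypothetical stand-ins, and the simple epsilon-greedy scoring below replaces the learned RL policy described in the paper.

```python
# Illustrative selection loop (not the authors' code). The agent repeatedly
# picks a candidate web query, adds its videos as new positives, retrains the
# classifier against a fixed negative set, and is scored on a small manually
# annotated reward set, which guards against semantic drift.
import random

def learn_from_noisy_queries(candidate_queries, fixed_negatives, reward_set,
                             fetch_videos, train_classifier, evaluate,
                             num_steps=10, epsilon=0.2):
    positives = []
    classifier = train_classifier(positives, fixed_negatives)
    history = []
    for _ in range(num_steps):
        def query_score(q):
            # Score a query by how confidently the current classifier accepts
            # its videos; the paper instead learns this decision with RL.
            vids = fetch_videos(q)
            return sum(classifier(v) for v in vids) / max(1, len(vids))

        if random.random() < epsilon:
            query = random.choice(candidate_queries)         # explore for diversity
        else:
            query = max(candidate_queries, key=query_score)  # exploit current belief

        positives.extend(fetch_videos(query))                # label new positives
        classifier = train_classifier(positives, fixed_negatives)  # update classifier
        reward = evaluate(classifier, reward_set)            # eval on reward set
        history.append((query, reward))
    return classifier, history
```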
Results
• Sports1M classes: greedy classifier vs. ours
• Novel classes: greedy classifier vs. ours
Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
The challenge of scale: recap
• Inference: video processing is computationally expensive
• Models: the temporal dimension adds complexity
• Training labels: video annotation is labor-intensive
Work presented:
• Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
• Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
• Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
Looking ahead: learning to learn, unsupervised learning

Towards Knowledge
• From videos to knowledge of the dynamic visual world
Collaborators
Olga Russakovsky, Mykhaylo Andriluka, Ning Jin, Vignesh Ramanathan, Liyue Shen, Greg Mori, Fei-Fei Li
Thank You

Weitere ähnliche Inhalte

Andere mochten auch

Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...MLconf
 
Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...
Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...
Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...MLconf
 
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017MLconf
 
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017MLconf
 
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017MLconf
 
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017Scott Clark, CEO, SigOpt, at MLconf Seattle 2017
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017MLconf
 
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017MLconf
 
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016MLconf
 
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016MLconf
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...MLconf
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016MLconf
 
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016MLconf
 
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...MLconf
 
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017MLconf
 
Anjuli Kannan, Software Engineer, Google at MLconf SF 2016
Anjuli Kannan, Software Engineer, Google at MLconf SF 2016Anjuli Kannan, Software Engineer, Google at MLconf SF 2016
Anjuli Kannan, Software Engineer, Google at MLconf SF 2016MLconf
 
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017MLconf
 
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017MLconf
 
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017MLconf
 
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017MLconf
 

Andere mochten auch (19)

Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
 
Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...
Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...
Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...
 
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
 
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
 
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
 
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017Scott Clark, CEO, SigOpt, at MLconf Seattle 2017
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017
 
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
 
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
 
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
 
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
 
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
 
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
 
Anjuli Kannan, Software Engineer, Google at MLconf SF 2016
Anjuli Kannan, Software Engineer, Google at MLconf SF 2016Anjuli Kannan, Software Engineer, Google at MLconf SF 2016
Anjuli Kannan, Software Engineer, Google at MLconf SF 2016
 
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
 
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
 
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
 
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
 

Ähnlich wie Serena Yeung, PHD, Stanford, at MLconf Seattle 2017

論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions
論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions
論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future DirectionsToru Tamaki
 
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...sipij
 
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...sipij
 
Real-Time Video Copy Detection in Big Data
Real-Time Video Copy Detection in Big DataReal-Time Video Copy Detection in Big Data
Real-Time Video Copy Detection in Big DataIRJET Journal
 
Video search by deep-learning
Video search by deep-learningVideo search by deep-learning
Video search by deep-learningvoginip
 
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformHuman Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformFadwa Fouad
 
Recognition and tracking moving objects using moving camera in complex scenes
Recognition and tracking moving objects using moving camera in complex scenesRecognition and tracking moving objects using moving camera in complex scenes
Recognition and tracking moving objects using moving camera in complex scenesIJCSEA Journal
 
A Smart Target Detection System using Fuzzy Logic and Background Subtraction
A Smart Target Detection System using Fuzzy Logic and Background SubtractionA Smart Target Detection System using Fuzzy Logic and Background Subtraction
A Smart Target Detection System using Fuzzy Logic and Background SubtractionIRJET Journal
 
Generating a time shrunk lecture video by event
Generating a time shrunk lecture video by eventGenerating a time shrunk lecture video by event
Generating a time shrunk lecture video by eventYara Ali
 
Video Object Extraction Using Feature Matching Based on Nonlocal Matting
Video Object Extraction Using Feature Matching Based on Nonlocal MattingVideo Object Extraction Using Feature Matching Based on Nonlocal Matting
Video Object Extraction Using Feature Matching Based on Nonlocal MattingMeidya Koeshardianto
 
Objective Evaluation of Video Quality
Objective Evaluation of Video QualityObjective Evaluation of Video Quality
Objective Evaluation of Video QualityAnton Venema
 
Measuring Heart Rate from Video using EVM
Measuring Heart Rate from Video using EVMMeasuring Heart Rate from Video using EVM
Measuring Heart Rate from Video using EVMIRJET Journal
 
IRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET- Storage Optimization of Video Surveillance from CCTV CameraIRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET- Storage Optimization of Video Surveillance from CCTV CameraIRJET Journal
 
Video Manifold Feature Extraction Based on ISOMAP
Video Manifold Feature Extraction Based on ISOMAPVideo Manifold Feature Extraction Based on ISOMAP
Video Manifold Feature Extraction Based on ISOMAPinventionjournals
 
Tablet tools and micro tasks
Tablet tools and micro tasksTablet tools and micro tasks
Tablet tools and micro tasksmhilde
 

Ähnlich wie Serena Yeung, PHD, Stanford, at MLconf Seattle 2017 (20)

Video + Language 2019
Video + Language 2019Video + Language 2019
Video + Language 2019
 
Video + Language
Video + LanguageVideo + Language
Video + Language
 
論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions
論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions
論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions
 
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
 
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
 
Real-Time Video Copy Detection in Big Data
Real-Time Video Copy Detection in Big DataReal-Time Video Copy Detection in Big Data
Real-Time Video Copy Detection in Big Data
 
presentation
presentationpresentation
presentation
 
Video+Language: From Classification to Description
Video+Language: From Classification to DescriptionVideo+Language: From Classification to Description
Video+Language: From Classification to Description
 
Video search by deep-learning
Video search by deep-learningVideo search by deep-learning
Video search by deep-learning
 
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformHuman Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
 
Recognition and tracking moving objects using moving camera in complex scenes
Recognition and tracking moving objects using moving camera in complex scenesRecognition and tracking moving objects using moving camera in complex scenes
Recognition and tracking moving objects using moving camera in complex scenes
 
A Smart Target Detection System using Fuzzy Logic and Background Subtraction
A Smart Target Detection System using Fuzzy Logic and Background SubtractionA Smart Target Detection System using Fuzzy Logic and Background Subtraction
A Smart Target Detection System using Fuzzy Logic and Background Subtraction
 
Generating a time shrunk lecture video by event
Generating a time shrunk lecture video by eventGenerating a time shrunk lecture video by event
Generating a time shrunk lecture video by event
 
Video Object Extraction Using Feature Matching Based on Nonlocal Matting
Video Object Extraction Using Feature Matching Based on Nonlocal MattingVideo Object Extraction Using Feature Matching Based on Nonlocal Matting
Video Object Extraction Using Feature Matching Based on Nonlocal Matting
 
Objective Evaluation of Video Quality
Objective Evaluation of Video QualityObjective Evaluation of Video Quality
Objective Evaluation of Video Quality
 
Measuring Heart Rate from Video using EVM
Measuring Heart Rate from Video using EVMMeasuring Heart Rate from Video using EVM
Measuring Heart Rate from Video using EVM
 
IRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET- Storage Optimization of Video Surveillance from CCTV CameraIRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET- Storage Optimization of Video Surveillance from CCTV Camera
 
Multiple Object Tracking - Laura Leal-Taixe - UPC Barcelona 2018
Multiple Object Tracking - Laura Leal-Taixe - UPC Barcelona 2018Multiple Object Tracking - Laura Leal-Taixe - UPC Barcelona 2018
Multiple Object Tracking - Laura Leal-Taixe - UPC Barcelona 2018
 
Video Manifold Feature Extraction Based on ISOMAP
Video Manifold Feature Extraction Based on ISOMAPVideo Manifold Feature Extraction Based on ISOMAP
Video Manifold Feature Extraction Based on ISOMAP
 
Tablet tools and micro tasks
Tablet tools and micro tasksTablet tools and micro tasks
Tablet tools and micro tasks
 

Mehr von MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

Mehr von MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Kürzlich hochgeladen

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Kürzlich hochgeladen (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Serena Yeung, PHD, Stanford, at MLconf Seattle 2017

  • 2.
  • 5. State-of-the-art in video understanding Classification Abu-El-Haija et al. 2016 4,800 categories 15.2% Top5 error
  • 6. State-of-the-art in video understanding Classification Detection Idrees et al. 2017, Sigurdsson et al. 2016; Abu-El-Haija et al. 2016 4,800 categories 15.2% Top5 error Tens of categories ~10-20 mAP at 0.5 overlap
  • 7. State-of-the-art in video understanding Classification Detection Abu-El-Haija et al. 2016 Captioning 4,800 categories 15.2% Top5 error Yu et al. 2016 Just getting started: Short clips, niche domains Idrees et al. 2017, Sigurdsson et al. 2016 Tens of categories ~10-20 mAP at 0.5 overlap
  • 8. Comparing video with image understanding
  • 9. Comparing video with image understanding Classification 4,800 categories 15.2% Top5 error Videos Images 1,000 categories* 3.1% Top5 error *Transfer learning widespread Krizhevsky 2012, Xie 2016
  • 10. Comparing video with image understanding He 2017 Classification Detection 4,800 categories 15.2% Top5 error Tens of categories ~10-20 mAP at 0.5 overlap Videos Images 1,000 categories* 3.1% Top5 error *Transfer learning widespread Hundreds of categories* ~60 mAP at 0.5 overlap Pixel-level segmentation *Transfer learning widespread Krizhevsky 2012, Xie 2016
  • 11. Comparing video with image understanding He 2017 Johnson 2016, Krause 2017 Classification Detection Captioning 4,800 categories 15.2% Top5 error Just getting started: Short clips, niche domains Videos Images 1,000 categories* 3.1% Top5 error *Transfer learning widespread Hundreds of categories* ~60 mAP at 0.5 overlap Pixel-level segmentation *Transfer learning widespread Dense captioning Coherent paragraphs Krizhevsky 2012, Xie 2016 Tens of categories ~10-20 mAP at 0.5 overlap
  • 12. Comparing video with image understanding He 2017 Johnson 2016, Krause 2017 Classification Detection Captioning 4,800 categories 15.2% Top5 error Just getting started: Short clips, niche domains Videos Beyond Images 1,000 categories* 3.1% Top5 error *Transfer learning widespread Hundreds of categories* ~60 mAP at 0.5 overlap Pixel-level segmentation *Transfer learning widespread Dense captioning Coherent paragraphs Significant work on question-answering — Krizhevsky 2012, Xie 2016 Yang 2016 Tens of categories ~10-20 mAP at 0.5 overlap
  • 13. The challenge of scale Training labels Inference Models Video processing is computationally expensive Video annotation is labor-intensive Temporal dimension adds complexity
  • 14. The challenge of scale Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 15. Input Output t = 0 t = T Running Task: Temporal action detection Talking Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 16. Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Efficient video processing
  • 17. t = 0 t = T Output Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Frame model Input: a frame Our model for efficient action detection
  • 18. t = 0 t = T [ ] Output Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Output: Detection instance [start, end] Next frame to glimpse Frame model Input: a frame Our model for efficient action detection
  • 19. t = 0 t = T Recurrent neural network (time information) [ ] Output: Detection instance [start, end] Next frame to glimpse Output Convolutional neural network (frame information) Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Our model for efficient action detection
  • 20. t = 0 t = T Recurrent neural network (time information) [ ] Output: Detection instance [start, end] Next frame to glimpse Output Convolutional neural network (frame information) Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Our model for efficient action detection
  • 21. t = 0 t = T Recurrent neural network (time information) [ ] Output: Detection instance [start, end] Next frame to glimpse Output Convolutional neural network (frame information) Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Our model for efficient action detection
  • 22. t = 0 t = T Output Our model for efficient action detection Recurrent neural network (time information) [ ] [ ] Output: Detection instance [start, end] Next frame to glimpse Output Convolutional neural network (frame information) Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 23. t = 0 t = T Output Our model for efficient action detection Recurrent neural network (time information) [ ] [ ] Output: Detection instance [start, end] Next frame to glimpse Output Convolutional neural network (frame information) Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 24. t = 0 t = T Output Our model for efficient action detection Recurrent neural network (time information) [ ] [ ] Output: Detection instance [start, end] Next frame to glimpse Output Convolutional neural network (frame information) Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 25. t = 0 t = T Output Our model for efficient action detection Recurrent neural network (time information) [ ] [ ] Output: Detection instance [start, end] Next frame to glimpse Output Convolutional neural network (frame information) Output [ ] Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 26. t = 0 t = T Output Our model for efficient action detection Recurrent neural network (time information) [ ] [ ] Output: Detection instance [start, end] Next frame to glimpse Output Convolutional neural network (frame information) Output [ ] … Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 27. t = 0 t = T Output Our model for efficient action detection Recurrent neural network (time information) Output Convolutional neural network (frame information) Output [ ] … Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Optional output: Detection instance [start, end] Output: Next frame to glimpse
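To make the architecture built up across slides 17-27 concrete, here is a minimal sketch of one observation ("glimpse") step: a CNN feature of the currently observed frame feeds a recurrent cell, which emits a candidate detection [start, end] with a confidence, the position of the next frame to glimpse, and a decision on whether to emit the detection now. This is illustrative PyTorch, not the authors' implementation; module names, dimensions, and the exact heads are assumptions.

```python
import torch
import torch.nn as nn

class GlimpseStep(nn.Module):
    """One observation step of a frame-glimpse action detector (illustrative sketch)."""

    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.rnn = nn.LSTMCell(feat_dim, hidden_dim)          # time information
        self.detection_head = nn.Linear(hidden_dim, 3)        # [start, end, confidence]
        self.next_glimpse_head = nn.Linear(hidden_dim, 1)     # where to look next
        self.emit_head = nn.Linear(hidden_dim, 1)             # whether to emit a detection now

    def forward(self, frame_feat, state=None):
        # frame_feat: (1, feat_dim) CNN feature of the observed frame (frame information)
        h, c = self.rnn(frame_feat, state)
        detection = self.detection_head(h)                    # candidate instance [start, end] + confidence
        next_loc = torch.sigmoid(self.next_glimpse_head(h))   # next frame to glimpse, as a fraction of video length
        emit_prob = torch.sigmoid(self.emit_head(h))          # probability of emitting the detection at this step
        return detection, next_loc, emit_prob, (h, c)
```

At inference the loop would sample the frame at next_loc, extract its CNN feature, and repeat the step for a fixed budget of glimpses, which is how the model can process only a small fraction of the frames.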
  • 28. Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Our model for efficient action detection • Train differentiable outputs (detection output class and bounds) using standard backpropagation • Train non-differentiable outputs (where to look next, when to emit a prediction) using reinforcement learning (REINFORCE algorithm) • Achieves detection performance on par with dense sliding window-based approaches, while observing only 2% of frames
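The training split on slide 28 can be sketched as follows: the detection bounds and confidence get an ordinary supervised loss through backpropagation, while the sampled decisions (where to glimpse next, whether to emit) get a REINFORCE-style score-function gradient. This is a generic estimator with a placeholder reward definition, not the exact objective from the paper.

```python
import torch

def glimpse_training_loss(detection_loss, action_log_probs, episode_reward, baseline=0.0):
    """Combine a supervised detection loss with a REINFORCE term (sketch).

    detection_loss:   differentiable loss on the predicted [start, end] and confidence.
    action_log_probs: tensor of log-probabilities of the sampled non-differentiable
                      decisions (next glimpse location, emit-or-not) over one episode.
    episode_reward:   scalar reward, e.g. positive for detections matching ground truth
                      and a penalty for false positives (placeholder definition).
    """
    advantage = episode_reward - baseline                    # baseline reduces gradient variance
    reinforce_term = -(action_log_probs * advantage).sum()   # score-function (REINFORCE) surrogate
    return detection_loss + reinforce_term
```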
  • 29. Learned policy in action Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 30. Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Learned policy in action
  • 31. The challenge of scale Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 32. Dense action labeling Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 33. MultiTHUMOS • Extends the THUMOS’14 action detection dataset with dense, multilevel, frame-level action annotations for 30 hours across 400 videos
                               THUMOS    MultiTHUMOS
      Annotations              6,365     38,690
      Classes                  20        65
      Density (labels/frame)   0.3       1.5
      Classes per video        1.1       10.5
      Max actions per frame    2         9
      Max actions per video    3         25
      Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
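As an aside on what "dense, multilabel" targets look like in practice, the sketch below builds the per-frame binary label matrix such a dataset implies; the specific class indices and frame ranges are made up for illustration.

```python
import numpy as np

# Dense multilabel targets for one video: a T x C binary matrix,
# where T is the number of frames and C the number of action classes (65 in MultiTHUMOS).
T, C = 300, 65
labels = np.zeros((T, C), dtype=np.uint8)
labels[40:160, 3] = 1    # hypothetical: class 3 active in frames 40-159
labels[90:160, 17] = 1   # hypothetical: class 17 overlaps class 3 in frames 90-159
print(labels.sum() / T)  # average labels per frame; ~1.5 over all of MultiTHUMOS
```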
  • 34. Modeling dense, multilabel actions • Need to reason about multiple potential actions simultaneously • High degree of temporal dependency • In standard recurrent models for action recognition, all state is in hidden layer representation • At each time step, makes prediction of current frame based on the current frame and previous hidden representation Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 35. MultiLSTM • Extension of LSTM that expands the temporal receptive field of input and output connections • Key idea: providing the model with more freedom in both reading input and writing output reduces the burden placed on the hidden layer representation Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 36. MultiLSTM Input video frames Frame class predictions t Standard LSTM …… Donahue 2014 Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 37. MultiLSTM Frame class predictions t …… Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017. Standard LSTM: Single input, single output Input video frames
  • 38. MultiLSTM Frame class predictions t …… Frame class predictions t MultiLSTM: Multiple inputs, multiple outputs …… Standard LSTM: Single input, single output Input video frames Input video frames Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 39. MultiLSTM Multiple Inputs (soft attention) Multiple Outputs (weighted average) Multilabel Loss Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
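A minimal sketch of the MultiLSTM step as described on slides 35-39: soft attention over a small window of input frame features, one LSTM update, a multilabel (sigmoid) prediction, and an output averaged over recent steps. Window size, the attention parameterization, and dimensions are assumptions; training would use a multilabel loss such as per-class binary cross-entropy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLSTMStep(nn.Module):
    """One MultiLSTM step: attend over an input window, update the LSTM,
    and average the multilabel predictions over an output window (sketch)."""

    def __init__(self, feat_dim=2048, hidden_dim=512, num_classes=65, window=5):
        super().__init__()
        self.window = window
        self.attn = nn.Linear(feat_dim + hidden_dim, 1)   # soft attention over input frames
        self.rnn = nn.LSTMCell(feat_dim, hidden_dim)
        self.cls = nn.Linear(hidden_dim, num_classes)     # per-class logits (multilabel)

    def forward(self, frame_window, state, recent_preds):
        # frame_window: (window, feat_dim) CNN features of the last few frames
        h, c = state
        scores = self.attn(torch.cat([frame_window, h.expand(frame_window.size(0), -1)], dim=1))
        weights = F.softmax(scores, dim=0)                          # multiple inputs via soft attention
        pooled = (weights * frame_window).sum(dim=0, keepdim=True)  # attended input for this step
        h, c = self.rnn(pooled, (h, c))
        recent_preds.append(torch.sigmoid(self.cls(h)))             # independent per-class probabilities
        # Multiple outputs: the prediction for the current frame is a (here uniform)
        # average over the last few steps' predictions.
        frame_pred = torch.stack(recent_preds[-self.window:]).mean(dim=0)
        return frame_pred, (h, c), recent_preds
```

A caller would initialize state = (torch.zeros(1, 512), torch.zeros(1, 512)) and recent_preds = [], then run one step per video frame.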
  • 40. MultiLSTM Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 41. Retrieving sequential actions Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 42. Retrieving co-occurring actions Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 43. The challenge of scale Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017. Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
  • 44. Labeling videos is expensive • Takes significantly longer to label a video than an image • If spatial or temporal bounds desired, even worse • How can we practically learn about new concepts in video?
  • 45. Web queries are a source of noisy video labels
  • 46. Image search is much cleaner!
  • 47. Can we effectively learn from the noisy web queries? • Our approach: learn how to select positive training examples from noisy queries in order to train classifiers for new classes • Use a reinforcement learning-based formulation to learn a data labeling policy that achieves strong performance on a small, manually labeled dataset of classes • Then use this policy to automatically label noisy web data for new classes Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
  • 48. Balancing diversity vs. semantic drift • Want diverse training examples to improve classifier • But too much diversity can also lead to semantic drift • Our approach: balance diversity and drift by training labeling policies using an annotated reward set which the policy must successfully classify Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
  • 49. Overview of approach Boomerang … Boomerang on a beach Boomerang music video Classifier Candidate web queries (YouTube autocomplete) Agent Label new positives + Boomerang on a beach Current positive set Update classifier Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017. Update state
  • 50. Overview of approach Boomerang … Boomerang on a beach Boomerang music video Classifier Candidate web queries (YouTube autocomplete) Agent Label new positives + Boomerang on a beach Current positive set Update classifier Fixed negative set Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017. Update state
  • 51. Overview of approach Boomerang … Boomerang on a beach Boomerang music video Classifier Candidate web queries (YouTube autocomplete) Agent Label new positives + Boomerang on a beach Current positive set Update classifier Update state Fixed negative set Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017. Training reward Eval on reward set
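Slides 49-51 describe a loop that the sketch below spells out: at each step the agent (the learned labeling policy) picks one candidate web query, the videos it returns are added to the positive set, the classifier is retrained against a fixed negative set, and its accuracy on a small hand-labeled reward set provides the reinforcement signal. The helper callables (search_videos, train_classifier, evaluate, make_state) and the agent interface are placeholders supplied by the caller, not a real API.

```python
def run_episode(agent, candidate_queries, negatives, reward_set,
                make_state, search_videos, train_classifier, evaluate, steps=10):
    """One episode of learning to label noisy web videos for a new class (sketch)."""
    positives, classifier, prev_acc = [], None, 0.0
    for _ in range(steps):
        state = make_state(classifier, positives)              # placeholder state featurization
        query = agent.select_query(state, candidate_queries)   # e.g. "Boomerang on a beach"
        positives += search_videos(query)                      # noisy positives from web search
        classifier = train_classifier(positives, negatives)    # negatives stay fixed
        acc = evaluate(classifier, reward_set)                 # small, manually labeled reward set
        agent.observe_reward(acc - prev_acc)                   # improvement on the reward set is the training signal
        prev_acc = acc
    return classifier
```

Rewarding improvement on the held-out reward set is what keeps the selected queries diverse without drifting away from the target concept.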
  • 52. Sports1M Greedy classifier Ours Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
  • 53. Sports1M Greedy classifier Ours Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
  • 54. Novel classes Greedy classifier Ours Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
  • 55. The challenge of scale Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017. Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
  • 56. The challenge of scale Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017. Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017. Learning to learn
  • 57. The challenge of scale Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017. Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017. Learning to learn Unsupervised learning
  • 58. Towards Knowledge Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Videos Knowledge of the dynamic visual world
  • 59. Collaborators Olga Russakovsky Mykhaylo Andriluka Ning Jin Vignesh Ramanathan Liyue Shen Greg Mori Fei-Fei Li