2. [Figure: Deep Semantic Feature. Diagram labels: Sentence (“A baby is playing a guitar.”) → Sentence Embedding; Video → Video Embedding; Image Search → Web Images → Embedding; all mapped into a shared Embedding Space.]
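The shared embedding space labelled on this slide can be illustrated with a small retrieval sketch: once a sentence and several videos are represented as vectors in the same space, the videos can be ranked by cosine similarity to the sentence. This is only a minimal sketch under stated assumptions. The function names, the clip names, and the three-dimensional vectors are hypothetical; real embeddings would come from trained sentence and video encoders, and the slide does not say which similarity measure is used.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings (assumption: hand-picked values, not model output).
sentence_vec = [0.9, 0.1, 0.3]  # stands in for "A baby is playing a guitar."
video_vecs = {
    "baby_guitar_clip": [0.8, 0.2, 0.4],
    "cooking_clip": [0.1, 0.9, 0.2],
    "soccer_clip": [0.2, 0.3, 0.9],
}

ranking = rank_by_similarity(sentence_vec, video_vecs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The same ranking function would apply to web-image vectors, since everything in the diagram is mapped into one space.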
3. •
- Xu et al., “Show, attend and tell: Neural image caption generation
with visual attention,” in Proc. ICML 2015.
•
- Graves, Wayne, et al., “Hybrid computing using a neural network with
dynamic external memory,” Nature, vol. 538, pp. 471-476, 2016.
• Adversarial Examples
- Goodfellow, et al., “Explaining and harnessing adversarial
examples,” in Proc. ICLR 2015.
4. • Xu, Ba, Kiros, Cho, Courville, Salakhutdinov, Zemel, and Bengio
“Show, attend and tell: Neural image caption generation with visual attention”
Proc. ICML 2015
17. • Microsoft Research Video Description Corpus
- More than 2,000 videos with descriptions
• TVD: a reproducible and multiply aligned TV series dataset
- The Big Bang Theory, Game of Thrones
• MSR-VTT
- More than 1M video-description pairs
• MPII Movie Description Dataset
- More than 100K clip-description pairs
• YouTube-8M
•
• SumMe
• TVSum
• UG Video Dataset