3. Highlight
• Task: dense-captioning events in videos
• Dataset: ActivityNet Captions
• Events range across multiple time scales and can even overlap.
• Extends action proposal generation to multi-scale detection of events;
the proposal module processes each video in a forward pass to detect events as they occur
• Events in a given video are usually related to one another.
• Introduces a captioning module that uses the context from all the
events found by the proposal module to generate each sentence
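
To make two of the points above concrete (events overlap in time, and each caption draws on the other events), here is a minimal sketch. It is an illustration only, not the method from the slides: the `Event` class, `temporal_iou`, and `context_for_event` are hypothetical names, the feature vectors are made up, and plain averaging is assumed in place of whatever learned weighting a real captioning module would use.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Event:
    """A hypothetical event proposal: a time span plus a feature vector."""
    start: float  # seconds
    end: float    # seconds
    feature: Sequence[float]


def temporal_iou(a: Event, b: Event) -> float:
    """Intersection-over-union of two time intervals (0.0 when disjoint)."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


def mean_vector(vectors: List[Sequence[float]], dim: int) -> List[float]:
    """Element-wise mean; a zero vector when there is nothing to average."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def context_for_event(events: List[Event], index: int) -> List[float]:
    """Concatenate [own feature, mean of all other events' features].

    Assumption: simple averaging stands in for a learned weighting
    over the neighboring events.
    """
    target = events[index]
    dim = len(target.feature)
    others = [e.feature for i, e in enumerate(events) if i != index]
    return list(target.feature) + mean_vector(others, dim)


# Made-up proposals at different time scales.
events = [
    Event(0.0, 60.0, [1.0, 0.0]),   # long event
    Event(10.0, 20.0, [0.0, 1.0]),  # short event nested inside the long one
    Event(50.0, 70.0, [1.0, 1.0]),  # overlaps the tail of the long event
]

print(temporal_iou(events[0], events[1]))  # overlap of long and nested event
print(context_for_event(events, 1))        # captioning input for event 1
```

The concatenated vector is what a sentence decoder would condition on in this sketch, so the caption for one event can reflect what happens in the others.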
6. Method
V. Escorcia, F. C. Heilbron, J. C. Niebles, and B. Ghanem. DAPS: Deep action proposals for action understanding. ECCV, 2016.
J. Johnson, A. Karpathy, and L. Fei-Fei. DenseCap: Fully convolutional localization networks for dense captioning.
A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese. Social LSTM: Human trajectory prediction in crowded spaces.
object-centric in images
action-centric in videos