Visual Storytelling (NAACL 2016, Poster)


We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling. The first release of this dataset, SIND v.1, includes 81,743 unique photos in 20,211 sequences, aligned to both descriptive (caption) and story language. We establish several strong baselines for the storytelling task, and motivate an automatic metric to benchmark progress. Modelling concrete description as well as figurative and social language, as provided in this dataset and the storytelling task, has the potential to move artificial intelligence from basic understandings of typical visual scenes towards more and more human-like understanding of grounded event structure and subjective expression.


A black frisbee is sitting on top of a roof.
A man playing soccer outside of a white house with a red door.
The boy is throwing a soccer ball by the red door.
A soccer ball is over a roof by a frisbee in a rain gutter.
Two balls and a frisbee are on top of a roof.

A discus got stuck up on the roof.
Why not try getting it down with a soccer ball?
Up the soccer ball goes.
It didn't work so we tried a volley ball.
Now the discus, soccer ball, and volleyball are all stuck on the roof.

*Ting-Hao (Kenneth) Huang1, *Francis Ferraro2, Nasrin Mostafazadeh3, Ishan Misra1, Jacob Devlin6, Aishwarya Agrawal4, Ross Girshick5,
Xiaodong He6, Pushmeet Kohli6, Dhruv Batra4, Larry Zitnick5, Devi Parikh5, Lucy Vanderwende6, Michel Galley6 and Margaret Mitchell6
1 Carnegie Mellon University, 2 Johns Hopkins University, 3 University of Rochester, 4 Virginia Tech, 5 Facebook AI Research, 6 Microsoft Research
Stories ≠ Consecutive Captions ≠ Descriptive Text
Motivation
Dataset                        | Text/Image Pairs (K) | Vocab Size (K) | Words/Sent. | Web Ppl. (30B words)
Brown (comparison only)        | 52.1 (text only)     | 47.7           | 20.8        | 194.0
DII (Description-in-isolation) | 151.8                | 13.8           | 11.0        | 147.0
SIS (Stories-in-sequence)      | 252.9                | 18.2           | 10.2        | 116.0
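The vocabulary-size and words-per-sentence columns can be approximated with a few lines of code. The following is a minimal sketch, assuming lowercased whitespace tokenization; the poster does not state which tokenizer was used, and `corpus_stats` is a hypothetical helper name, not part of any released tooling.

```python
from collections import Counter


def corpus_stats(sentences):
    """Return (vocabulary size, mean words per sentence).

    Assumption: tokens are lowercased and split on whitespace, so
    punctuation stays attached to words ("roof." is its own type).
    """
    vocab = Counter()
    total_words = 0
    for sentence in sentences:
        tokens = sentence.lower().split()
        vocab.update(tokens)
        total_words += len(tokens)
    mean_len = total_words / len(sentences) if sentences else 0.0
    return len(vocab), mean_len


# Two sentences taken from the example story on the poster.
story = [
    "A discus got stuck up on the roof.",
    "Up the soccer ball goes.",
]
vocab_size, words_per_sentence = corpus_stats(story)
print(vocab_size, words_per_sentence)
```

A real comparison against the table would need the same tokenization and counting conventions as the original analysis, which are not given here.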
Getting Humans to Tell Stories
Metric       | Pearson’s r
BLEU         | 0.08
SkipThoughts | 0.18
METEOR       | 0.22
Caption Output: This is a picture of a family. This is a picture of a cake. This is a picture of a dog. This is a picture of a beach. This is a picture of a beach.

Stories
Greedy: The family gathered together for a meal. The food was delicious. The dog was excited to be there. The dog was enjoying the water. The dog was happy to be in the water.
-Dups: The family gathered together for a meal. The food was delicious. The dog was excited to be there. The kids were playing in the water. The boat was a little too much to drink.
+Grounded: The family got together for a cookout. They had a lot of delicious food. The dog was happy to be there. They had a great time on the beach. They even had a swim in the water.

A solid next move in Artificial Intelligence is to go beyond basic description of visual scenes towards human-like understanding of grounded event structure and subjective expression. We introduce the first dataset for sequential vision-to-language and explore how modeling concrete description as well as figurative and social language enables visual storytelling. Our data is at sind.ai.
Get Better Stories with Uniqueness & Visually Grounded Constraints
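The poster does not spell out how the uniqueness constraint is enforced. As one possible reading, the sketch below assumes a word-by-word decoder that offers ranked candidates at each step and skips any content word already used in the story. The function name, the stopword list, and the candidate lists are all hypothetical and chosen only for illustration.

```python
# Assumption: function words may repeat freely; only content words are unique.
STOPWORDS = frozenset({"the", "a", "an", "was", "were", "to", "in", "of", "and", "be"})


def decode_without_duplicates(ranked_candidates, stopwords=STOPWORDS):
    """Pick, at each step, the best-ranked word that is not a repeated content word.

    ranked_candidates: list of lists, each ordered best-first for one step.
    If every candidate at a step is a repeated content word, the step is skipped.
    """
    used = set()
    output = []
    for candidates in ranked_candidates:
        for word in candidates:
            key = word.lower()
            if key in stopwords or key not in used:
                output.append(word)
                used.add(key)
                break
    return output


# Made-up candidate lists; a plain greedy decoder would repeat "dog" and "happy".
steps = [
    ["the"], ["dog"], ["was"], ["happy"],
    ["the"], ["dog", "kids"], ["was", "were"], ["happy", "playing"],
]
story_words = decode_without_duplicates(steps)
print(" ".join(story_words))
```

The visually grounded constraint would need a caption model to decide which words are licensed by the image, so it is not sketched here.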
DII / SIS
Automatic Evaluation and Results
See our paper for the description-in-sequence tiers (DIS) and more!
We define 80-5-5-10 train-dev-validation-test splits for all three tiers.
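A split with these proportions can be sketched as follows. This is an illustrative assumption, not the released split: the poster does not say what unit is split (albums, sequences, or photos) or how items were shuffled, and `split_items` is a hypothetical helper.

```python
import random


def split_items(item_ids, fractions=(0.80, 0.05, 0.05, 0.10), seed=0):
    """Shuffle ids deterministically and cut them into four named splits.

    Assumption: the last split absorbs any rounding remainder.
    """
    names = ("train", "dev", "validation", "test")
    ids = sorted(item_ids)
    random.Random(seed).shuffle(ids)
    splits = {}
    start = 0
    for i, (name, frac) in enumerate(zip(names, fractions)):
        if i == len(names) - 1:
            end = len(ids)
        else:
            end = start + round(frac * len(ids))
        splits[name] = ids[start:end]
        start = end
    return splits


splits = split_items(range(100))
print({name: len(ids) for name, ids in splits.items()})
```

Splitting at the album level, if that is the unit chosen, would keep all sequences drawn from one album in the same split.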
Data Analysis
Tier | Beam = 10 | Beam = 1 | -Dups | +Grounded
DII  | 23.55     | 19.10    | 19.21 | ----
SIS  | 23.13     | 27.76    | 30.11 | 31.42
All values are statistically significant (p < 1e-5).
Correlations of automatic scores against human judgments on 3K random SIS training stories.
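For reference, Pearson's r between a metric's scores and human ratings can be computed as in the sketch below. The score lists are made-up placeholders, not data from the poster, and `pearson_r` is a hypothetical helper written out in full rather than taken from a library.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values: one automatic score and one human rating per story.
metric_scores = [0.10, 0.20, 0.30, 0.40]
human_ratings = [1, 2, 3, 4]
r = pearson_r(metric_scores, human_ratings)
print(round(r, 3))
```

On real data the same function would be applied once per metric, giving one coefficient each for BLEU, SkipThoughts, and METEOR.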
METEOR scores on the validation split, using a sequence-to-sequence NN with gated recurrent units.

Conclusion
Visual Storytelling
[Figure: data collection workflow. Labels: Flickr Album; Description for Images in Isolation & in Sequences; Storytelling; Re-telling; Preferred Photo Sequence; Story 1 through Story 5.]
Several strong baselines for the task of visual storytelling demonstrate that intelligent machines can now begin to generate inferential, conceptual, and evaluative language to share humanlike experience. METEOR serves as an automatic metric for evaluation, as it correlates best with human judgments. Combining a fully grounded model with a model free to dream yields the best automatically generated stories to date, but much more work remains to be done.

