Every Picture Tells a Story: Generating Sentences from Images
Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, David Forsyth
Proceedings of ECCV 2010
Motivation
Demonstrating how well automatic methods can associate a description with a given image (auto-annotation) or retrieve images that illustrate a given sentence (auto-illustration).
Contributions
Proposes a system that computes a score linking an image to a sentence, and vice versa.
Evaluates the method on a novel dataset of human-annotated images (the PASCAL Sentence Dataset).
Provides a quantitative evaluation of the quality of the predictions.
Overview
The Approach
Mapping Image to Meaning
[Figure: multi-label MRF over the meaning triplet; node annotations 16, 23 and 29]
Predicting the ⟨object, action, scene⟩ triplet of an image involves solving a small multi-label Markov random field.
The Approach
Node potentials: computed as a linear combination of scores from several detectors and classifiers (the feature functions).
Edge potentials: estimated from the frequencies of the node labels.
The Approach
Image Space Feature Functions: node features and similarity features
To provide information about the nodes of the MRF, we first construct image features.
Node features:
Hoiem et al. classification responses
Gist-based scene classification responses
Similarity features: the average of the node features over the k nearest neighbors (KNN) of the test image in the training set, where the neighbors are found by matching the node features derived from the classifiers and detectors.
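The similarity feature above can be sketched in a few lines. This is a minimal illustration, assuming each image is already summarized by a fixed-length vector of node features and that neighbors are found by Euclidean distance; the slide does not state the distance measure, and the function name `knn_similarity_features` and the default `k=3` are hypothetical.

```python
import math


def knn_similarity_features(test_feats, train_feats, k=3):
    """Average the node-feature vectors of the k training images closest to
    the test image. Assumption: closeness is Euclidean distance between
    node-feature vectors (the slide does not specify the metric)."""
    if not 0 < k <= len(train_feats):
        raise ValueError("k must be between 1 and the number of training images")
    # Rank all training images by distance to the test image's node features.
    ranked = sorted(train_feats, key=lambda f: math.dist(f, test_feats))
    neighbors = ranked[:k]
    # Component-wise mean over the k nearest neighbors.
    return [sum(n[i] for n in neighbors) / k for i in range(len(test_feats))]
```

For example, with training vectors `[[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]` and test vector `[1.0, 0.0]`, `k=2` selects the first two training images and returns approximately `[0.85, 0.15]`.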
Edge features:
The normalized frequency of the word B in our corpus, f(B).
The normalized frequency of A and B occurring together, f(A, B).
The ratio f(A, B) / (f(A) f(B)).
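The frequency-based edge features can be computed directly from a labeled corpus. This sketch assumes the corpus is a list of ⟨object, action, scene⟩ triplets and that "normalized" means dividing counts by the number of triplets; both are assumptions, as is the function name `edge_features`. The last feature, f(A, B) / (f(A) f(B)), is greater than 1 when two labels co-occur more often than they would if they were independent.

```python
from collections import Counter
from itertools import combinations


def edge_features(triplets):
    """For every pair of labels that co-occur in some triplet, return the
    tuple (f(A), f(B), f(A, B), f(A, B) / (f(A) f(B))).
    Assumption: frequencies are counts divided by the number of triplets.
    Pairs are keyed alphabetically, so A is the alphabetically first label."""
    n = len(triplets)
    word_counts = Counter()
    pair_counts = Counter()
    for t in triplets:
        labels = set(t)
        word_counts.update(labels)
        for a, b in combinations(sorted(labels), 2):
            pair_counts[(a, b)] += 1
    f = {w: c / n for w, c in word_counts.items()}
    feats = {}
    for (a, b), c in pair_counts.items():
        f_ab = c / n
        feats[(a, b)] = (f[a], f[b], f_ab, f_ab / (f[a] * f[b]))
    return feats
```

With the toy corpus `[("dog", "run", "grass"), ("dog", "sit", "grass"), ("cat", "sit", "sofa"), ("dog", "run", "street")]`, the pair `("dog", "grass")` gets f(A) = 0.75, f(B) = 0.5, f(A, B) = 0.5 and a ratio of about 1.33.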
Learning and Inference
Learning to predict triplets for images is done discriminatively, using a dataset of images labeled with their meaning triplets.
The potentials are computed as linear combinations of feature functions.
Learning therefore becomes a search for the weights of this linear combination such that the ground-truth triplet scores higher than any other triplet.
Inference involves finding argmax_y wᵀφ(x, y), where φ(x, y) is the vector of feature functions, y is the triplet label, and w are the learned weights.
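Because the label space is small, the argmax can be found by enumerating every triplet: if the three label sets have 16, 23 and 29 labels, as the numbers on the Mapping Image to Meaning slide suggest, there are 16 × 23 × 29 = 10,672 candidates. The sketch below pairs that brute-force inference with a structured perceptron. The perceptron is a simple stand-in for "make the ground truth score higher than any other triplet" and is not necessarily the learner used in the paper; `phi`, `infer` and `train_perceptron` are hypothetical names.

```python
from itertools import product


def infer(w, phi, x, labels):
    """argmax_y w^T phi(x, y) by brute-force enumeration of all
    (object, action, scene) triplets in the Cartesian product of label sets."""
    def score(y):
        return sum(wi * fi for wi, fi in zip(w, phi(x, y)))
    return max(product(*labels), key=score)


def train_perceptron(data, phi, labels, dim, epochs=10):
    """Structured perceptron (a stand-in learner): whenever the highest-scoring
    triplet is not the ground truth, move w towards phi(x, y_true) and away
    from phi(x, y_hat)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y_true in data:
            y_hat = infer(w, phi, x, labels)
            if y_hat != y_true:
                g_true, g_hat = phi(x, y_true), phi(x, y_hat)
                w = [wi + a - b for wi, a, b in zip(w, g_true, g_hat)]
    return w
```

A toy usage: let `x` be a dict of detector scores per label and `phi(x, y) = [x[y[0]], x[y[1]], x[y[2]]]`, i.e. one feature per triplet slot. After training on a couple of labeled images, `infer` returns the triplet whose detector responses, weighted by `w`, are highest.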
Evaluation
Dataset: PASCAL Sentence Dataset, built from the PASCAL 2008 development kit: 50 images from each of 20 categories (1,000 images). Workers on Amazon's Mechanical Turk wrote 5 captions for each image.
Experimental settings: 600 training images and 400 test images; the 50 closest triplets are used for matching.
Evaluation
Scoring a match between an image and a sentence is done by ranking triplets for each of them in the meaning space and summing over the shared triplets, weighted by the inverse rank of each triplet.
Distributional semantics: text information and a similarity measure are used to handle out-of-vocabulary words, i.e. words that occur in sentences but are not learned by any detector or classifier.
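The inverse-rank matching can be sketched as follows. The slide does not give the exact formula, so this assumes that every triplet appearing in both the image's and the sentence's top-k lists contributes 1/rank_image + 1/rank_sentence, with ranks starting at 1; `k=50` follows the "50 closest triplets" setting, and `match_score` is a hypothetical name.

```python
def match_score(image_ranked, sentence_ranked, k=50):
    """Score an image-sentence pair from their ranked triplet lists.
    Assumption: each triplet present in both top-k lists contributes
    1/rank_in_image_list + 1/rank_in_sentence_list (ranks start at 1).
    Triplets within each list are assumed to be unique."""
    img_rank = {t: r for r, t in enumerate(image_ranked[:k], start=1)}
    sen_rank = {t: r for r, t in enumerate(sentence_ranked[:k], start=1)}
    shared = img_rank.keys() & sen_rank.keys()
    return sum(1 / img_rank[t] + 1 / sen_rank[t] for t in shared)
```

For auto-annotation, one would compute `match_score` between an image and every candidate sentence and return the highest-scoring sentences; auto-illustration swaps the roles. Weighting by inverse rank lets agreement on the top-ranked triplets dominate the score.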
Evaluation
Quantitative measures
Tree-F1 measure: a measure that reflects two important, interacting components, accuracy and specificity.
Precision is the number of edges on the predicted path that match edges on the ground-truth path, divided by the number of edges on the predicted path.
Recall is the number of edges on the predicted path that are also on the ground-truth path, divided by the number of edges on the ground-truth path. Tree-F1 is the harmonic mean of the two.
BLEU measure: a check of whether a generated triplet is logically valid. For example, (bottle, walk, street) is not valid. To decide, we check whether the triplet ever appears in our corpus.
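Both measures can be sketched on toy data. This assumes the label taxonomy is given as a map from each node to its parent (the root maps to `None`) and that a "path" is the chain of edges from the root to the label; the names `path_edges`, `tree_f1` and `triplet_seen` are hypothetical. The validity check implements only what the slide describes (has the triplet ever appeared in the corpus?), not the full BLEU n-gram score.

```python
def path_edges(taxonomy, node):
    """Edges (parent, child) on the path from the root down to `node`.
    Assumption: `taxonomy` maps every node to its parent, root -> None."""
    edges = set()
    while taxonomy[node] is not None:
        edges.add((taxonomy[node], node))
        node = taxonomy[node]
    return edges


def tree_f1(taxonomy, predicted, truth):
    """Harmonic mean of edge precision (over the predicted path) and edge
    recall (over the ground-truth path)."""
    pred = path_edges(taxonomy, predicted)
    gold = path_edges(taxonomy, truth)
    matched = len(pred & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


def triplet_seen(triplet, corpus_triplets):
    """Validity check as described on the slide: a generated triplet is
    accepted only if it appears somewhere in the corpus."""
    return tuple(triplet) in set(map(tuple, corpus_triplets))
```

In a toy taxonomy where `dog` and `cat` sit under `animal`, predicting `cat` for a `dog` scores 0.5, while predicting the less specific `animal` scores about 0.67: the vaguer label is never wrong on its edges but loses recall, which is the accuracy/specificity trade-off the measure is meant to capture.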
Results
Auto-Annotation

The intermediate meaning space lets the model approach the problem in both directions (image to sentence and sentence to image), and it benefits from distributional semantics.
Producing a score, and thereby quantitatively evaluating how well a description and an image correspond, is an interesting approach.