This paper provides an overview of the Verifying Multimedia Use task that takes place as part of the 2015 MediaEval Benchmark. The task deals with the automatic detection of manipulation and misuse of Web multimedia content. Its aim is to lay the basis for a future generation of tools that could assist media professionals in the process of verification. Examples of manipulation include malicious tampering with images and videos, e.g., splicing or the removal/addition of elements; other kinds of misuse include reposting previously captured multimedia content in a different context (e.g., a new event) and claiming that it was captured there. For the 2015 edition of the task, we have generated and made available a large corpus of real-world cases of images that were distributed through tweets, along with manually assigned labels regarding their use, i.e., misleading (fake) versus appropriate (real).
http://ceur-ws.org/Vol-1436/
http://www.multimediaeval.org
3. Real or Fake
• Real photo, captured April 2011 by the WSJ, but heavily tweeted during Hurricane Sandy (29 Oct 2012)
• Tweeted by multiple sources & retweeted multiple times
• Original online at: http://blogs.wsj.com/metropolis/2011/04/28/weather-journal-clouds-gathered-but-no-tornado-damage/
4. Task at a Glance
[Diagram: a tweet and its attached image, together with the author profile, are fed to a MediaEval system, which classifies the tweet as FAKE or REAL]
Systems may use:
• Tweet text
• Tweet metadata
• Twitter user profile
• Image content
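The feature groups listed above can be sketched as a toy rule-based classifier. This is a hedged illustration only: the feature names, thresholds, and scoring rule below are illustrative assumptions, not the actual MediaEval baseline system.

```python
# Hypothetical sketch of a fake/real tweet classifier using the feature
# groups from the slide (tweet text, metadata, user profile).
# All feature names and thresholds are illustrative assumptions.

def extract_features(tweet):
    """Turn a tweet dict into a small numeric feature vector."""
    text = tweet.get("text", "")
    user = tweet.get("user", {})
    return {
        "num_exclamations": text.count("!"),
        "has_question": int("?" in text),
        "num_hashtags": text.count("#"),
        "verified": int(user.get("verified", False)),
    }

def classify(tweet):
    """Toy rule-based decision: returns 'fake' or 'real'."""
    f = extract_features(tweet)
    # Assumption: sensationalist punctuation from unverified accounts
    # is weak evidence of misuse; a real system would learn weights
    # from the development set instead.
    score = f["num_exclamations"] + f["has_question"] - 2 * f["verified"]
    return "fake" if score > 1 else "real"
```

A participant system would replace the hand-set rule with a classifier trained on the development set labels.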
5. A Typology of Fake: Reposting of Real
• Photos from past events reposted as being associated with a current event
6. A Typology of Fake: Reposting of Art
• Artworks presented as real imagery
7. A Typology of Fake: Speculations
• Speculations associating persons or actions with a current event
8. A Typology of Fake: Photoshopping
• Digitally manipulated photos
12. Ground Truth Generation
• Data (tweet) collection
– Historic (known cases discussed online) using Topsy
– Real-time during major events using streaming API
• Tweet set expansion
– Near-duplicate image search + human inspection were used to increase the number of associated tweets
• Label assignment
– Fake/real labels were manually assigned after consulting
online reports that were posted after each event
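The near-duplicate expansion step above can be illustrated with a simple perceptual hash. The slide does not specify which matching technique was used, so the average-hash (aHash) approach and the distance threshold below are assumptions chosen to show the general idea: reposted copies of an image hash to nearly identical bit strings.

```python
# Hedged sketch of near-duplicate image matching via average hash
# (aHash). Input is a 2D grid of grayscale values, assumed already
# resized to a small fixed size (e.g. 8x8); the threshold is an
# illustrative choice, not a value from the task pipeline.

def average_hash(pixels):
    """Return a bit string: '1' where a pixel >= the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p >= mean else "0" for p in flat)

def hamming(h1, h2):
    """Number of differing bits between two equal-length hashes."""
    return sum(a != b for a, b in zip(h1, h2))

def is_near_duplicate(pixels_a, pixels_b, threshold=5):
    """Near-duplicates differ in at most `threshold` hash bits."""
    return hamming(average_hash(pixels_a),
                   average_hash(pixels_b)) <= threshold
```

Candidate matches found this way would still go through the human inspection step described above before tweets are added to the corpus.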
13. Annotation Challenges
• Tweets declaring that
the embedded image is fake
• Tweets with obvious
manipulations
• All such cases were manually checked and removed from both the development and test sets!
22. Future Plans
• Move beyond tweets + images
– Blog/news articles
– Public Facebook posts (in pages)
– Other?
• Move beyond the simple fake/real distinction
– Real, but inaccurate
– Messages expressing doubt
– Other?
• Use different evaluation measures
– AUC is probably a better choice, especially under class imbalance
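The AUC measure suggested above can be computed directly via the Mann-Whitney U statistic: the probability that a randomly chosen fake example receives a higher score than a randomly chosen real one. Unlike accuracy, this is insensitive to the fake/real class ratio. A minimal pure-Python sketch (no library assumed):

```python
# AUC via the Mann-Whitney U statistic: the fraction of
# (positive, negative) pairs where the positive is scored higher,
# counting ties as half. O(P*N) pairwise version for clarity.

def auc(labels, scores):
    """labels: 1 for fake, 0 for real; scores: higher = more fake."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

A degenerate classifier that predicts the majority class scores 0.5 here regardless of class imbalance, which is what makes AUC attractive for this task.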