Social Data and Multimedia Analytics for News
and Events Applications
Dr. Yiannis Kompatsiaris, ikom@iti.gr
Multimedia, Knowledge and Social Media Analytics Lab, Head
CERTH-ITI
Multimodal Social Data Management (MSDM)
Workshop
MSDM 2014, Athens Social Data and Multimedia Analytics#2
Overview
• Introduction
– Motivation – Challenges
• SocialSensor Project and Use Cases
• Research Approaches
– Large-Scale visual search
– Clustering
– Verification
• Demos – Applications
– MM News Demo
– Clusttour
– Thessfest
• Conclusions
Introduction
Motivation
Example Applications
Conceptual Architecture
Challenges
www.puzzlemarketer.com/digital-social-brands-in-60-seconds/ (Apr, 2012)
Social Networks as Real-Life Sensors
• Social networks are a data source with an
extremely dynamic nature that reflects
events and the evolution of community
focus (users' interests)
• Widespread smartphone and mobile device
penetration provides real-time and
location-based user feedback
• Transform individually rare but
collectively frequent media to meaningful
topics, events, points of interest,
emotional states and social connections
• Present in an efficient way for a variety of
applications (news, marketing,
entertainment)
Pope Francis
Pope Benedict
2007: iPhone release
2008: Android release
2010: iPad release
http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/
Social Networks as Graphs
social web as a graph
nodes = Twitter users
edges = retweets on #jan25 hashtag
announcement of Mubarak's resignation
http://gephi.org/2011/the-egyptian-revolution-on-twitter/
Social Networks as Graphs
“Social networks have emergent
properties. Emergent properties
are new attributes of a whole that
arise from the interaction and
interconnection of the parts”
• Emotions, health and sexual
relationships do not depend just
on our connections (e.g. the number
of them) but on our position and
structure in the social graph
– Central – Hub
– Outlier
– Transitivity (connections between
friends)
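The position measures listed above can be made concrete on a toy graph. The following is a minimal sketch, not taken from the talk: the friendship graph and the function names are made up for illustration, degree stands in for how hub-like a node is, and the local clustering coefficient stands in for transitivity.

```python
from itertools import combinations

# Hypothetical undirected friendship graph, stored as adjacency sets.
graph = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann", "eve"},
    "eve": {"dan"},
}

def degree(g, node):
    """Number of direct connections of a node (high degree suggests a hub)."""
    return len(g[node])

def local_clustering(g, node):
    """Fraction of a node's friend pairs that are themselves friends."""
    friends = g[node]
    if len(friends) < 2:
        return 0.0
    pairs = list(combinations(sorted(friends), 2))
    linked = sum(1 for a, b in pairs if b in g[a])
    return linked / len(pairs)

for name in sorted(graph):
    print(name, degree(graph, name), round(local_clustering(graph, name), 2))
```

In this toy graph "ann" has the most connections, while "eve" sits at the edge with a single link and no connected friend pairs.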
Examples - Science
Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and
Jiawei Han. The wisdom of social multimedia:
using flickr for prediction and forecast,
International conference on Multimedia (MM '10). ACM.
“…if you're more than 100 km away from the epicenter
[of an earthquake] you can read about the quake on
twitter before it hits you…”
Example – News (Boston bombing)
“Following the Boston Marathon bombings, one quarter of
Americans reportedly looked to Facebook, Twitter and
other social networking sites for information, according to
The Pew Research Center. When the Boston Police
Department posted its final “CAPTURED!!!” tweet of the
manhunt, more than 140,000 people retweeted it.”
“Authorities have recognized that one of the first
places people go in events like this is to social
media, to see what the crowd is saying about what
to do next”
"I have been following my friend's
Facebook [account] who is near the scene
and she is updating everyone before it
even gets to the news”
Events - Festivals
http://www.eventmanagerblog.com/uploads/2012/12/event-technology-infographic.jpg
API Wrapper
Website Wrapper
Scheduler
CRAWLING
Visual Indexing
Near-duplicates
Text Indexing
INDEXING
Media Fetcher
SNA
Sentiment - Influence
Trends - Topics
MINING
Model Building
Concepts
Relevance
Diversity
Popularity
RANKING
Veracity
Crawling Specs
Sources
Interaction
Responsiveness
Aggregation
VISUALIZATION
Aesthetics
Conceptual Architecture
Challenges – Content (Mining)
• Multi-modality: e.g. image + tags
• Rich social context: spatio-temporal, social connections,
relations and social graph
• Inconsistent quality: noise, spam, ambiguity, fake,
propaganda
• Huge volume: Massively produced and disseminated
• Multi-source: may be generated by different applications
and user communities
• Also connected to other sources (e.g. LOD, web)
• Dynamic: Fast updates, real-time
Policy – Licensing – Legal challenges
• Fragmented access to data
– Separate wrappers/APIs for each source (Twitter, Facebook, etc.)
– Different data collection/crawling policies
• Limitations imposed by API providers (“Walled Gardens”)
• Full access to data impossible or extremely expensive (e.g. see data
licensing plans for GNIP and DataSift)
• Non-transparent data access practices (e.g. access is provided to an
organization/person if they have a contact in Twitter)
• Constant change of model and ToS of social APIs
– No backwards compatibility, additional development costs
• Ephemeral nature of content
• Social search results often lead to removed content, which makes
referencing inconsistent and unreliable
• User Privacy & Purpose of use
• Fuzzy regulatory framework regarding mining user-contributed data
SocialSensor Project
Use Cases
SocialSensor Project Objective
SocialSensor quickly surfaces trusted and relevant material
from social media – with context.
DySCO: behaviour, location, time, content, usage, social context
Massive social media
and unstructured web
Social media mining
Aggregation & indexing
News - Infotainment
Personalised access
Ad-hoc P2P networks
The SocialSensor Vision
SocialSensor quickly surfaces trusted and relevant
material from social media – with context.
•“quickly”: in real time
•“surfaces”: automatically discovers, clusters and searches
•“trusted”: automatic support in verification process
•“relevant”: to the users, personalized
•“material”: any material (text, image, audio, video =
multimedia), aggregated with other sources (e.g. web)
•“social media”: across all relevant social media platforms
•“with context”: location, time, sentiment, influence
Conceptual Architecture and Main components
SEMANTIC MIDDLEWARE
Public
Data
In-project
Data
SEARCH & RECOMMENDATION
USER MODELLING & PRESENTATION
INDEXING
MINING
STORAGE
DATA COLLECTION / CRAWLING
• Real time dynamic topic
and event clustering
• Trend, popularity
and sentiment analysis
• Calculate trust/influence
scores around people
• Personalized search,
access & presentation
based on social network
interactions
• Semantic enrichment
and discovery of services
Use Cases
Casual News
application
Casual News Readers
Professional
News application
Journalists, Editors, etc.
NEWS
EventLiveDashboard
Festival organizers
INFOTAINMENT
Social Media Walls
Festival attendants
“It has changed the way we do
news”(MSN)
“Social media is the key place for emerging stories –
internationally, nationally, locally” (BBC)
“Social media is transforming the way we do journalism”
(New York Times)
Source: picture alliance / dpa
Source: Getty Images
“It’s really hard to find the nuggets of useful stuff
in an ocean of content” (BBC)
“Things that aren’t relevant crowd out the content
you are looking for” (MSN)
“The filters aren’t configurable
enough” (CNN)
Verification was simpler in the past...
Source: Frank Grätz
Infotainment
• Events with large numbers
of visitors
• Thessaloniki International
Film Festival
– 80,000 viewers / 100,000
visitors in 10 days
– 150 films, 350 screenings
• Discovery and presentation
of relevant aggregated
social media
– Trending Topics
– Sentiment
– Tweet – film matching
– Visualization (Social Walls)
Research Approaches
Large-Scale Visual Search
Clustering – Community Detection
Social Media Verification
Scalable visual feature aggregation &
indexing
• Problem: Example-based image search
– Find images that represent same or similar object or scene
with a given query image
– Viewed from different viewpoints, occlusions, clutter
• Challenge: Large-scale
– Searching databases with tens of millions of images
– Objectives to be fulfilled:
• Sufficient discriminative power
• Fast response times
• Efficient memory usage
Large-scale visual search
image collection
from social media/
Web
image local feature
extraction
feature aggregation
feature indexing
kNN visual
similarity search
concept-based
image annotation
image clustering
image (geo)tagging
concept-based
search/filtering
duplicate detection
Framework
• Implementation and evaluation of the effectiveness
of VLAD in combination with SURF
• Scalable image indexing
E. Spyromitros-Xioufis, et al. An Empirical Study on the
Combination of SURF Features with VLAD Vectors for Image
Search. In WIAMIS 2012, Dublin, Ireland, May 2012.
image
→ local descriptor extraction (SIFT / SURF) → set of local descriptors
→ descriptor aggregation (BOW / VLAD) → fixed size vector
→ dimensionality reduction (PCA) → low dimensional vector
→ encoding & indexing (PQ + ADC/IVFADC)
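To make the aggregation step of this pipeline more tangible, here is a minimal pure-Python sketch of VLAD encoding. It is an illustration under assumptions, not the project's implementation: the function name, the toy 2-D descriptors and the two-word codebook are invented, and the later PCA and product quantization steps are left out.

```python
import math

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one fixed-size, L2-normalised vector."""
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Assign the descriptor to its nearest centroid (visual word).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual (descriptor minus centroid) for that word.
        for j in range(d):
            acc[nearest][j] += x[j] - centroids[nearest][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Toy 2-D "descriptors" and a 2-word codebook (hypothetical values).
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
vector = vlad(descriptors, centroids)
print(len(vector), [round(v, 3) for v in vector])
```

The output length is always the number of centroids times the descriptor dimension, regardless of how many local descriptors the image has, which is what makes the vector suitable for indexing.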
Scalable indexing of features
• ADC 16x8 requires 16 bytes per image
– ~67M images per GB
• IVFADC requires 4 additional bytes per image
– ~53.6M images per GB
• The current implementation achieves only half of the above numbers because it
uses short int[] instead of byte[]; this can be improved.
• Ideally, 1 billion images could be indexed on a server with
20GB of RAM (projection).
• Query time (for 1M vectors):
– Exhaustive search of VLAD vectors (d’=128): 0.50 sec
– Product Quantization with ADC 16x8: 0.10 sec (x5 faster)
– Product Quantization with IVFADC 16x8: 0.02 sec (x25 faster)
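The figures above can be checked with a few lines of arithmetic. This sketch assumes that "16x8" means 16 sub-quantizers of 8 bits each and that a gigabyte is 2^30 bytes; both are readings of the slide rather than stated facts, and the variable names are made up.

```python
GIB = 2 ** 30  # bytes in one gigabyte, binary convention (assumption)

def images_per_gb(bytes_per_image):
    """How many encoded images fit in one gigabyte."""
    return GIB / bytes_per_image

adc_bytes = 16 * 8 // 8        # 16 sub-quantizers x 8 bits each = 16 bytes
ivfadc_bytes = adc_bytes + 4   # plus the 4 additional bytes per image

print(round(images_per_gb(adc_bytes) / 1e6, 1))     # millions of images per GB, ADC
print(round(images_per_gb(ivfadc_bytes) / 1e6, 1))  # millions of images per GB, IVFADC
print(1_000_000_000 * ivfadc_bytes / 1e9)           # decimal GB for one billion images

exhaustive, adc, ivfadc = 0.50, 0.10, 0.02          # query times in seconds, 1M vectors
print(round(exhaustive / adc), round(exhaustive / ivfadc))  # speed-up factors
```

With 20 bytes per image, one billion images take about 20 GB, which matches the projection on the slide.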
VLAD+SIFT vs. VLAD+SURF
Accuracy vs. dimensionality
• VLAD+SURF outperforms VLAD+SIFT and FV+SIFT across all dimensions on
both the Holidays and Oxford datasets
Results in rows starting with * are taken from Jégou et al., 2011, hence the missing values for some entries.
SIFT corresponds to PCA-reduced SIFT, which yielded better results than standard SIFT in Jégou et al., 2011
Large-scale graph-based clustering
• Problem: Discover
structure in large-scale
datasets by exploiting
their relations
• Challenges - Approach:
– Large-scale
– Fast response times
– Efficient memory usage
– Noise Resilient
– Number of clusters not
known
• Structural similarity +
local expansion
community detection
techniques
• Structural similarity + Local
expansion
(highly efficient and scalable approach)
• Not necessary to know the number
of clusters
• Noise resilient
(not all nodes need to be part of a
community)
• Generic approach adaptable to
many applications
(depending on node – edge
representation)
+
S. Papadopoulos, Y. Kompatsiaris, A. Vakali. “A Graph-based Clustering Scheme for Identifying Related Tags in Folksonomies”.
In Proceedings of DaWaK'10, Springer-Verlag, 65-76
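As a rough illustration of the structural similarity idea, the sketch below scores each edge by how much the closed neighbourhoods of its endpoints overlap and keeps only the strongly similar edges. It assumes a SCAN-style similarity definition; the toy tag graph, the function names and the threshold are hypothetical, and the local expansion step is not shown.

```python
import math

# Hypothetical tag co-occurrence graph, stored as undirected adjacency sets.
graph = {
    "barcelona": {"sagrada", "gaudi", "cathedral"},
    "sagrada": {"barcelona", "gaudi", "cathedral"},
    "gaudi": {"barcelona", "sagrada"},
    "cathedral": {"barcelona", "sagrada", "cologne"},
    "cologne": {"cathedral"},
}

def structural_similarity(g, u, v):
    """Overlap of the closed neighbourhoods of u and v (SCAN-style, assumed)."""
    nu = g[u] | {u}
    nv = g[v] | {v}
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

def similar_edges(g, epsilon):
    """Keep only edges whose endpoints are structurally similar enough."""
    return sorted(
        (u, v)
        for u in g
        for v in g[u]
        if u < v and structural_similarity(g, u, v) >= epsilon
    )

print(similar_edges(graph, epsilon=0.8))
```

Nodes such as "cologne" that end up with no retained edge stay outside every community, which is one way to read the noise resilience mentioned above.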
Large-scale graph-based clustering
Computational Verification in Social Media
• Create a computational verification framework to
classify tweets with unreliable media content.
• Events used for experimentation
Fake images posted during Hurricane Sandy natural disaster
Fake images posted during Boston Marathon bombings
Methodology
Results
• Tweet Statistics

                          Hurricane Sandy   Boston Marathon
Tweets with URLs          343939            112449
Tweets with fake images   10758             281
Tweets with real images   3540              460

• Approaches

Detection accuracy using cross-validation approach, classified correctly (%)

Hurricane Sandy
Classifier      Content features   User features   Total features
J48 tree        81.41              67.72           80.68
KStar           81.28              71.16           81.38
Random Forest   80.59              70.15           80.94

Boston Marathon
Classifier      Content features   User features   Total features
J48 tree        76.45              70.81           81.25
KStar           81.28              74.12           75.78
Random Forest   78.59              76.15           79.10
Results(2)
Detection accuracy using different training and testing sets in Hurricane Sandy, classified correctly (%)
Classifier      Content features   User features   Total features
J48 tree        73.79              51.06           65.06
KStar           75.30              62.29           53.31
Random Forest   74.02              63.10           65.96

Detection accuracy using Hurricane Sandy for training and Boston Marathon for testing, classified correctly (%)
Classifier      Content features   User features   Total features
J48 tree        55.05              50.12           54.10
KStar           50.01              50.10           50.97
Random Forest   58.75              51.03           58.78
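The results differ mainly in how training and test data are separated. The sketch below contrasts the two evaluation protocols, k-fold cross-validation within one event and training on one event while testing on another. It only produces the splits; the function names and the tiny labelled sets are hypothetical, and no classifier from the study is reproduced.

```python
import random

def k_fold_splits(n_items, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation on one event."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def cross_event_split(events, train_event, test_event):
    """Train on every tweet of one event and test on every tweet of another."""
    return events[train_event], events[test_event]

# Hypothetical, tiny labelled sets of (tweet id, label) pairs.
events = {
    "sandy": [(i, "fake" if i % 3 else "real") for i in range(9)],
    "boston": [(i, "fake" if i % 2 else "real") for i in range(100, 104)],
}

for train, test in k_fold_splits(len(events["sandy"]), k=3):
    print(len(train), len(test))

train_set, test_set = cross_event_split(events, "sandy", "boston")
print(len(train_set), len(test_set))
```

In the cross-event setting the model never sees a tweet from the test event, which is the harder and more realistic case for a newly breaking story.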
Other approaches
• Graph-based multimodal clustering for social event
detection in large collections of images
– automatic organization of a multimedia collection into
groups of items, each (group) of which corresponds to a
distinct event.
• Unsupervised concept learning detection using social
media as training data
• Text analysis for entity matching and sentiment
analysis
• Placing images based on content features
• Retrieving diverse images for the same entity
Demos - Applications
MM News Demo
Clusttour
ThessFest
Multimedia Demo
Multimedia Demo Architecture
StreamManager
Twitter Facebook Flickr YouTube RSS Instagram
160.xx.xx.207
MongoDBWrapper
160.xx.xx.207
TextIndexer (Solr)
160.xx.xx.207
160.xx.xx.207
MediaFetcher, FeatureExtractor (HDFS)
160.xx.xx.58 160.xx.xx.107
Social Focused Crawler (HDFS)
160.xx.xx.187
Nutch
Nutch VLAD
FeatureIndexer (HDFS)
160.xx.xx.207
IVFADC
Data Mining
160.xx.xx.191
Visual Clust. Geo Clust. Statistics
Web server
160.xx.xx.116
API (3) API (4)
API (1) API (2)
tags: sagrada familia,
cathedral, barcelona
taken: 12 May 2009
lat: 41.4036, lon: 2.1743
PHOTOS & METADATA
SPATIAL CLUSTERING + TEMPORAL ANALYSIS
COMMUNITY DETECTION
CLASSIFICATION TO LANDMARKS/EVENTS
VISUAL
TAG
HYBRID
[2 years, 50 users / 120 photos]
#users / #photos
duration
[1 day, 2 users / 10 photos]
S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, A. Vakali. “Cluster-based Landmark and Event Detection on Tagged Photo
Collections”. In IEEE Multimedia Magazine 18(1), pp. 52-63, 2011
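The two bracketed examples above suggest that duration and the number of contributing users separate landmarks from events. The following sketch turns that intuition into a simple rule of thumb. It is not the classifier from the cited paper: the class name, the function and both thresholds are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class PhotoCluster:
    duration_days: int
    n_users: int
    n_photos: int

def classify(cluster, min_days=30, min_users=10):
    """Label a cluster 'landmark' or 'event' (thresholds are assumptions)."""
    if cluster.duration_days >= min_days and cluster.n_users >= min_users:
        return "landmark"
    return "event"

# The two example clusters from the slide: 2 years vs. 1 day.
print(classify(PhotoCluster(duration_days=730, n_users=50, n_photos=120)))
print(classify(PhotoCluster(duration_days=1, n_users=2, n_photos=10)))
```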
City profile creation (Clusttour)
City profile creation (Clusttour)
Community detection on
image similarity graphs
Nodes: photos
Edges: visual and tag
similarity
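A small sketch of how such a photo similarity graph could be built from tags alone. It assumes Jaccard overlap as the tag similarity and a fixed threshold for creating an edge; the photo ids, tags and threshold are made up, and the visual similarity edges are omitted.

```python
from itertools import combinations

# Hypothetical photos with user tags; visual similarity is left out here.
photos = {
    "p1": {"sagrada familia", "cathedral", "barcelona"},
    "p2": {"sagrada familia", "barcelona", "gaudi"},
    "p3": {"beach", "barcelona"},
    "p4": {"beach", "sunset"},
}

def jaccard(a, b):
    """Tag-set overlap in the range [0, 1]."""
    return len(a & b) / len(a | b)

def similarity_graph(items, threshold):
    """Connect two photos when their tag similarity reaches the threshold."""
    edges = {}
    for u, v in combinations(sorted(items), 2):
        score = jaccard(items[u], items[v])
        if score >= threshold:
            edges[(u, v)] = score
    return edges

print(similarity_graph(photos, threshold=0.3))
```

Community detection would then run on the resulting edges, so the threshold directly controls how dense the graph is and how many photos remain unconnected.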
ThessFest
• Thessaloniki
International Film
Festival
• Support
twitter/comment
usage within the
app
• Ratings and
comments per film
• Feedback
aggregation
• Votes
• Tweets
• Real-time feedback
to the organisation
and visitors
Fête de la Musique Berlin app
• FETEberlin in App Store and Google Play
• More than 100K visitors
• About 5K musicians
• More than 5K app downloads, 25K
sessions
App features
•Browse and filter detailed program
•Interactive maps and routing
•Social Sharing
•Artists’ and Stages Details
•Social Monitoring
Main benefits for attendants
•Visitors can browse through maps and
don’t get lost as stages are numerous
•Event schedule is available always and per
stage
– Very useful when the server was down and
there was no access to the online schedule
Topic analysis
• Top-10 topics
• Manual inspection
of clusters:
– 53.8% of topic titles
considered
informative
– 98.5% of clusters
were found to be
“clean”
• Topics in time
Other Application Areas
• Science
– Sociology, machine learning (machine as a teacher), computer vision
(annotation)
• Tourism – Leisure – Culture
– Off-the-beaten path POI extraction
• Marketing
– Brand monitoring, personalised ads
• Prediction
– Politics: election results
• News
– Topics, trends, event detection
• Others
– Environment, emergency response, energy saving, etc
Conclusions – Further topics
• Social media data useful in many applications
• Not all data always available (e.g. user queries, Facebook)
– Infrastructure
– Policy - Privacy issues
• Real-time and scalable approaches
– Efficiency of semantics and analysis vs. performance vs. infrastructure
• Fusion of various modalities
– Content, social, temporal, location
• Verification & Linking other sources (web, Linked Open Data)
• Visualization - Interfaces
• Applications and commercialization
• User engagement
Reusable results
• Starting point: http://www.socialsensor.eu/results
– Deliverables
– Publications
– Datasets
– Software
– e-letter: http://stcsn.ieee.net/e-letter/vol-1-no-3
• Open-source projects (Apache License v2):
https://github.com/socialsensor
– Data collection (stream-manager, storm-focused-crawler)
– Indexing (framework-client, multimedia-indexing)
– Mining (topic-detection, multimedia-analysis, community-evolution-
analysis, social-event-detection)
European Centre for Social Media
• Topics
– Social media analytics
– Verification
– Visualisation
– Applications in different domains
• Activities
– Listings of project, results, institutions, events
– Community building
– Support/organise events
– Common social media presence (e.g. LinkedIn)
– Funding from subscriptions, training, commercialisation
– Supporting projects: SocialSensor, Reveal, MULTISENSOR, PHEME,
DecarboNet, MWCC, uComp
– Website: http://www.socialmediacentre.eu/
– Research-academic: STCSN http://stcsn.ieee.net/
Contributions from
• Dr. Symeon Papadopoulos
• Leading R&D in Social Media Mining
• Large-Scale visual search
• Community detection – Clusttour
• Dr. Sotirios Diplaris
• SocialSensor Technical Project Manager
• Lefteris Spyromitros (PhD Student, AUTH)
• Large-Scale visual search
• Christina Boididou
• Social Media Verification
• Lazaros Apostolidis
• Visualization - User Interface MM News Demo
• Manos Schinas
• Topic Analysis
• Back-end Thessfest – Clusttour
• MM News Demo
• Juxhin Bakalli
• iOS Applications development (ThessFest - Clusttour)
• Antonis Latas
• Android Application Development (Thessfest)
Thank you for your attention!
ikom@iti.gr
http://mklab.iti.gr

Weitere ähnliche Inhalte

Was ist angesagt?

Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Digital Methods Initiative
 

Was ist angesagt? (20)

‘Big Social Data’ in Context: Connecting Social Media Data and Other Sources
‘Big Social Data’ in Context: Connecting Social Media Data and Other Sources‘Big Social Data’ in Context: Connecting Social Media Data and Other Sources
‘Big Social Data’ in Context: Connecting Social Media Data and Other Sources
 
Eavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging SiteEavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging Site
 
Social Media Analysis: Present and Future
Social Media Analysis: Present and FutureSocial Media Analysis: Present and Future
Social Media Analysis: Present and Future
 
Geoparsing and Real-time Social Media Analytics - technical and social challe...
Geoparsing and Real-time Social Media Analytics - technical and social challe...Geoparsing and Real-time Social Media Analytics - technical and social challe...
Geoparsing and Real-time Social Media Analytics - technical and social challe...
 
Social Media & Web Mining for Public Services of Smart Cities - SSA Talk
Social Media & Web Mining for Public Services of Smart Cities - SSA TalkSocial Media & Web Mining for Public Services of Smart Cities - SSA Talk
Social Media & Web Mining for Public Services of Smart Cities - SSA Talk
 
Peace on Facebook? Problematising social media as spaces for intergroup conta...
Peace on Facebook? Problematising social media as spaces for intergroup conta...Peace on Facebook? Problematising social media as spaces for intergroup conta...
Peace on Facebook? Problematising social media as spaces for intergroup conta...
 
Social media mining hicss 46 part 1
Social media mining   hicss 46 part 1Social media mining   hicss 46 part 1
Social media mining hicss 46 part 1
 
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
 
Information Architecture & Findability
Information Architecture & FindabilityInformation Architecture & Findability
Information Architecture & Findability
 
Filter bubble and information behaviour, ISIC 2018, keynote speech
Filter bubble and information behaviour, ISIC 2018, keynote speechFilter bubble and information behaviour, ISIC 2018, keynote speech
Filter bubble and information behaviour, ISIC 2018, keynote speech
 
30 Tools and Tips to Speed Up Your Digital Workflow
30 Tools and Tips to Speed Up Your Digital Workflow 30 Tools and Tips to Speed Up Your Digital Workflow
30 Tools and Tips to Speed Up Your Digital Workflow
 
What Actor-Network Theory (ANT) and digital methods can do for data journalis...
What Actor-Network Theory (ANT) and digital methods can do for data journalis...What Actor-Network Theory (ANT) and digital methods can do for data journalis...
What Actor-Network Theory (ANT) and digital methods can do for data journalis...
 
Social Network Analysis and Partnerships SNA presentation Guevara 2015
Social Network Analysis and Partnerships SNA presentation Guevara 2015Social Network Analysis and Partnerships SNA presentation Guevara 2015
Social Network Analysis and Partnerships SNA presentation Guevara 2015
 
Social Media Issue Publics in Australia
Social Media Issue Publics in AustraliaSocial Media Issue Publics in Australia
Social Media Issue Publics in Australia
 
From #qldfloods to #sandy: Engaging with the Public during Crisis Events
From #qldfloods to #sandy: Engaging with the Public during Crisis EventsFrom #qldfloods to #sandy: Engaging with the Public during Crisis Events
From #qldfloods to #sandy: Engaging with the Public during Crisis Events
 
2053951715611145
20539517156111452053951715611145
2053951715611145
 
Meraz
MerazMeraz
Meraz
 
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
 
How to get started with Data Journalism
How to get started with Data JournalismHow to get started with Data Journalism
How to get started with Data Journalism
 
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
 

Ähnlich wie Social Data and Multimedia Analytics for News and Events Applications

COSMOS
COSMOSCOSMOS
COSMOS
NSMNSS
 
Social Media Metrics for the Cultural Heritage sector
Social Media Metrics for the Cultural Heritage sectorSocial Media Metrics for the Cultural Heritage sector
Social Media Metrics for the Cultural Heritage sector
HU-Crossmedialab
 

Ähnlich wie Social Data and Multimedia Analytics for News and Events Applications (20)

Vision about Social Networks Content Exploitation (EC Concertation meeting)
Vision about Social Networks Content Exploitation (EC Concertation meeting)Vision about Social Networks Content Exploitation (EC Concertation meeting)
Vision about Social Networks Content Exploitation (EC Concertation meeting)
 
Socialsensor project overview and topic discovery in tweeter streams
Socialsensor project overview and topic discovery in tweeter streams Socialsensor project overview and topic discovery in tweeter streams
Socialsensor project overview and topic discovery in tweeter streams
 
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...SocialSensor Project: Sensing User Generated Input for Improved Media Discove...
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...
 
Citizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsCitizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and Applications
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
 
New Methodologies for Capturing and Working with Publicly Available Twitter Data
New Methodologies for Capturing and Working with Publicly Available Twitter DataNew Methodologies for Capturing and Working with Publicly Available Twitter Data
New Methodologies for Capturing and Working with Publicly Available Twitter Data
 
From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?
 
COSMOS
COSMOSCOSMOS
COSMOS
 
Picturing the Social: Talk for Transforming Digital Methods Winter School
Picturing the Social: Talk for Transforming Digital Methods Winter SchoolPicturing the Social: Talk for Transforming Digital Methods Winter School
Picturing the Social: Talk for Transforming Digital Methods Winter School
 
Social Media Metrics for the Cultural Heritage sector
Social Media Metrics for the Cultural Heritage sectorSocial Media Metrics for the Cultural Heritage sector
Social Media Metrics for the Cultural Heritage sector
 
interacting with social media content about events
interacting with social media content about eventsinteracting with social media content about events
interacting with social media content about events
 
Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...
 
What's up at Kno.e.sis?
What's up at Kno.e.sis? What's up at Kno.e.sis?
What's up at Kno.e.sis?
 
Digital Methods by Richard Rogers
Digital Methods by Richard RogersDigital Methods by Richard Rogers
Digital Methods by Richard Rogers
 
Emerging Trends in Crisis Informatics
Emerging Trends in Crisis InformaticsEmerging Trends in Crisis Informatics
Emerging Trends in Crisis Informatics
 
Mining the Social Web - Lecture 1 - T61.6020 lecture-01-slides
Mining the Social Web - Lecture 1 - T61.6020 lecture-01-slidesMining the Social Web - Lecture 1 - T61.6020 lecture-01-slides
Mining the Social Web - Lecture 1 - T61.6020 lecture-01-slides
 
New Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the Map
New Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the MapNew Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the Map
New Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the Map
 
Open analytics social media framework
Open analytics   social media frameworkOpen analytics   social media framework
Open analytics social media framework
 
Digital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social SciencesDigital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social Sciences
 
Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...
 

Mehr von Yiannis Kompatsiaris

Social Media Analytics for Graph-Based Event Detection
Social Media Analytics for Graph-Based Event DetectionSocial Media Analytics for Graph-Based Event Detection
Social Media Analytics for Graph-Based Event Detection
Yiannis Kompatsiaris
 
Introduction for the Summer School on Social Media Modeling and Search 2012
Introduction for the Summer School on Social Media Modeling and Search 2012Introduction for the Summer School on Social Media Modeling and Search 2012
Introduction for the Summer School on Social Media Modeling and Search 2012
Yiannis Kompatsiaris
 

Social Data and Multimedia Analytics for News and Events Applications

  • 1. Social Data and Multimedia Analytics for News and Events Applications. Dr. Yiannis Kompatsiaris (ikom@iti.gr), Head, Multimedia, Knowledge and Social Media Analytics Lab, CERTH-ITI. Multimodal Social Data Management (MSDM) Workshop.
  • 2. MSDM 2014, Athens. Overview
      – Introduction: motivation, challenges
      – SocialSensor project and use cases
      – Research approaches: large-scale visual search, clustering, verification
      – Demos and applications: MM News Demo, Clusttour, ThessFest
      – Conclusions
  • 3. Introduction: Motivation, Example Applications, Conceptual Architecture, Challenges
  • 4. Source: www.puzzlemarketer.com/digital-social-brands-in-60-seconds/ (Apr. 2012)
  • 5. Social Networks as Real-Life Sensors
      – Social networks are a data source with an extremely dynamic nature that reflects events and the evolution of community focus (users' interests)
      – The high penetration of smartphones and mobile devices provides real-time, location-based user feedback
      – Goal: transform individually rare but collectively frequent media into meaningful topics, events, points of interest, emotional states and social connections
      – Present the results efficiently for a variety of applications (news, marketing, entertainment)
  • 6. Pope Francis / Pope Benedict (photo comparison). 2007: iPhone release; 2008: Android release; 2010: iPad release. Source: http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/
  • 7. Social Networks as Graphs. The social web as a graph: nodes = Twitter users; edges = retweets on the #jan25 hashtag; shown at the announcement of Mubarak's resignation. Source: http://gephi.org/2011/the-egyptian-revolution-on-twitter/
  • 8. Social Networks as Graphs
      – “Social networks have emergent properties. Emergent properties are new attributes of a whole that arise from the interaction and interconnection of the parts”
      – Emotions, health and sexual relationships do not depend just on our connections (e.g. their number) but on our position and structure in the social graph: central, hub, outlier, transitivity (connections between friends)
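The "transitivity" notion on slide 8 can be made concrete with the local clustering coefficient: the fraction of a person's pairs of friends who are themselves connected. The sketch below is only an illustration; the graph and the names in it are made up and do not come from the talk.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of the node's neighbour pairs that are directly connected."""
    neigh = adj[node]
    if len(neigh) < 2:
        return 0.0  # fewer than two friends: no pair to test
    pairs = list(combinations(neigh, 2))
    linked = sum(1 for a, b in pairs if b in adj[a])
    return linked / len(pairs)

# Hypothetical undirected friendship graph (adjacency sets).
FRIENDS = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
}

# Degree (number of connections) versus position in the graph.
DEGREE = {person: len(neigh) for person, neigh in FRIENDS.items()}
TRANSITIVITY = {person: local_clustering(FRIENDS, person) for person in FRIENDS}
```

Here "ann" has the most connections but low transitivity, which is the kind of structural distinction the slide points at.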
  • 9. Examples - Science
      – Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and Jiawei Han. The wisdom of social multimedia: using flickr for prediction and forecast. International Conference on Multimedia (MM '10), ACM.
      – “…if you're more than 100 km away from the epicenter [of an earthquake] you can read about the quake on twitter before it hits you…”
  • 10. Example – News (Boston bombing)
      – “Following the Boston Marathon bombings, one quarter of Americans reportedly looked to Facebook, Twitter and other social networking sites for information, according to The Pew Research Center. When the Boston Police Department posted its final “CAPTURED!!!” tweet of the manhunt, more than 140,000 people retweeted it.”
      – “Authorities have recognized that one of the first places people go in events like this is to social media, to see what the crowd is saying about what to do next”
      – “I have been following my friend's Facebook [account] who is near the scene and she is updating everyone before it even gets to the news”
  • 11. Events - Festivals. Source: http://www.eventmanagerblog.com/uploads/2012/12/event-technology-infographic.jpg
  • 12. Conceptual Architecture (diagram)
      – Stages: Crawling, Indexing, Mining, Ranking, Visualization
      – Component labels, in the order they appear: API Wrapper, Website Wrapper, Scheduler, Visual Indexing, Near-duplicates, Text Indexing, Media Fetcher, SNA, Sentiment - Influence, Trends - Topics, Model Building, Concepts, Relevance, Diversity, Popularity, Veracity, Crawling Specs, Sources, Interaction, Responsiveness, Aggregation, Aesthetics
  • 13. Challenges – Content (Mining)
      – Multi-modality: e.g. image + tags
      – Rich social context: spatio-temporal, social connections, relations and social graph
      – Inconsistent quality: noise, spam, ambiguity, fake, propaganda
      – Huge volume: massively produced and disseminated
      – Multi-source: may be generated by different applications and user communities; also connected to other sources (e.g. LOD, web)
      – Dynamic: fast updates, real-time
  • 14. Policy – Licensing – Legal challenges
      – Fragmented access to data: separate wrappers/APIs for each source (Twitter, Facebook, etc.); different data collection/crawling policies
      – Limitations imposed by API providers (“Walled Gardens”)
      – Full access to data is impossible or extremely expensive (e.g. see the data licensing plans of GNIP and DataSift)
      – Non-transparent data access practices (e.g. access is provided to an organization/person if they have a contact in Twitter)
      – Constant change of the model and ToS of social APIs: no backwards compatibility, additional development costs
      – Ephemeral nature of content: social search results often lead to removed content, hence inconsistent and unreliable referencing
      – User privacy and purpose of use
      – Fuzzy regulatory framework regarding mining user-contributed data
  • 15. SocialSensor Project: Use Cases
  • 16. SocialSensor Project Objective
      – SocialSensor quickly surfaces trusted and relevant material from social media – with context.
      – Diagram labels: DySCO (behaviour, location, time, content, usage, social context); Massive social media and unstructured web; Social media mining; Aggregation & indexing; News - Infotainment; Personalised access; Ad-hoc P2P networks
  • 17. The SocialSensor Vision: SocialSensor quickly surfaces trusted and relevant material from social media – with context.
      – “quickly”: in real time
      – “surfaces”: automatically discovers, clusters and searches
      – “trusted”: automatic support in the verification process
      – “relevant”: to the users, personalized
      – “material”: any material (text, image, audio, video = multimedia), aggregated with other sources (e.g. web)
      – “social media”: across all relevant social media platforms
      – “with context”: location, time, sentiment, influence
  • 18. Conceptual Architecture and Main Components
      – Diagram labels: Semantic Middleware; Public Data; In-project Data; Search & Recommendation; User Modelling & Presentation; Indexing; Mining; Storage; Data Collection / Crawling
      – Real-time dynamic topic and event clustering
      – Trend, popularity and sentiment analysis
      – Calculation of trust/influence scores around people
      – Personalized search, access and presentation based on social network interactions
      – Semantic enrichment and discovery of services
  • 19. Use Cases
      – NEWS: Casual News application (casual news readers); Professional News application (journalists, editors, etc.)
      – INFOTAINMENT: EventLiveDashboard (festival organizers); Social Media Walls (festival attendants)
  • 20. “It has changed the way we do news” (MSN). “Social media is the key place for emerging stories – internationally, nationally, locally” (BBC). “Social media is transforming the way we do journalism” (New York Times). Image source: picture alliance / dpa
  • 21. “It’s really hard to find the nuggets of useful stuff in an ocean of content” (BBC). “Things that aren’t relevant crowd out the content you are looking for” (MSN). “The filters aren’t configurable enough” (CNN). Image source: Getty Images
  • 22. Verification was simpler in the past... Image source: Frank Grätz
  • 23. Infotainment
      – Events with large numbers of visitors
      – Thessaloniki International Film Festival: 80,000 viewers / 100,000 visitors in 10 days; 150 films, 350 screenings
      – Discovery and presentation of relevant aggregated social media: trending topics, sentiment, tweet – film matching, visualization (Social Walls)
  • 24. Research Approaches: Large-Scale Visual Search; Clustering – Community Detection; Social Media Verification
  • 25. Scalable visual feature aggregation & indexing
      – Problem: example-based image search. Find images that show the same or a similar object or scene as a given query image, possibly from different viewpoints and with occlusions and clutter
      – Challenge: large scale. Searching databases with tens of millions of images
      – Objectives to be fulfilled: sufficient discriminative power, fast response times, efficient memory usage
  • 26. Large-scale visual search
      – Pipeline: image collection from social media / Web → local feature extraction → feature aggregation → feature indexing → kNN visual similarity search
      – Uses: concept-based image annotation, image clustering, image (geo)tagging, concept-based search/filtering, duplicate detection
  • 27. Framework
      – Implementation and evaluation of the effectiveness of VLAD in combination with SURF
      – Scalable image indexing
      – Pipeline: image → local descriptor extraction (SIFT / SURF) → set of local descriptors → descriptor aggregation (BOW / VLAD) → fixed-size vector → dimensionality reduction (PCA) → low-dimensional vector → encoding & indexing (PQ + ADC/IVFADC)
      – E. Spyromitros-Xioufis, et al. An Empirical Study on the Combination of SURF Features with VLAD Vectors for Image Search. In WIAMIS 2012, Dublin, Ireland, May 2012.
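The descriptor aggregation step on slide 27 can be illustrated with a minimal VLAD sketch: each local descriptor is assigned to its nearest codebook centroid, the residuals are summed per centroid, and the concatenated result is L2-normalised. This is a plain-Python illustration under stated assumptions, not the project's implementation: the codebook and descriptors are toy 2-D values, real SIFT/SURF descriptors have 64 or 128 dimensions, and the PCA and product quantization steps are omitted.

```python
import math

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one fixed-size, L2-normalised vector."""
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Assign the descriptor to its nearest centroid (squared Euclidean).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual (descriptor minus centroid).
        for j in range(d):
            acc[nearest][j] += x[j] - centroids[nearest][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Toy data (hypothetical): a 2-centroid codebook and three 2-D descriptors.
CODEBOOK = [[0.0, 0.0], [10.0, 10.0]]
LOCAL_DESCRIPTORS = [[1.0, 0.0], [0.0, 2.0], [11.0, 10.0]]
IMAGE_VECTOR = vlad(LOCAL_DESCRIPTORS, CODEBOOK)  # length k * d = 4
```

The output length depends only on the codebook size and descriptor dimension, never on how many descriptors the image produced, which is what makes the vector suitable for the indexing stage.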
  • 28. Scalable indexing of features
      – ADC 16x8 requires 16 bytes per image: ~67M images per GB
      – IVFADC requires 4 additional bytes per image: ~53.6M images per GB
      – The current implementation achieves only half of these numbers because it uses short int[] instead of byte[]; this can be improved
      – Ideally, 1 billion images could be indexed on a server with 20 GB of RAM (projection)
      – Query time (for 1M vectors): exhaustive search of VLAD vectors (d’=128): 0.50 sec; Product Quantization with ADC 16x8: 0.10 sec (5x faster); Product Quantization with IVFADC 16x8: 0.02 sec (25x faster)
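The capacity figures on slide 28 follow from simple arithmetic. The sketch below reproduces them, assuming that "GB" on the slide means 2^30 bytes for the per-GB figures and 10^9 bytes for the 20 GB projection; the slide does not state which convention it uses, and all names are illustrative.

```python
GIB = 2 ** 30  # assumption: the slide's per-GB figures use 2^30 bytes

def millions_of_images_per_gb(bytes_per_image: int) -> float:
    """How many million image codes fit in one GB of memory."""
    return GIB / bytes_per_image / 1e6

ADC_BYTES = 16         # ADC 16x8: 16 sub-quantizer codes of 8 bits each
IVFADC_BYTES = 16 + 4  # IVFADC stores 4 additional bytes per image

ADC_MILLIONS = millions_of_images_per_gb(ADC_BYTES)        # roughly 67
IVFADC_MILLIONS = millions_of_images_per_gb(IVFADC_BYTES)  # roughly 53.7

# Projection: RAM for one billion IVFADC codes, in decimal GB.
RAM_GB_FOR_ONE_BILLION = 1_000_000_000 * IVFADC_BYTES / 1e9

# Speed-ups relative to exhaustive search, from the quoted query times.
SPEEDUP_ADC = 0.50 / 0.10
SPEEDUP_IVFADC = 0.50 / 0.02
```

Storing each 8-bit code in a 16-bit short doubles the bytes per image, which matches the slide's remark that the implementation reached only half of these capacities.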
  • 29. VLAD+SIFT vs. VLAD+SURF: accuracy vs. dimensionality
      – VLAD+SURF improves on VLAD+SIFT and FV+SIFT at all dimensionalities on both the Holidays and Oxford datasets
      – Results in rows starting with * are taken from Jégou et al., 2011, hence the missing values for some entries
      – SIFT corresponds to PCA-reduced SIFT, which yielded better results than standard SIFT in Jégou et al., 2011
  • 30. Large-scale graph-based clustering
      – Problem: discover structure in large-scale datasets by exploiting their relations
      – Challenges: large scale, fast response times, efficient memory usage, noise resilience, number of clusters not known
      – Approach: structural similarity + local expansion community detection techniques
  • 31. Large-scale graph-based clustering
      – Structural similarity + local expansion (highly efficient and scalable approach)
      – Not necessary to know the number of clusters
      – Noise resilient (not all nodes need to be part of a community)
      – Generic approach adaptable to many applications (depending on node – edge representation)
      – S. Papadopoulos, Y. Kompatsiaris, A. Vakali. “A Graph-based Clustering Scheme for Identifying Related Tags in Folksonomies”. In Proceedings of DaWaK'10, Springer-Verlag, 65-76
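The properties listed on slide 31 (no preset number of clusters, nodes allowed to stay unassigned) can be illustrated with a simplified sketch. It assumes the SCAN-style definition of structural similarity (shared closed neighbourhoods, normalised by their sizes) and groups "core" nodes that have at least `mu` neighbours with similarity of at least `eps`. This is not the authors' algorithm: the local expansion step is omitted, and the graph, `eps` and `mu` values are hypothetical.

```python
import math
from collections import defaultdict

def structural_similarity(adj, u, v):
    """SCAN-style similarity of two adjacent nodes (closed neighbourhoods)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

def core_communities(edges, eps=0.7, mu=2):
    """Group nodes around cores; nodes reached by no core are left as noise."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Strong neighbours: adjacent nodes whose similarity is at least eps.
    strong = {
        u: {v for v in adj[u] if structural_similarity(adj, u, v) >= eps}
        for u in adj
    }
    cores = {u for u in adj if len(strong[u]) >= mu}
    communities, seen = [], set()
    for start in sorted(cores):
        if start in seen:
            continue
        members, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            if node in cores:  # only cores propagate membership
                stack.extend(strong[node] - members)
        seen |= members & cores
        communities.append(members)
    return communities

# Hypothetical graph: two triangles joined by a bridge, plus a pendant node g.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("f", "g")]
COMMUNITIES = core_communities(EDGES, eps=0.72, mu=2)
```

On this toy graph the two triangles come out as separate communities and the pendant node is assigned to neither, without the number of clusters being specified anywhere.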
  • 32. Computational Verification in Social Media
      – Create a computational verification framework to classify tweets with unreliable media content
      – Events used for experimentation: fake images posted during the Hurricane Sandy natural disaster; fake images posted during the Boston Marathon bombings
  • 33. Methodology
  • 34. Results
      Tweet statistics:
        Event             Tweets with URLs   Tweets with fake images   Tweets with real images
        Hurricane Sandy   343939             10758                     3540
        Boston Marathon   112449             281                       460
      Detection accuracy using cross-validation, classified correctly (%):
        Hurricane Sandy
          Classifier      Content features   User features   Total features
          J48 tree        81.41              67.72           80.68
          KStar           81.28              71.16           81.38
          Random Forest   80.59              70.15           80.94
        Boston Marathon
          Classifier      Content features   User features   Total features
          J48 tree        76.45              70.81           81.25
          KStar           81.28              74.12           75.78
          Random Forest   78.59              76.15           79.10
  • 35. Results (2)
      Detection accuracy using different training and testing sets in Hurricane Sandy, classified correctly (%):
          Classifier      Content features   User features   Total features
          J48 tree        73.79              51.06           65.06
          KStar           75.30              62.29           53.31
          Random Forest   74.02              63.10           65.96
      Detection accuracy using Hurricane Sandy for training and Boston Marathon for testing, classified correctly (%):
          Classifier      Content features   User features   Total features
          J48 tree        55.05              50.12           54.10
          KStar           50.01              50.10           50.97
          Random Forest   58.75              51.03           58.78
  • 36. Other approaches
      – Graph-based multimodal clustering for social event detection in large collections of images: automatic organization of a multimedia collection into groups of items, each of which corresponds to a distinct event
      – Unsupervised concept learning and detection using social media as training data
      – Text analysis for entity matching and sentiment analysis
      – Placing images based on content features
      – Retrieving diverse images for the same entity
  • 37. Demos - Applications: MM News Demo, Clusttour, ThessFest
  • 38. Multimedia Demo
  • 39. Multimedia Demo Architecture (diagram labels, in the order they appear): StreamManager (Twitter, Facebook, Flickr, YouTube, RSS, Instagram) 160.xx.xx.207; MongoDBWrapper 160.xx.xx.207; TextIndexer (Solr) 160.xx.xx.207; 160.xx.xx.207; MediaFetcher, FeatureExtractor (HDFS) 160.xx.xx.58, 160.xx.xx.107; Social Focused Crawler (HDFS) 160.xx.xx.187; Nutch; Nutch; VLAD; FeatureIndexer (HDFS) 160.xx.xx.207; IVFADC; Data Mining 160.xx.xx.191 (Visual Clust., Geo Clust., Statistics); Web server 160.xx.xx.116; API (1), API (2), API (3), API (4)
  • 40. City profile creation (Clusttour)
      – Photos & metadata, e.g. tags: sagrada familia, cathedral, barcelona; taken: 12 May 2009; lat: 41.4036, lon: 2.1743
      – Diagram stages: spatial clustering + temporal analysis; community detection (visual, tag, hybrid); classification to landmarks/events
      – Cluster features: duration, #users / #photos. Example clusters: [2 years, 50 users / 120 photos] and [1 day, 2 users / 10 photos]
      – S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, A. Vakali. “Cluster-based Landmark and Event Detection on Tagged Photo Collections”. In IEEE Multimedia Magazine 18(1), pp. 52-63, 2011
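The two example clusters on slide 40 suggest how duration and contributor counts might separate landmarks from events. The sketch below uses a hand-set rule, assuming that the long-lived, many-user cluster is the landmark and the one-day, two-user cluster is the event. The thresholds and names are hypothetical, and the cited paper's actual classifier is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class PhotoCluster:
    duration_days: float  # time between first and last photo
    n_users: int          # distinct contributing users
    n_photos: int         # carried for reference, unused by the rule below

def classify(cluster, min_landmark_days=30, min_landmark_users=10):
    """Hypothetical rule: long-lived, many-contributor clusters are landmarks."""
    if (cluster.duration_days >= min_landmark_days
            and cluster.n_users >= min_landmark_users):
        return "landmark"
    return "event"

# The two example clusters from the slide.
LONG_LIVED = PhotoCluster(duration_days=2 * 365, n_users=50, n_photos=120)
ONE_DAY = PhotoCluster(duration_days=1, n_users=2, n_photos=10)
LABELS = (classify(LONG_LIVED), classify(ONE_DAY))
```

A trained classifier over the same features would replace the fixed thresholds in practice; the rule only makes the intuition behind the features explicit.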
  • 41. City profile creation (Clusttour). Community detection on image similarity graphs. Nodes: photos. Edges: visual and tag similarity.
  • 43. ThessFest
      – Thessaloniki International Film Festival
      – Support for Twitter/comment usage within the app
      – Ratings and comments per film
      – Feedback aggregation: votes, tweets
      – Real-time feedback to the organisation and visitors
  • 44. Fête de la Musique Berlin app
      – FETEberlin in the App Store and Google Play
      – More than 100K visitors, about 5K musicians
      – More than 5K app downloads, 25K sessions
      – App features: browse and filter the detailed program; interactive maps and routing; social sharing; artist and stage details; social monitoring
      – Main benefits for attendants: visitors can browse the maps and do not get lost among the numerous stages; the event schedule is always available, also per stage, which was very useful when the server was down and the online schedule could not be reached
  • 45. Topic analysis
      – Top-10 topics
      – Manual inspection of clusters: 53.8% of topic titles considered informative; 98.5% of clusters were found to be “clean”
      – Topics in time
  • 46. Other Application Areas
      – Science: sociology, machine learning (machine as a teacher), computer vision (annotation)
      – Tourism – Leisure – Culture: off-the-beaten-path POI extraction
      – Marketing: brand monitoring, personalised ads
      – Prediction: politics (election results)
      – News: topics, trends, event detection
      – Others: environment, emergency response, energy saving, etc.
  • 47. Conclusions – Further topics
      – Social media data are useful in many applications
      – Not all data are always available (e.g. user queries, Facebook): infrastructure, policy and privacy issues
      – Real-time and scalable approaches: efficiency of semantics and analysis vs. performance vs. infrastructure
      – Fusion of various modalities: content, social, temporal, location
      – Verification and linking with other sources (web, Linked Open Data)
      – Visualization and interfaces
      – Applications and commercialization
      – User engagement
  • 48. Reusable results
      – Starting point: http://www.socialsensor.eu/results (deliverables, publications, datasets, software)
      – e-letter: http://stcsn.ieee.net/e-letter/vol-1-no-3
      – Open-source projects (Apache License v2): https://github.com/socialsensor
      – Data collection: stream-manager, storm-focused-crawler
      – Indexing: framework-client, multimedia-indexing
      – Mining: topic-detection, multimedia-analysis, community-evolution-analysis, social-event-detection
  • 49. European Centre for Social Media
      – Topics: social media analytics, verification, visualisation, applications in different domains
      – Activities: listings of projects, results, institutions and events; community building; supporting/organising events; common social media presence (e.g. LinkedIn); funding from subscriptions, training and commercialisation
      – Supporting projects: SocialSensor, Reveal, MULTISENSOR, PHEME, DecarboNet, MWCC, uComp
      – Website: http://www.socialmediacentre.eu/
      – Research-academic: STCSN http://stcsn.ieee.net/
  • 50. Contributions from
      – Dr. Symeon Papadopoulos: leading R&D in social media mining; large-scale visual search; community detection – Clusttour
      – Dr. Sotirios Diplaris: SocialSensor Technical Project Manager
      – Lefteris Spyromitros (PhD student, AUTH): large-scale visual search
      – Christina Boididou: social media verification
      – Lazaros Apostolidis: visualization and user interface, MM News Demo
      – Manos Schinas: topic analysis; back-end for ThessFest and Clusttour; MM News Demo
      – Juxhin Bakalli: iOS application development (ThessFest, Clusttour)
      – Antonis Latas: Android application development (ThessFest)
  • 51. Thank you for your attention! ikom@iti.gr http://mklab.iti.gr

Editor's notes

  1. Benefits: (i) Intelligent extraction of objects and events from the social Web, (ii) multimodal indexing and organization, (iii) personalized access and presentation of content (incl. media delivery and caching), and (iv) concrete and real integration of the social dimension of the current Web.
  2. Meeting notes (03.04.12 14:41): In the course of the project we have interviewed a considerable number of journalists and executives from some of the world's biggest media outlets, such as CNN, the BBC, The New York Times and others. Here are some of the quotes. Click 3 times (until all 3 quotes are visible). But journalists are not only describing the positive side. There are also huge challenges, and the following slide shows what is most challenging.
  3. Meeting notes (03.04.12 16:44): Or we can turn it the other way round: we have a known source whom the reporter trusts...
  4. These are all candidate sources for collecting data for the system. Ideally, we should try to have at least one source per medium (micro-blogging, photos, videos) plus Facebook. Check-ins could also be valuable, especially for the WP8 use case.
  5. Partners should indicate whether they can make additional data available to the consortium.