Social media has changed the way we communicate. Social media data capture our social interactions and utterances in machine-readable format. Searching and analysing massive, frequently updated social media data brings significant and diverse rewards across many application domains, from politics and business to social science and epidemiology. A notable proportion of social media data comes with explicit or implicit spatial annotations, and almost all of it has temporal metadata. We view social media data as a constant stream of data points, each containing text with spatial and temporal contexts. We identify challenges relevant to each context, which we intend to address through context-aware querying and analysis, specifically including longitudinal analyses on social media archives, spatial keyword search, local intent search, and spatio-temporal intent search. Finally, for each context, emerging applications and further avenues for investigation are discussed.
2. Evolution of communication
Functional utterances
Vowels
Velar closure: consonants
Speech
New modality: writing
Digital text
E-mail
Social media
?
Increased machine-readable information
3. Social Media = Big Data
Gartner ''3V'' definition:
1. Volume
2. Velocity
3. Variety
High volume & velocity of messages:
Twitter has ~20 000 000 users per month
They write ~500 000 000 messages per day
Massive variety:
Stock markets;
Earthquakes;
Social arrangements;
… Bieber
4. What is machine-readable now?
Messages now contain
- not only linguistic content
- but also:
Links (e.g. URI)
Topic markers (e.g. hashtags)
Meta-information
What kind of meta-information?
User profile (including home location)
Images
Messages replied to
Message language
Time of message
Location of message
5. What resources do we have now?
Large, content-rich, linked, digital streams of human communication
We transfer knowledge via communication
Sampling communication gives a sample of human knowledge
''You've only done that which you can communicate''
The metadata (time – place – imagery) gives a richer resource:
→ A sampling of human behaviour
6. What can we do with this resource?
Context increases the data's richness
Increased richness enables novel applications
Time and Place are interesting parts of message context
1. What kinds of applications are there?
2. What are the practical challenges?
8. Historical search
Ability to retrieve from archives: Longitudinal query mode [0]
Retrieve information on:
● Lifecycle of socially connected groups
● Analyse precursors to events, post-hoc
[Figure: timeline, 2008 to 2011]
0. Weikum et al. 2011: ''Longitudinal analytics on web archive data: It’s about time'', Proc. CIDR
9. Historical search
Retrospective analyses into cause and effect
''There's a dead crow in my garden''
Social media mentions of dead crows predict West Nile Virus (WNV) in humans [1]
1. Sugumaran & Voss 2012: ''Real-time spatio-temporal analysis of West Nile Virus using Twitter Data'', Proc. Int'l conference on Computing for Geospatial Research and Applications
10. Emerging search
Data emerging at high velocity:
185 000 documents per minute
Gives a high temporal density
Search over this info enables:
● Live coverage of events
● Real-time identification of emerging events [2]
2. Cohen et al. 2011: ''Computational journalism: A call to arms to database researchers'', Proc. CIDR
11. Temporal indexing
What are our requirements?
● High-frequency document creation
● Temporal cross-sections of varying size
● Time-sensitive TF/IDF: stopwords are fluid
How can we do this? - Open challenge
● Tree indexing hard to distribute
● Maybe with adaptive multi-resolution grids?
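One way to read ''time-sensitive TF/IDF'' is to compute document frequencies over a time window instead of the whole collection, so that a term which saturates the current window is down-weighted much like a stopword. The Python sketch below illustrates only that reading. The function name, the window bounds and the smoothed IDF formula are assumptions made for illustration; none of them come from the slides, and the sketch does not address the distribution problem noted above.

```python
import math
from collections import Counter

def windowed_idf(docs, start, end):
    """IDF computed only over documents with start <= timestamp < end.

    docs: iterable of (timestamp, list_of_tokens) pairs (assumed layout).
    """
    in_window = [set(tokens) for ts, tokens in docs if start <= ts < end]
    n = len(in_window)
    # Document frequency: number of in-window documents containing each term.
    df = Counter(term for tokens in in_window for term in tokens)
    # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

# Toy stream: timestamps are arbitrary integer ticks.
docs = [
    (1, ["rain", "in", "london"]),
    (2, ["good", "bar", "in", "town"]),
    (11, ["earthquake", "felt", "in", "town"]),
    (12, ["earthquake", "again"]),
    (13, ["earthquake", "damage", "report"]),
]

early = windowed_idf(docs, 0, 10)   # before the event
late = windowed_idf(docs, 10, 20)   # during the event
```

In the later window ''earthquake'' appears in every document, so its weight falls to the minimum, while it is absent from the earlier window altogether. This is the sense in which stopwords are fluid.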
12. Spatial Context
Demand for spatial information:
20% of all Google searches
53% of Bing mobile searches
Heterogeneous spatial context sources
GPS locations (most reliable)
Origin bounding boxes (e.g. city)
User profile text??? [3]
Author's friends' locations [4]
3. Hecht et al. 2011: ''Tweets from Justin Bieber’s Heart: The Dynamics of the “Location” Field in User Profiles'', Proc. ACM CHI
4. Rout et al. 2013: ''Where's @wally? A Graph Based Method for Geolocating Users in Social Networks'', Proc. ACM Hypertext
13. Spatial Keyword Search
How can we query a set of social media messages?
Treat as a set of objects, each having
Text
Location
Query parameters:
Query text
Query location
Given query and set of messages, rank by similarity:
Text similarity (Cosine, Siamese Learning Net, Oriented PCA)
Separating distance (Haversine, Manhattan, Eco-routed)
Blend these with a balancing coefficient
(just like conventional spatial keyword search)
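The ranking described above can be sketched in Python using two of the listed measures: Haversine distance and cosine similarity over raw term counts. The linear blend, the coefficient name `alpha`, the conversion of distance to a proximity score via a cut-off `radius_km`, and all function names are assumptions made for illustration. They are not taken from the slides.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words (raw term counts)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def blended_score(query_tokens, query_loc, msg_tokens, msg_loc, alpha=0.5, radius_km=5.0):
    """alpha weights spatial proximity; (1 - alpha) weights text similarity."""
    dist = haversine_km(*query_loc, *msg_loc)
    # Assumed mapping: proximity falls linearly from 1 at the query point to 0 at radius_km.
    proximity = max(0.0, 1.0 - dist / radius_km)
    text = cosine_similarity(query_tokens, msg_tokens)
    return alpha * proximity + (1 - alpha) * text

query = "good bar in north copenhagen".split()
here = (55.70, 12.55)  # illustrative coordinates only
score = blended_score(query, here, "great cocktail bar".split(), (55.71, 12.56))
```

Setting `alpha` closer to 1 favours nearby messages regardless of content; closer to 0 favours textual matches regardless of distance.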
14. Spatial Keyword Search
Query: ''good bar in north copenhagen''
Issued from location
Five candidate messages (A to E)
Query region established
Rank by blend of location and textual similarity
Message                                        Location  Text
A  So drunk last night at @BarSyv              0.7       0.6
B  Out shoe shopping!!! #louboutintime         0.9       0.0
C  Who pays $9 for a beer?!                    0.6       0.5
D  wow found cph's greatest cocktail bar lol   0.1       1.0
E  Traffic. Traffic everywhere. Need a drink.  0.4       0.2
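As a worked example, the five candidates can be ranked in Python from the two scores listed for each message. The sketch assumes the first number is the location score and the second the text score, and that the blend is a simple weighted sum. The coefficient `alpha` and its two values are illustrative choices, not figures from the slides.

```python
# (location score, text score) per message, as listed above.
messages = {
    "A": (0.7, 0.6),
    "B": (0.9, 0.0),
    "C": (0.6, 0.5),
    "D": (0.1, 1.0),
    "E": (0.4, 0.2),
}

def rank(messages, alpha):
    """Order message ids by alpha * location + (1 - alpha) * text, best first."""
    scored = {mid: alpha * loc + (1 - alpha) * text for mid, (loc, text) in messages.items()}
    return sorted(scored, key=scored.get, reverse=True)

text_heavy = rank(messages, alpha=0.3)      # ['D', 'A', 'C', 'B', 'E']
location_heavy = rank(messages, alpha=0.7)  # ['A', 'B', 'C', 'D', 'E']
```

With the text-heavy blend, the distant but highly relevant message D comes first. With the location-heavy blend, the nearby but irrelevant shopping message B rises to second place. The balancing coefficient therefore changes the result list materially.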
15. Continuous Spatial Queries
Social media scenario characterised by:
Streaming data
New spatial objects constantly appearing
Two new spatial keyword query types:
Static Continuous (SCSKQ)
- Fixed query location
- Tracks newly appearing objects
Moving Continuous (MCSKQ)
- Query location moves along a path
- Result updated with new objects
Novel part: fresh objects continuously introduced
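A static continuous query can be sketched in Python as a fixed query that keeps a running top-k result while new objects stream in. This is an illustration under simplifying assumptions, not an implementation of SCSKQ as defined in the literature: the class name, the keyword-overlap text score, the planar distance, and the parameters `k`, `alpha` and `radius` are all hypothetical.

```python
import heapq
import math

class StaticContinuousQuery:
    """Fixed query location; top-k result updated as each new object arrives."""

    def __init__(self, keywords, location, k=3, alpha=0.5, radius=10.0):
        self.keywords = set(keywords)
        self.location = location
        self.k = k
        self.alpha = alpha
        self.radius = radius
        self._heap = []  # min-heap of (score, arrival order, text)
        self._seq = 0

    def _score(self, tokens, location):
        # Planar distance is an assumed simplification of real geographic distance.
        dist = math.dist(self.location, location)
        proximity = max(0.0, 1.0 - dist / self.radius)
        overlap = len(self.keywords & set(tokens)) / max(1, len(self.keywords))
        return self.alpha * proximity + (1 - self.alpha) * overlap

    def offer(self, text, location):
        """Consider one newly arrived message for the current top-k."""
        self._seq += 1
        entry = (self._score(text.lower().split(), location), self._seq, text)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:
            heapq.heapreplace(self._heap, entry)  # evict the current weakest result

    def results(self):
        """Current top-k texts, best first."""
        return [text for _, _, text in sorted(self._heap, reverse=True)]

q = StaticContinuousQuery(["cocktail", "bar"], (0.0, 0.0), k=2)
q.offer("great cocktail bar here", (1.0, 0.0))
q.offer("shoe shopping", (0.5, 0.0))
q.offer("nice bar far away", (9.0, 0.0))
q.offer("cocktail bar round the corner", (0.0, 0.0))
top = q.results()
```

A moving continuous query would additionally need to rescore retained objects whenever the query location changes, which this sketch does not attempt.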
16. Location Diversity
Location data unreliable
Reliability of location data... is also unreliable
''There are known knowns.. we also know there are known unknowns.. but there are also unknown unknowns'' – Donald Rumsfeld
Text mentions require disambiguation
● In profile
● In messages
● In queries
Requirement is to rank vague points given vague query
17. Willingness to travel
Determines useful search radius
Based on mode of transport (figure: one radius per mode):
14.9 km, 22.0 km, 40.6 km, 61.5 km, >100 km
Different for varying classes of Point Of Interest?
Spatio-temporal (ST) social media = huge dataset
Easy data collection
Useful for e.g. town planning
18. Spatio-temporal Challenges
We've seen temporal and spatial challenges; let's combine!
Given all these spatio-temporal utterances, what can we do?
- Spatial gives relevance from physical or travel proximity
- Temporal gives relevance from recency and history
Adding text to the spatio-temporal points gives explicit semantic context
Not only are ST patterns in the data, we are told what they mean!
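One simple way to combine the two kinds of relevance is to multiply a distance decay by a recency decay. The Python sketch below assumes exponential decay in both dimensions. The function name, the scale parameters and their default values are illustrative, and the sketch covers recency only, not relevance drawn from historical patterns.

```python
import math

def st_relevance(distance_km, age_hours, distance_scale_km=5.0, half_life_hours=6.0):
    """Relevance in (0, 1]: 1.0 for a message posted here and now."""
    spatial = math.exp(-distance_km / distance_scale_km)   # decays with distance
    temporal = 0.5 ** (age_hours / half_life_hours)        # halves every half-life
    return spatial * temporal

fresh_nearby = st_relevance(distance_km=1.0, age_hours=0.5)
stale_nearby = st_relevance(distance_km=1.0, age_hours=48.0)
fresh_distant = st_relevance(distance_km=50.0, age_hours=0.5)
```

A text-similarity term could be multiplied or blended in the same way to bring in the semantic context.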
19. Topic-based Retrieval
Retrieving results on a topic is useful; ''Tell me about X''
Specific terms vary between places and over time
en.wikipedia.org/wiki/President_of_the_United_States: 2007 vs 2011
''Jelly'': England English vs US English
… Spatio-temporally sensitive indexing?
20. Sentiment Monitoring
Measure how attitudes change over time and over location
Business uses: where to send marketing
Political uses: data-driven democratic.. campaigning
Governance uses: what are citizen priorities in a region
Temporal dimension enables tracking of trends and reactions
red = upbeat;
blue = complaint.
- no normalisation for vocality!
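The vocality caveat can be illustrated with a small Python aggregation that averages sentiment per author before averaging per region and day, so that one prolific poster does not dominate a map cell. It reads ''vocality'' as how often an individual author posts, which is an assumption. The tuple layout, the polarity scale (+1 upbeat, -1 complaint) and the function name are likewise hypothetical.

```python
from collections import defaultdict

def mean_sentiment(messages):
    """messages: iterable of (region, day, author, polarity) tuples.

    Returns {(region, day): mean of per-author mean polarity}.
    """
    per_author = defaultdict(list)
    for region, day, author, polarity in messages:
        per_author[(region, day, author)].append(polarity)
    per_cell = defaultdict(list)
    for (region, day, _author), values in per_author.items():
        # Each author contributes a single averaged value to the cell.
        per_cell[(region, day)].append(sum(values) / len(values))
    return {cell: sum(v) / len(v) for cell, v in per_cell.items()}

stream = [
    ("CPH", 1, "u1", -1.0),
    ("CPH", 1, "u1", -1.0),
    ("CPH", 1, "u1", -1.0),
    ("CPH", 1, "u2", 1.0),
]
cells = mean_sentiment(stream)
raw_mean = sum(p for *_, p in stream) / len(stream)
```

Here the raw mean is -0.5 because one author complained three times, whereas the per-author mean for the cell is 0.0.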
21. Local Computational Journalism
Social media is quick
Social media is uncurated
''Citizen Journalism''
News has relevance scope:
Recency
Proximity
Different events relevant in different contexts:
Rain in London
Rain in Addis Ababa
Automatic event detection [5] - and also reporting!
5. Ritter et al. 2012: ''Open domain event extraction from Twitter'', Proc. ACM SIGKDD
22. Summary
Social media is a rich source of ''big data''
A small sampling of all human discourse
It comes with temporal and spatial context
Context-aware search and analysis is very demanding!
- Novel, powerful applications
- Wide variety of domains
- An open set of challenges