Semantic Approach to
Big Data and Event Processing
Integrating Sensor and Social Data
for Understanding City Events
Pramod Anantharam
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
Wright State University, USA
Tutorial @ Kno.e.sis Centre: Semantics Approach to Big Data and Event Processing, Oct 7-9, 2015
[Figure: 511.org traffic map annotated with slow-moving traffic, a link description, scheduled events, and schedule information]
• Why?
– Provides Complementary information for
comprehensive situational awareness
• Sensor : Social :: Quantitative : Qualitative
– Corroboration can further improve trustworthiness
• What?
– Collect and relate multimodal sensor data and social
media data
• How?
– Correlate heterogeneous data streams exploiting
spatio-temporal proximity and domain knowledge
T. K. Prasad 4
Multimodal Data Integration
• Why?
– Explain/Interpret average speed and link travel time
data using event schedule provided by city authorities
and real-time traffic events shared on Twitter
– Past work: Predict congestion based on historical
sensor data
• What?
– Combine
• 511.org data about Bay Area Road Network Traffic
– E.g., Average speed and link travel time data stream
– E.g., (Happened or planned) event reports
• Tweets that report events including ad hoc ones
Traffic Domain Use Case (open data)
• How?
– Extract events from textual tweets stream
– Build statistical models of normalcy, and thereby
anomaly, from numerical sensor data streams
– Correlate multimodal streams, using spatio-
temporal information, to annotate “anomalies” in
sensor data time series with textual events
Traffic Domain Use Case (open data)
Various City Events Reported on Twitter
Some Challenges in Extracting Events from Tweets
• No well accepted definition of ‘events related to a
city’
• Tweets are short (140 characters), and their informal
nature makes them hard to analyze
– Entity, location, time, and type of the event
• Multiple reports of the same event and sparse
report of some events (biased sample)
– Numbers don’t necessarily indicate intensity
• Validation of the solution is hard due to the open
domain nature of the problem
Formal Text Informal Text
Closed Domain
Open Domain [Roitman et al. 2012][Kumaran and Allan 2004]
[Lampos and Cristianini 2012]
[Becker et al. 2011]
[Wang et al. 2012]
[Ritter et al. 2012]
Related Work on Event Extraction
[ABTA-14] Pramod Anantharam, Payam Barnaghi, Krishnaprasad Thirunarayan, and Amit Sheth. 2015. Extracting City Traffic Events from Social Streams.
ACM Trans. Intell. Syst. Technol. 6, 4, Article 43 (July 2015), 27 pages. DOI=10.1145/2717317 http://doi.acm.org/10.1145/2717317
City Event Extraction from Textual Data
• City Event Annotation
– Automated creation of training data
– Annotation task (our CRF model vs. baseline CRF
model)
• City Event Extraction
– Use aggregation algorithm for event extraction
– Extracted events AND ground truth
• Dataset (Aug – Nov 2013) ~ 8 GB of data on disk
– Over 8 million tweets
– Over 162 million sensor data points
– 311 active events and 170 scheduled events
Evaluation
Evaluation Metric For Comparing Events with Ground Truth:
• Complementary Events
• Additional information e.g., slow traffic from sensor data and accident from
textual data
• Corroborative Events
• Additional confidence e.g., accident event supporting an accident report from
ground truth
• Timeliness
• Early detection e.g., knowing poor visibility before its formal report
Distribution of Extracted Events Over Locations
Complementary Events
Corroborative Events
Timeliness
Evaluating Timeliness
Image credit: http://traffic.511.org/index
Multiple events
Varying influence
interact with each other
Focus of this talk: algorithms to understand these manifestations
Correlating Multimodal Streams: Preliminary Insights
• Causes of non-linearity in sensor data
streams
– Temporal landmarks : peak hour vs off-peak traffic
vs weekend traffic
– Effect of location
– Scheduled events such as road construction,
baseball game, or music concert
– Unexpected events such as accidents or heavy
rains
– Random variations (viz., stochasticity)
Traffic Dependencies
• Disclaimer
“All models are wrong, but some are useful.” - George Box
• Normalcy Model
– Gaussian Mixture Model (GMM)
• Captures multiple co-existing events and their impact on traffic
– Auto Regressive (AR) Models
• Captures temporal dependencies in traffic dynamics
– Restricted Switching Linear Dynamical System
• Exploits Domain Common Sense for Stationarity
• One LDS model per road link per hour of the week (24 hours × 7 days
= 168 models per link)
• Anomaly Model
– Cf. Box and Whisker plots
Abstracting Traffic Behavior: Traffic Data Model
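The "one model per road link per hour of the week" idea can be illustrated with a small indexing sketch. The function names, the string link identifier, and the choice of Monday 00:00 as slot 0 are assumptions made for illustration; the slides do not specify how the 168 slots are numbered.

```python
from datetime import datetime
from typing import Tuple

HOURS_PER_WEEK = 24 * 7  # 168 hour-of-week slots, one model per slot per link


def hour_of_week(ts: datetime) -> int:
    """Map a timestamp to a slot in 0..167 (assumed: Monday 00:00 is slot 0)."""
    return ts.weekday() * 24 + ts.hour


def model_key(link_id: str, ts: datetime) -> Tuple[str, int]:
    """Hypothetical lookup key for the model that covers this reading."""
    return (link_id, hour_of_week(ts))
```

Under this scheme every sensor reading is routed to exactly one of the 168 models for its link, so each model only has to describe traffic that is expected to behave similarly.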
Image credit: http://tourontap.com/us-open-2012/courses-and-more-by-the-bay/
AT&T Park
Histogram of speed values
collected from June 1st 12:00 AM to June 2nd 12:00 AM
Histogram of travel time values
collected from June 1st 12:00 AM to June 2nd 12:00 AM
Traffic Data: First Peek
Most drivers tend to
go 5 km/h over the posted speed limit
Relatively fewer drivers
go more than 10 km/h over the
posted speed limit
At some times of the day
drivers are forced to go below the
speed limit, e.g., in rush hour traffic
Do these histograms resemble any probability distribution?
Traffic Data: Possible Explanation
“many variables such as height, weight, IQ scores, reading ability, job satisfaction,
blood pressure turn out to have distributions that are bell-shaped or normal.” [2]
Popularized by Gauss in 1809, who used it to analyze astronomical data; it is
now commonly known as the Gaussian distribution.
http://en.wikipedia.org/wiki/Normal_distribution
[2] http://peoplelearn.homestead.com/Topic3NORMAL1.html
P(x) = G(μ, σ²)
Gaussian Distribution
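For reference, the shorthand G(μ, σ²) used on this slide corresponds to the standard univariate Gaussian density, with mean μ and variance σ²:

```latex
P(x) = G(\mu, \sigma^2)
     = \frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```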
Multiple Gaussian Distributions: A Better Fit for Speed Observations?
This distribution resembles a
Gaussian Mixture Model (GMM)
Assume Normalcy to be uninterrupted traffic flow
July 2014 has no events, so we
hypothesize a higher log-likelihood
score
June 2014 has many events, so we
hypothesize a lower log-likelihood
score
-115655.8
-125974.3
Golden Gate Fields: Comparing Months with Varying Event Occurrences
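The month-to-month comparison relies on scoring observations under a fitted mixture. A minimal sketch of that scoring step is given below, assuming the mixture parameters are already known. The parameter values and speed readings in the usage note are invented for illustration and are not the values behind the scores on the slide.

```python
import math
from typing import Sequence


def gaussian_log_pdf(x: float, mean: float, std: float) -> float:
    """Log density of a univariate Gaussian with the given mean and std."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def gmm_log_likelihood(
    xs: Sequence[float],
    weights: Sequence[float],
    means: Sequence[float],
    stds: Sequence[float],
) -> float:
    """Total log-likelihood of xs under a 1-D Gaussian mixture.

    Weights are assumed to be positive and to sum to 1.
    """
    total = 0.0
    for x in xs:
        # Per-component log(weight * density), combined with log-sum-exp
        # so that very small densities do not underflow to zero.
        terms = [
            math.log(w) + gaussian_log_pdf(x, m, s)
            for w, m, s in zip(weights, means, stds)
        ]
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total
```

For example, with a hypothetical two-component normalcy model (weights 0.7 and 0.3, means 100 and 40 km/h, standard deviations 8 and 10), readings such as `[98, 102, 95, 105, 41]` score higher than readings such as `[60, 65, 70, 20, 15]`, which sit between or below the components. This mirrors the hypothesis that an eventful month scores lower than an event-free one.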
Hourly Traffic Dynamics Over a Day
• Differentiate various traffic dynamics
– Gaussian mixture model is too coarse-grained, as it does not discriminate
between traffic increasing over an hour and traffic decreasing over the
same hour.
• Account for unobserved factors
– Autoregressive models cannot capture unobserved factors
• E.g., Traffic volume, which may be unobserved, dictates how events manifest in link
speed and travel time variations.
– Linear Dynamical System introduces latent state-based model
• E.g., Traffic volume (low vs high), road lane closures, and weather conditions (visibility) can
impact how observations evolve.
• Emission/transition matrices and Gaussian noise capture stochasticity.
Modeling Traffic Dynamics: Statistical Models and Intuitions
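The latent-state model described above is commonly written in the following textbook form. The symbols are assumed notation, not taken from the slides: x_t is the hidden state (e.g., traffic volume), y_t the observation (speed or travel time), A the transition matrix, C the emission matrix, and Q, R the Gaussian noise covariances.

```latex
\begin{aligned}
x_t &= A\,x_{t-1} + w_t, & w_t &\sim \mathcal{N}(0, Q) \\
y_t &= C\,x_t + v_t,     & v_t &\sim \mathcal{N}(0, R)
\end{aligned}
```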
• Characterize data time series (by learning
distribution of each time point behavior using
mean and variance)
• Pick a realizable medoid time series as prototype
for comparison summarized using LDS parameters
Linear Dynamical System Model
Learning LDS Models
Tagging Anomalies with LDS Models
• Normalcy : Log Likelihood scores of traces from event free data visualized
as box and whiskers plot
– Intertwined with long-term construction event influence
• Anomaly : Log Likelihood score falls beyond whiskers threshold for
eventful data
Log-likelihood
score
Tagging Anomalies: Intuitions
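The box-and-whiskers criterion can be sketched as follows. The whisker length of 1.5 times the interquartile range is the usual Tukey convention and is an assumption here, since the slides do not state the threshold. Flagging scores beyond either whisker is also an assumption, and the function names are illustrative.

```python
import statistics
from typing import List, Sequence, Tuple


def whisker_bounds(scores: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Lower and upper whisker limits from event-free log-likelihood scores.

    k = 1.5 is the conventional Tukey multiplier (an assumption).
    """
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tag_anomalies(
    normal_scores: Sequence[float],
    new_scores: Sequence[float],
    k: float = 1.5,
) -> List[bool]:
    """Mark each new score that falls beyond the whiskers of the normal scores."""
    low, high = whisker_bounds(normal_scores, k)
    return [s < low or s > high for s in new_scores]
```

In practice the interesting case is a score below the lower whisker, meaning the trace is much less likely under the normalcy model than event-free traces usually are.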
• If an anomaly is detected on a link L during the time
period [t_st, t_et], then the anomaly is explained by an
event if the event occurred in the vicinity, within a
0.5 km radius, and during [t_st − 1, t_et + 1].
• CAVEAT: An anomaly may not be explained because
of missing data.
Spatio-temporal co-occurrence criteria
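The co-occurrence rule can be sketched in code as below. Several details are assumptions for illustration only: the link is represented by a single latitude/longitude point, the event has a single timestamp, times are in hours (the slide does not state the unit of the ±1 slack), and distance is the great-circle (haversine) distance.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass
class Anomaly:
    lat: float      # representative point of the link (assumption)
    lon: float
    t_start: float  # t_st, in hours (assumed unit)
    t_end: float    # t_et, in hours (assumed unit)


@dataclass
class Event:
    lat: float
    lon: float
    time: float     # single event timestamp, in hours (assumption)


def explains(event: Event, anomaly: Anomaly,
             radius_km: float = 0.5, slack: float = 1.0) -> bool:
    """True if the event is within radius_km of the anomaly's link and
    falls inside [t_st - slack, t_et + slack]."""
    near = haversine_km(event.lat, event.lon, anomaly.lat, anomaly.lon) <= radius_km
    in_window = anomaly.t_start - slack <= event.time <= anomaly.t_end + slack
    return near and in_window
```

An anomaly for which no event satisfies `explains` stays unexplained, which matches the caveat about missing data.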
• Data collected from the San Francisco Bay Area between May 2014 and May
2015
– 511.org:
• 1,638 traffic incident reports
• 1.4 billion speed and travel time observations
– Twitter data: 39,208 traffic-related incidents extracted from over 20 million
tweets [1]
• A naïve implementation for learning normalcy models for 2,534 links
took 40 minutes per link (~ 2 months of processing time for our
data)
– 2.66 GHz Intel Core 2 Duo with 8 GB main memory
• A scalable implementation that exploits the nature of the problem
learned the normalcy models within 24 hours
– The Apache Spark cluster used in our evaluation has 864 cores and 17 TB of main
memory.
[1] Anantharam, P. 2014. Extracting city traffic events from social streams. https://osf.io/b4q2t/wiki/home/
Experimental Data Statistics And Infrastructure
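The "~ 2 months" estimate for the naïve implementation follows from the per-link cost, assuming links are processed one after another on the single machine:

```latex
2{,}534 \ \text{links} \times 40 \ \tfrac{\text{min}}{\text{link}}
  = 101{,}360 \ \text{min}
  \approx 1{,}689 \ \text{h}
  \approx 70 \ \text{days}
```

Finishing the same workload within 24 hours therefore corresponds to a speedup of roughly 70× or more on the Spark cluster.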
Evaluation Results
Semantic Approach to
Big Data and Event Processing
Thank you!
Any questions?

Weitere ähnliche Inhalte

Andere mochten auch

Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...Artificial Intelligence Institute at UofSC
 
Stream Reasoning: mastering the velocity and variety dimensions of Big Data...
Stream Reasoning: mastering the velocity and variety dimensions of Big Data...Stream Reasoning: mastering the velocity and variety dimensions of Big Data...
Stream Reasoning: mastering the velocity and variety dimensions of Big Data...Artificial Intelligence Institute at UofSC
 
Semantics Approach to Big Data and Event Processing: an introduction focused ...
Semantics Approach to Big Data and Event Processing: an introduction focused ...Semantics Approach to Big Data and Event Processing: an introduction focused ...
Semantics Approach to Big Data and Event Processing: an introduction focused ...Artificial Intelligence Institute at UofSC
 
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...Artificial Intelligence Institute at UofSC
 
Mastering the variety dimension of Big Data with semantic technologies: high ...
Mastering the variety dimension of Big Data with semantic technologies: high ...Mastering the variety dimension of Big Data with semantic technologies: high ...
Mastering the variety dimension of Big Data with semantic technologies: high ...Artificial Intelligence Institute at UofSC
 
Knoesis-Semantic filtering-Tutorials
Knoesis-Semantic filtering-TutorialsKnoesis-Semantic filtering-Tutorials
Knoesis-Semantic filtering-TutorialsPavan Kapanipathi
 

Andere mochten auch (10)

Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
 
Examples of Applied Semantic Technologies: Social Data Annotation
Examples of Applied Semantic Technologies:  Social Data AnnotationExamples of Applied Semantic Technologies:  Social Data Annotation
Examples of Applied Semantic Technologies: Social Data Annotation
 
Stream Reasoning: mastering the velocity and variety dimensions of Big Data...
Stream Reasoning: mastering the velocity and variety dimensions of Big Data...Stream Reasoning: mastering the velocity and variety dimensions of Big Data...
Stream Reasoning: mastering the velocity and variety dimensions of Big Data...
 
Semantics Approach to Big Data and Event Processing: an introduction focused ...
Semantics Approach to Big Data and Event Processing: an introduction focused ...Semantics Approach to Big Data and Event Processing: an introduction focused ...
Semantics Approach to Big Data and Event Processing: an introduction focused ...
 
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
 
Mastering the variety dimension of Big Data with semantic technologies: high ...
Mastering the variety dimension of Big Data with semantic technologies: high ...Mastering the variety dimension of Big Data with semantic technologies: high ...
Mastering the variety dimension of Big Data with semantic technologies: high ...
 
Mastering the Velocity Dimension of Big Data
Mastering the Velocity Dimension of Big DataMastering the Velocity Dimension of Big Data
Mastering the Velocity Dimension of Big Data
 
Examples of Real-World Big Data Application
Examples of Real-World Big Data ApplicationExamples of Real-World Big Data Application
Examples of Real-World Big Data Application
 
Knoesis-Semantic filtering-Tutorials
Knoesis-Semantic filtering-TutorialsKnoesis-Semantic filtering-Tutorials
Knoesis-Semantic filtering-Tutorials
 
RDF Streams and Continuous SPARQL (C-SPARQL)
RDF Streams and Continuous SPARQL (C-SPARQL)RDF Streams and Continuous SPARQL (C-SPARQL)
RDF Streams and Continuous SPARQL (C-SPARQL)
 

Ähnlich wie Integrating Sensor and Social Data for Understanding City Events

Extracting City Traffic Events from Social Streams
 Extracting City Traffic Events from Social Streams Extracting City Traffic Events from Social Streams
Extracting City Traffic Events from Social StreamsPramod Anantharam
 
Big Data & Smart City Applications
Big Data & Smart City ApplicationsBig Data & Smart City Applications
Big Data & Smart City ApplicationsAmit Sheth
 
Certain Analysis on Traffic Dataset based on Data Mining Algorithms
Certain Analysis on Traffic Dataset based on Data Mining AlgorithmsCertain Analysis on Traffic Dataset based on Data Mining Algorithms
Certain Analysis on Traffic Dataset based on Data Mining AlgorithmsIRJET Journal
 
Using topological analysis to support event guided exploration in urban data
Using topological analysis to support event guided exploration in urban dataUsing topological analysis to support event guided exploration in urban data
Using topological analysis to support event guided exploration in urban dataivaderivader
 
Get Started with Data Science by Analyzing Traffic Data from California Highways
Get Started with Data Science by Analyzing Traffic Data from California HighwaysGet Started with Data Science by Analyzing Traffic Data from California Highways
Get Started with Data Science by Analyzing Traffic Data from California HighwaysAerospike, Inc.
 
Mythbusters: Event Stream Processing v. Complex Event Processing
Mythbusters: Event Stream Processing v. Complex Event ProcessingMythbusters: Event Stream Processing v. Complex Event Processing
Mythbusters: Event Stream Processing v. Complex Event ProcessingTim Bass
 
A First Step Towards Stream Reasoning at FIS 2008
A First Step Towards Stream Reasoning at FIS 2008A First Step Towards Stream Reasoning at FIS 2008
A First Step Towards Stream Reasoning at FIS 2008Emanuele Della Valle
 
A Knowledge-based Approach for Real-Time IoT Stream Annotation and Processing
A Knowledge-based Approach for Real-Time IoT Stream Annotation and ProcessingA Knowledge-based Approach for Real-Time IoT Stream Annotation and Processing
A Knowledge-based Approach for Real-Time IoT Stream Annotation and ProcessingPayamBarnaghi
 
Realtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in HighwaysRealtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in HighwaysYork University
 
URBAN TRAFFIC DATA HACK - ROLAND MAJOR
URBAN TRAFFIC DATA HACK - ROLAND MAJORURBAN TRAFFIC DATA HACK - ROLAND MAJOR
URBAN TRAFFIC DATA HACK - ROLAND MAJORBig Data Week
 
Density of route frequency for enforcement
Density of route frequency for enforcement Density of route frequency for enforcement
Density of route frequency for enforcement Conference Papers
 
[Seminar] 20210115 Hyeshin Chu
[Seminar] 20210115 Hyeshin Chu[Seminar] 20210115 Hyeshin Chu
[Seminar] 20210115 Hyeshin Chuivaderivader
 
Understanding Human Mobility
Understanding Human MobilityUnderstanding Human Mobility
Understanding Human MobilityWidy Widyawan
 
The Design of a Simulation for the Modeling and Analysis of Public Transporta...
The Design of a Simulation for the Modeling and Analysis of Public Transporta...The Design of a Simulation for the Modeling and Analysis of Public Transporta...
The Design of a Simulation for the Modeling and Analysis of Public Transporta...CSCJournals
 
2011 NIJ Crime Mapping Conference - Data Mining and Risk Forecasting in Web-b...
2011 NIJ Crime Mapping Conference - Data Mining and Risk Forecasting in Web-b...2011 NIJ Crime Mapping Conference - Data Mining and Risk Forecasting in Web-b...
2011 NIJ Crime Mapping Conference - Data Mining and Risk Forecasting in Web-b...Azavea
 

Ähnlich wie Integrating Sensor and Social Data for Understanding City Events (20)

Understanding City Traffic Dynamics Utilizing Sensor and Textual Observations
Understanding City Traffic Dynamics Utilizing Sensor and Textual ObservationsUnderstanding City Traffic Dynamics Utilizing Sensor and Textual Observations
Understanding City Traffic Dynamics Utilizing Sensor and Textual Observations
 
Extracting City Traffic Events from Social Streams
 Extracting City Traffic Events from Social Streams Extracting City Traffic Events from Social Streams
Extracting City Traffic Events from Social Streams
 
Big Data & Smart City Applications
Big Data & Smart City ApplicationsBig Data & Smart City Applications
Big Data & Smart City Applications
 
Certain Analysis on Traffic Dataset based on Data Mining Algorithms
Certain Analysis on Traffic Dataset based on Data Mining AlgorithmsCertain Analysis on Traffic Dataset based on Data Mining Algorithms
Certain Analysis on Traffic Dataset based on Data Mining Algorithms
 
Using topological analysis to support event guided exploration in urban data
Using topological analysis to support event guided exploration in urban dataUsing topological analysis to support event guided exploration in urban data
Using topological analysis to support event guided exploration in urban data
 
Get Started with Data Science by Analyzing Traffic Data from California Highways
Get Started with Data Science by Analyzing Traffic Data from California HighwaysGet Started with Data Science by Analyzing Traffic Data from California Highways
Get Started with Data Science by Analyzing Traffic Data from California Highways
 
Role of Big Data for Smart City Applications
Role of Big Data for Smart City ApplicationsRole of Big Data for Smart City Applications
Role of Big Data for Smart City Applications
 
201029 Joohee Kim
201029 Joohee Kim201029 Joohee Kim
201029 Joohee Kim
 
Mythbusters: Event Stream Processing v. Complex Event Processing
Mythbusters: Event Stream Processing v. Complex Event ProcessingMythbusters: Event Stream Processing v. Complex Event Processing
Mythbusters: Event Stream Processing v. Complex Event Processing
 
A First Step Towards Stream Reasoning at FIS 2008
A First Step Towards Stream Reasoning at FIS 2008A First Step Towards Stream Reasoning at FIS 2008
A First Step Towards Stream Reasoning at FIS 2008
 
A Knowledge-based Approach for Real-Time IoT Stream Annotation and Processing
A Knowledge-based Approach for Real-Time IoT Stream Annotation and ProcessingA Knowledge-based Approach for Real-Time IoT Stream Annotation and Processing
A Knowledge-based Approach for Real-Time IoT Stream Annotation and Processing
 
Realtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in HighwaysRealtime Big Data Analytics for Event Detection in Highways
Realtime Big Data Analytics for Event Detection in Highways
 
Big Data Challenges and Trust Management at CTS -2016
Big Data Challenges and Trust Management at CTS -2016Big Data Challenges and Trust Management at CTS -2016
Big Data Challenges and Trust Management at CTS -2016
 
URBAN TRAFFIC DATA HACK - ROLAND MAJOR
URBAN TRAFFIC DATA HACK - ROLAND MAJORURBAN TRAFFIC DATA HACK - ROLAND MAJOR
URBAN TRAFFIC DATA HACK - ROLAND MAJOR
 
EventShop ISG talk 140213
EventShop ISG talk 140213EventShop ISG talk 140213
EventShop ISG talk 140213
 
Density of route frequency for enforcement
Density of route frequency for enforcement Density of route frequency for enforcement
Density of route frequency for enforcement
 
[Seminar] 20210115 Hyeshin Chu
[Seminar] 20210115 Hyeshin Chu[Seminar] 20210115 Hyeshin Chu
[Seminar] 20210115 Hyeshin Chu
 
Understanding Human Mobility
Understanding Human MobilityUnderstanding Human Mobility
Understanding Human Mobility
 
The Design of a Simulation for the Modeling and Analysis of Public Transporta...
The Design of a Simulation for the Modeling and Analysis of Public Transporta...The Design of a Simulation for the Modeling and Analysis of Public Transporta...
The Design of a Simulation for the Modeling and Analysis of Public Transporta...
 
2011 NIJ Crime Mapping Conference - Data Mining and Risk Forecasting in Web-b...
2011 NIJ Crime Mapping Conference - Data Mining and Risk Forecasting in Web-b...2011 NIJ Crime Mapping Conference - Data Mining and Risk Forecasting in Web-b...
2011 NIJ Crime Mapping Conference - Data Mining and Risk Forecasting in Web-b...
 

Kürzlich hochgeladen

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 

Kürzlich hochgeladen (20)

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 

Integrating Sensor and Social Data for Understanding City Events

  • 1. Semantic Approach to Big Data and Event Processing Integrating Sensor and Social Data for Understanding City Events Pramod Anantharam Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) Wright State University, USA Tutorial @ Kno.e.sis Centre: Semantics Approach to Big Data and Event Processing, Oct 7-9, 2015
  • 4. • Why? – Provides complementary information for comprehensive situational awareness • Sensor : Social :: Quantitative : Qualitative – Corroboration can further improve trustworthiness • What? – Collect and relate multimodal sensor data and social media data • How? – Correlate heterogeneous data streams by exploiting spatio-temporal proximity and domain knowledge T. K. Prasad 4 Multimodal Data Integration
  • 5. • Why? – Explain/interpret average speed and link travel time data using the event schedule provided by city authorities and real-time traffic events shared on Twitter – Past work: predict congestion based on historical sensor data • What? – Combine • 511.org data about Bay Area road network traffic – E.g., average speed and link travel time data streams – E.g., event reports (past or planned) • Tweets that report events, including ad hoc ones Traffic Domain Use Case (open data)
  • 6. • How? – Extract events from the textual tweet stream – Build statistical models of normalcy, and thereby anomaly, from numerical sensor data streams – Correlate multimodal streams, using spatio-temporal information, to annotate “anomalies” in sensor data time series with textual events Traffic Domain Use Case (open data)
  • 8. Various City Events Reported on Twitter
  • 9. Some Challenges in Extracting Events from Tweets • No well-accepted definition of ‘events related to a city’ • Tweets are short (140 characters), and their informal nature makes them hard to analyze – Entity, location, time, and type of the event • Multiple reports of the same event and sparse reports of some events (biased sample) – Numbers don’t necessarily indicate intensity • Validation of the solution is hard due to the open-domain nature of the problem
  • 10. Related Work on Event Extraction, organized along two axes: Formal Text vs. Informal Text, and Closed Domain vs. Open Domain. Cited work: [Roitman et al. 2012] [Kumaran and Allan 2004] [Lampos and Cristianini 2012] [Becker et al. 2011] [Wang et al. 2012] [Ritter et al. 2012]
  • 11. [ABTA-14] Pramod Anantharam, Payam Barnaghi, Krishnaprasad Thirunarayan, and Amit Sheth. 2015. Extracting City Traffic Events from Social Streams. ACM Trans. Intell. Syst. Technol. 6, 4, Article 43 (July 2015), 27 pages. DOI=10.1145/2717317 http://doi.acm.org/10.1145/2717317 City Event Extraction from Textual Data
  • 12. • City Event Annotation – Automated creation of training data – Annotation task (our CRF model vs. baseline CRF model) • City Event Extraction – Use aggregation algorithm for event extraction – Extracted events AND ground truth • Dataset (Aug – Nov 2013), ~8 GB of data on disk – Over 8 million tweets – Over 162 million sensor data points – 311 active events and 170 scheduled events Evaluation
  • 13. Evaluation Metric for Comparing Events with Ground Truth: • Complementary Events • Additional information, e.g., slow traffic from sensor data and an accident from textual data • Corroborative Events • Additional confidence, e.g., an accident event supporting an accident report from ground truth • Timeliness • Early detection, e.g., knowing about poor visibility before its formal report Distribution of Extracted Events Over Locations
  • 17. • How? – Extract events from the textual tweet stream – Build statistical models of normalcy, and thereby anomaly, from numerical sensor data streams – Correlate multimodal streams, using spatio-temporal information, to annotate “anomalies” in sensor data time series with textual events Traffic Domain Use Case (open data)
  • 18. Image credit: http://traffic.511.org/index Multiple events with varying influence interact with each other. Focus of this talk: algorithms to understand these manifestations. Correlating Multimodal Streams: Preliminary Insights
  • 19. • Causes of non-linearity in sensor data streams – Temporal landmarks: peak-hour vs. off-peak vs. weekend traffic – Effect of location – Scheduled events such as road construction, a baseball game, or a music concert – Unexpected events such as accidents or heavy rain – Random variations (viz., stochasticity) Traffic Dependencies
  • 20. • Disclaimer: “All models are wrong, but some are useful.” (George Box) • Normalcy Model – Gaussian Mixture Model (GMM) • Captures multiple co-existing events and their impact on traffic – Autoregressive (AR) Models • Capture temporal dependencies in traffic dynamics – Restricted Switching Linear Dynamical System • Exploits domain common sense for stationarity • One LDS model per road link per week hour (24 hours × 7 days per week = 168 models) • Anomaly Model – Cf. box-and-whisker plots Abstracting Traffic Behavior: Traffic Data Model
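The "one model per road link per week hour" layout can be illustrated with a small indexing sketch. This is only an assumption about how the 168 slots might be keyed; the function names `week_hour` and `model_key`, the Monday-first ordering, and the link identifier are hypothetical and do not come from the slides.

```python
from datetime import datetime

HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7
SLOTS_PER_WEEK = HOURS_PER_DAY * DAYS_PER_WEEK  # 168 week-hour slots


def week_hour(ts: datetime) -> int:
    """Map a timestamp to a week-hour slot in 0..167.

    Assumption: slot 0 is Monday 00:00-00:59 (Python's weekday() convention).
    """
    return ts.weekday() * HOURS_PER_DAY + ts.hour


def model_key(link_id: str, ts: datetime) -> tuple:
    """Hypothetical lookup key for the normalcy model of one link and one week hour."""
    return (link_id, week_hour(ts))


# Example: two observations on the same (made-up) link, one week apart,
# land in the same slot and would be scored by the same model.
key_a = model_key("link-42", datetime(2014, 6, 3, 8, 15))
key_b = model_key("link-42", datetime(2014, 6, 10, 8, 45))
```

Keying by week hour is what lets each model treat its own slot as roughly stationary: Tuesday 8 AM traffic is compared only with other Tuesday 8 AM traffic.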
  • 22. Histogram of speed values collected from June 1st 12:00 AM to June 2nd 12:00 AM. Histogram of travel time values collected from June 1st 12:00 AM to June 2nd 12:00 AM. Traffic Data: First Peek
  • 23. Most drivers tend to go 5 km/h over the posted speed limit. Relatively fewer drivers go more than 10 km/h over the posted speed limit. There are situations during the day where drivers are forced to go below the speed limit, e.g., rush-hour traffic. Do these histograms resemble any probability distribution? Traffic Data: Possible Explanation
  • 24. “many variables such as height, weight, IQ scores, reading ability, job satisfaction, blood pressure turn out to have distributions that are bell-shaped or normal.” [2] Popularized by Gauss in 1809, when he used it to analyze astronomical data, and hence now popularly known as the Gaussian distribution. http://en.wikipedia.org/wiki/Normal_distribution [2] http://peoplelearn.homestead.com/Topic3NORMAL1.html P(x) = G(μ, σ²) Gaussian Distribution
  • 25. Multiple Gaussian Distributions: A Better Fit for Speed Observations? This distribution resembles a Gaussian Mixture Model (GMM)
  • 26. Assume normalcy to be uninterrupted traffic flow. July 2014 has no events, so we hypothesize a higher log-likelihood score. June 2014 has many events, so we hypothesize a lower log-likelihood score. Log-likelihood scores shown on the slide: -115655.8 and -125974.3. Golden Gate Fields: Comparing Months with Varying Event Occurrences
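The comparison above can be sketched as scoring two sets of speed observations under the same mixture model. This is a minimal sketch, not the talk's implementation: the mixture parameters and the speed samples below are made up for illustration, and in practice the mixture would be fitted to data (e.g., with EM) rather than written by hand.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gmm_log_likelihood(samples, components):
    """Sum of log mixture densities.

    components: list of (weight, mean, std) with weights summing to 1.
    """
    total = 0.0
    for x in samples:
        density = sum(w * gaussian_pdf(x, m, s) for w, m, s in components)
        total += math.log(density)
    return total


# Hypothetical normalcy model: a free-flow mode and a slower rush-hour mode (km/h).
normalcy = [(0.7, 65.0, 4.0), (0.3, 35.0, 8.0)]

# Made-up observations: a quiet period and a period disrupted by events.
quiet_speeds = [64.0, 66.5, 63.0, 67.0, 36.0, 65.5]
eventful_speeds = [64.0, 12.0, 15.5, 66.0, 9.0, 20.0]

quiet_score = gmm_log_likelihood(quiet_speeds, normalcy)
eventful_score = gmm_log_likelihood(eventful_speeds, normalcy)
```

Under these assumptions, observations that the normalcy model explains well receive a higher (less negative) total score, which is the direction of the hypothesis stated on the slide.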
  • 28. • Differentiate various traffic dynamics – A Gaussian mixture model is too coarse-grained, as it does not discriminate between traffic increasing over an hour and traffic decreasing over the same hour. • Account for unobserved factors – Autoregressive models cannot capture unobserved factors • E.g., traffic volume, which may be unobserved, dictates how events manifest in link speed and travel time variations. – A Linear Dynamical System introduces a latent state-based model • E.g., traffic volume (low vs. high), road lane closures, and weather conditions (visibility) can impact how observations evolve. • Emission/transition matrices and Gaussian noise capture stochasticity. Modeling Traffic Dynamics: Statistical Models and Intuitions
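To make the latent-state idea concrete, the following sketch scores an observation sequence under a scalar linear dynamical system using the standard Kalman filter recursion. It is a simplification under stated assumptions: the talk's model uses transition and emission matrices learned per link and week hour, whereas here the state is one-dimensional and the parameters `a`, `c`, `q`, `r`, `x0`, `p0` and the two speed series are invented for illustration.

```python
import math


def lds_log_likelihood(ys, a, c, q, r, x0, p0):
    """Log-likelihood of observations ys under a scalar LDS.

    Latent state:  x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation:   y_t = c * x_t + v_t,      v_t ~ N(0, r)
    x0, p0: prior mean and variance of the first latent state.
    """
    x_pred, p_pred = x0, p0
    loglik = 0.0
    for y in ys:
        # Predictive distribution of the observation given past observations.
        s = c * p_pred * c + r
        innovation = y - c * x_pred
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + innovation ** 2 / s)
        # Measurement update of the latent state.
        gain = p_pred * c / s
        x_filt = x_pred + gain * innovation
        p_filt = (1.0 - gain * c) * p_pred
        # Time update: propagate the state to the next step.
        x_pred = a * x_filt
        p_pred = a * p_filt * a + q
    return loglik


# Hypothetical parameters and speed traces (km/h).
params = dict(a=1.0, c=1.0, q=1.0, r=1.0, x0=60.0, p0=4.0)
smooth_trace = [60.0, 59.0, 58.0, 57.0, 56.0]
abrupt_trace = [60.0, 59.0, 30.0, 28.0, 57.0]

smooth_score = lds_log_likelihood(smooth_trace, **params)
abrupt_score = lds_log_likelihood(abrupt_trace, **params)
```

Unlike a mixture model over individual values, this score depends on the order of the observations, so a steadily decreasing trace and a trace with sudden jumps are scored differently even when their values overlap.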
  • 29. • Characterize the data time series (by learning the distribution of each time point's behavior using mean and variance) • Pick a realizable medoid time series as the prototype for comparison, summarized using LDS parameters Linear Dynamical System Model
  • 32. • Normalcy: log-likelihood scores of traces from event-free data, visualized as a box-and-whisker plot – Intertwined with the influence of long-term construction events • Anomaly: log-likelihood score falls beyond the whisker threshold for eventful data Tagging Anomalies: Intuitions
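The whisker-based tagging can be sketched as follows. Two assumptions are made here that the slide does not state: whiskers follow Tukey's convention (1.5 times the interquartile range beyond the quartiles), and only scores below the lower whisker are tagged, since an unusually low log-likelihood is what signals a poor fit to normalcy. The function names and scores are hypothetical.

```python
import statistics


def whisker_bounds(scores, k=1.5):
    """Lower and upper whisker limits from the quartiles of event-free scores.

    Assumption: Tukey-style whiskers at k * IQR beyond the first and third quartiles.
    """
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tag_anomalies(normal_scores, new_scores, k=1.5):
    """Flag new log-likelihood scores that fall below the lower whisker."""
    lower, _upper = whisker_bounds(normal_scores, k)
    return [score < lower for score in new_scores]


# Made-up log-likelihood scores for one link and one hour of day.
event_free = [-100.0, -102.0, -98.0, -101.0, -99.0, -103.0, -97.0, -100.0]
incoming = [-100.0, -104.9, -120.0]
flags = tag_anomalies(event_free, incoming)
```

Because the thresholds come from event-free data for the same link and hour, a score that would be ordinary during rush hour can still be tagged if it occurs at a normally quiet time.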
  • 33. • How? – Extract events from the textual tweet stream – Build statistical models of normalcy, and thereby anomaly, from numerical sensor data streams – Correlate multimodal streams, using spatio-temporal information, to annotate “anomalies” in sensor data time series with textual events Traffic Domain Use Case (open data)
  • 34. • If an anomaly is detected on a link L during the time period [t_st, t_et], then the anomaly is explained by an event if the event occurred within a 0.5 km radius and during [t_st − 1, t_et + 1]. • CAVEAT: An anomaly may remain unexplained because of missing data. Spatio-temporal co-occurrence criteria
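The co-occurrence criteria can be sketched as a predicate over an anomaly and a candidate event. Several details are assumptions, since the slide leaves them open: the time slack is taken to be one hour, a link is represented by a single latitude/longitude point, an event carries a single timestamp, and distance is great-circle (haversine) distance. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


@dataclass
class Anomaly:
    lat: float  # representative point of the link (assumption)
    lon: float
    start: datetime  # t_st
    end: datetime  # t_et


@dataclass
class Event:
    lat: float
    lon: float
    time: datetime  # single report time (assumption)


def explains(event, anomaly, radius_km=0.5, slack=timedelta(hours=1)):
    """True if the event is within radius_km of the anomaly and inside its padded window."""
    near = haversine_km(event.lat, event.lon, anomaly.lat, anomaly.lon) <= radius_km
    in_window = anomaly.start - slack <= event.time <= anomaly.end + slack
    return near and in_window


# Made-up example near San Francisco.
anomaly = Anomaly(37.800, -122.400, datetime(2014, 6, 1, 17, 0), datetime(2014, 6, 1, 18, 0))
nearby_event = Event(37.802, -122.401, datetime(2014, 6, 1, 16, 30))
early_event = Event(37.802, -122.401, datetime(2014, 6, 1, 15, 30))
distant_event = Event(37.810, -122.400, datetime(2014, 6, 1, 17, 30))
```

An anomaly for which no event satisfies the predicate stays unexplained, which matches the caveat about missing data.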
  • 35. • Data collected from the San Francisco Bay Area between May 2014 and May 2015 – 511.org: • 1,638 traffic incident reports • 1.4 billion speed and travel time observations – Twitter data: 39,208 traffic-related incidents extracted from over 20 million tweets [1] • A naïve implementation for learning normalcy models for 2,534 links took 40 minutes per link (~2 months of processing time for our data) – 2.66 GHz Intel Core 2 Duo with 8 GB main memory • A scalable implementation that exploits the nature of the problem learned the normalcy models within 24 hours – The Apache Spark cluster used in our evaluation has 864 cores and 17 TB main memory. [1] Anantharam, P. 2014. Extracting city traffic events from social streams. https://osf.io/b4q2t/wiki/home/ Experimental Data Statistics and Infrastructure
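As a quick check of the "~2 months" figure, assuming the 40 minutes per link applies uniformly to all 2,534 links and the links are processed one after another:

```latex
\[
2{,}534 \text{ links} \times 40\ \tfrac{\text{min}}{\text{link}}
  = 101{,}360 \text{ min}
  \approx 1{,}689 \text{ h}
  \approx 70.4 \text{ days}
\]
```

Under the same assumption, finishing within 24 hours corresponds to a reduction in wall-clock time of roughly 70 times or more.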
  • 37. Semantic Approach to Big Data and Event Processing Thank you! Any questions?

Editor's Notes

  1. Point of this slide: correlations
  2. Point of this slide: heterogeneity and uncertainty
  3. Improve coverage Past work for comparison: variation in bike hires based on events in the city (e.g., parades, sports, bad weather)
  4. Annotate sensor data stream with tweets : timelines / Google trends
  5. Annotate sensor data stream with tweets : timelines / Google trends
  6. [Kumaran and Allan 2004] Giridhar Kumaran and James Allan. 2004. Text classification and named entities for new event detection. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 297–304. [Lampos and Cristianini 2012] Vasileios Lampos and Nello Cristianini. 2012. Nowcasting events from the social web with statistical learning. ACM Transactions on Intelligent Systems and Technology (TIST) 3, 4 (2012), 72. [Roitman et al. 2012] Haggai Roitman, Jonathan Mamou, Sameep Mehta, Aharon Satt, and LV Subramaniam. 2012. Harnessing the Crowds for smart city sensing. In Proceedings of the 1st international workshop on Multimodal crowd sensing. ACM, 17–18. [Ritter et al. 2012] Alan Ritter, Oren Etzioni, Sam Clark, and others. 2012. Open domain event extraction from twitter. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1104–1112. [Wang et al. 2012] Xiaofeng Wang, Matthew S Gerber, and Donald E Brown. 2012. Automatic crime prediction using events extracted from twitter posts. In Social Computing, Behavioral-Cultural Modeling and Prediction. Springer, 231–238. [Becker et al. 2011] Hila Becker, Mor Naaman, and Luis Gravano. 2011. Beyond Trending Topics: Real-World Event Identification on Twitter. In ICWSM.
  7. Annotate sensor data stream with tweets : timelines / Google trends
  8. Annotate sensor data stream with tweets : timelines / Google trends
  9. We modify SLDS to RSLDS by avoiding Markovian dependence among switches. In reality, the switches are temporally related but we decouple them for simplicity because we know the time.
  10. 1733 – De Moivre developed the normal curve mathematically as an approximation to the binomial distribution. 1783 – Laplace used normal curves to describe the distribution of errors. PDF: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
  11. Markovian: the current observation depends on the previous observation. If we consider different samples as IID and summarize variations using a single Gaussian distribution, we may miss time-based behavioral changes. The output follows the state “linearly or as Gaussian”, but the state change can be non-linear or reset periodically. Should the number of additional latent states (beyond the observables) in the traffic case be the same as the number of other influencers? Low volume: traffic speeds and link transit times can vary immensely … High volume: traffic speeds and link transit times should saturate … Latent variables are abduced from sample observations and then used for predictions … HOV
  12. Markovian: the current observation depends on the previous observation. Do the states remember the previous time’s value? Why a time series trajectory? If we consider different samples as IID and summarize variations using a single Gaussian distribution, we may miss time-based behavioral changes. E.g., an increasing and a decreasing time series may give the semblance of constancy.
  13. Boxplot of log-likelihood scores accumulated for each hour of day
  14. Annotate sensor data stream with tweets : timelines / Google trends
  15. 511.org data Event data Apache SPARK Reimplementation