Realtime Big Data Analytics for Event Detection in Highways
1. Realtime Big Data Analytics for Event Detection in Highways
Hamzeh Khazaei, Rodrigo Veleda, Ali Tizghadam and
Marin Litoiu
IEEE World Forum on Internet of Things, Reston, VA, USA, Dec 12-14, 2016
2. Agenda:
• Introducing the CVST platform (Connected Vehicle Smart Transportation)
• This paper: event detection/classification in realtime
3. Issues in ITS Systems
Existing issues
• Traditional transportation issues (gridlock, freeway congestion, transit delay, safety, etc.)
• Computational issues (processing time, processing power)
• Data management issues (data-at-rest)
How to address the issues?
• More intelligence
• Enhanced computing (cloud)
• Enhanced data-in-motion analysis (streaming analytics, hierarchy of tasks, pub-sub technique)
• Promote safety: autonomous (intra-vehicle), cooperative (inter-vehicle)
4. CVST High Level Objectives
Open Flexible Platform
Leveraging Future Wireless and Cloud Technologies
Smart Management Architecture
Improving Efficiency and Safety
Novel Applications
Private Sector Using CVST Platform
Connected Vehicles & Mobile Computing
Cloud
Data Management & Event Processing
Autonomic Control of Transportation Systems
Smart Transportation Management Systems
Road / Vehicle Data (V2R)
Vehicle to Infrastructure data (V2I)
Vehicle to Vehicle data (V2V)
Smart Transportation Applications
Ontario Research Project: 2011-2015
University of Toronto & York University
Industry Partners
Multidisciplinary
5. Hierarchy of Needs
Applications
• Transportation applications
• Recommendations (routing, city planning)
• What-if analysis, impact analysis
Intelligent Transportation Management (Data Process)
• Real-time streaming analytics
• OD demands
• Trending / forecasting
• KPI analysis
Data Dissemination (Data Management)
• Publish / subscribe
• Data collection
• Data integrity, cleansing
• Data anonymity
Connected Vehicles & Mobile Computing Cloud (Resource Management)
• Sensors
• V2V, V2R, V2I
• Cloud resource management
• Horizontal / vertical scaling (SAVI)
Wisdom
Understanding
Knowledge
Information
Data
7. CVST: Functional View
Data Dissemination (over ICN)
Database Subscriber
(Raw Data)
Subscribe
Report Engine
API
Alg. Engine
Congestion
Pricing
Routing
Portal
Data Format
Data Validation
Data Cleansing
Publisher Client
Subscriber Client
Data Anonymization
Applications
Analytics
Engine
portal.cvst.ca
Simulation
The Data Dissemination layer is consistent with GB979.
Pub/Sub over ICN
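The publish/subscribe pattern used by the Data Dissemination layer can be illustrated with a minimal in-memory sketch. It is not the CVST implementation and does not model ICN. The `Broker` class, the topic name and the message fields are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class Broker:
    """Minimal in-memory, topic-based publish/subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        # topic name -> list of subscriber callbacks
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        """Register a callback that receives every message published on `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver `message` to all subscribers of `topic`; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


# Hypothetical usage: a database subscriber that stores raw loop-detector data.
broker = Broker()
raw_store: List[dict] = []
broker.subscribe("loop_detector", raw_store.append)
delivered = broker.publish(
    "loop_detector", {"sensor_id": "hypothetical-401", "speed_kmh": 92.0}
)
```

Publishers and subscribers only share a topic name, so new consumers (for example a report engine) can be attached without changing the publishers.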
8. CVST Platform: Available data
Data Source | Data Format | Data Type
Loop Detector Sensor | Structured | Numerical
Traffic Cameras | Unstructured | Images, Videos
Mobile Devices (GPS/Bluetooth) | Structured | Numerical
Toronto Traffic Survey | Structured | Text/Numerical
Incident Report | Structured | Text/Numerical
Public Transportation | Structured | Numerical
Media Outlets | Semi-structured | Text/Numerical
Social Media | Unstructured | Text
TomTom data | Structured | Numerical
Weather information | Structured | Numerical
Drone Camera | Unstructured | Images, Videos
BIXI | Structured | Numerical
Border | Structured | Numerical
Air Sensors | Structured | Numerical
10. In this paper:
• We propose an autonomic analytic platform to perform analysis on
CVST data for detection and classification of events in realtime.
• The platform consists of data, analytics and management
components.
• The platform is cluster-based and leverages the cloud to achieve
reliability, scalability and adaptability.
• It can be applied to both realtime and retrospective analysis.
• We validated it by detecting events in realtime on the major
highways in the Greater Toronto Area (GTA).
11. What is data analytics?
• Data Analytics is the process of extraction, loading,
cleaning, transformation, and modeling of data in order
to gain insights for informed decision making.
12. Types of Analytics
• Descriptive: “what happened and/or what is happening?”
• Predictive: “what will happen and/or why will it happen?”
• Prescriptive: “what will happen if I change this?”
• Discovery: explore previously unknown findings
13. Trends of Analytics
• Data analytics is moving from batch
to real-time.
• Analytics-as-a-service models are
emerging.
• Systems are instrumented first, and the resulting
data is then exploited via analytics.
All of these data are now available through the CVST portal. TomTom data, air sensors and weather information are newly added sources.
A demo of CVST follows at the end of the presentation.
We used K-Feed (developed in-house at York) to provide auto-scaling for our analytic platform.
We designed a hierarchical data management layer to accommodate the various data and processing needs of different stakeholders.
The analytic component is an elastic MapReduce platform built upon the Sahara project from OpenStack. We deployed our solution on SAVI, an OpenStack-based academic cloud in Canada. As can be seen, we modified the Sahara component to be autonomously scalable (i.e., providing elastic big data analytics). The entire CVST platform has also been deployed on SAVI.
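The notes do not describe the scaling policy used by K-Feed or by the modified Sahara component. As a rough illustration of what "autonomously scalable" can mean, the sketch below assumes a common threshold-based rule on average CPU utilization. The function name, the thresholds and the worker limits are all hypothetical.

```python
def desired_workers(
    current: int,
    cpu_utilization: float,
    *,
    low: float = 0.30,       # hypothetical scale-in threshold
    high: float = 0.75,      # hypothetical scale-out threshold
    min_workers: int = 1,
    max_workers: int = 10,
) -> int:
    """Return the worker count a simple threshold rule would ask for.

    Adds one worker when utilization exceeds `high`, removes one when it
    falls below `low`, and always stays within [min_workers, max_workers].
    """
    if cpu_utilization > high:
        target = current + 1
    elif cpu_utilization < low:
        target = current - 1
    else:
        target = current
    return max(min_workers, min(max_workers, target))
```

A control loop would call this periodically with fresh monitoring data and resize the cluster only when the returned value differs from the current size.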
This is the instance of the analytic engine that we used for this paper. We deployed a standalone Spark cluster. We leveraged Big Queue as the IoT gateway (to abstract the sensors) and HDFS as the data storage for outputs. For this paper, we did stream processing in Spark. For the demo (see last slide), we read historical data (July 2015) and treat it as stream data.
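Treating historical data as a stream can be sketched in plain Python, without Spark: records are replayed one at a time and grouped into per-sensor sliding windows. This is only an illustration under assumed names. The record layout `(sensor_id, speed)` and the functions `replay` and `sliding_windows` are hypothetical and not part of the platform.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]  # (sensor_id, speed), an assumed layout


def replay(records: Iterable[Record]) -> Iterator[Record]:
    """Yield stored records one by one, the way a live feed would deliver them."""
    for record in records:
        yield record


def sliding_windows(stream: Iterable[Record], size: int) -> Iterator[Tuple[str, List[float]]]:
    """Yield (sensor_id, window) each time a sensor has `size` most recent reports."""
    buffers: Dict[str, Deque[float]] = {}
    for sensor_id, speed in stream:
        # One bounded buffer per sensor; old reports fall out automatically.
        buffer = buffers.setdefault(sensor_id, deque(maxlen=size))
        buffer.append(speed)
        if len(buffer) == size:
            yield sensor_id, list(buffer)


# Made-up sample data standing in for the stored reports.
historical = [("s1", 100.0), ("s1", 95.0), ("s2", 80.0), ("s1", 60.0), ("s1", 40.0)]
windows = list(sliding_windows(replay(historical), size=3))
```

Because the windowing code only sees an iterator, the same logic works whether the records come from storage or from a live gateway.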
The constant values (e.g., 85, 150, 30, 84) can be adjusted based on the roads and other conditions. Algorithm 1 creates a window of ten consecutive speed reports and then checks whether these values follow an event pattern. If so, the length of the event is calculated using Algorithm 2. Based on that length, the event is classified by Algorithm 3.
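The three-step flow (window check, event length, classification) can be sketched as follows. The notes do not spell out the actual event pattern, the meaning of the constants, or whether "length" is temporal or spatial. The sketch therefore assumes a simple pattern (a window ending in a run of slow reports), measures length in consecutive slow reports, and uses placeholder thresholds that are not the paper's constants.

```python
from typing import List, Optional, Tuple

WINDOW_SIZE = 10        # from the notes: ten consecutive speed reports
SLOW_KMH = 50.0         # hypothetical threshold, not the paper's constant
MIN_SLOW_REPORTS = 5    # hypothetical: slow reports required at the window's end
SHORT_MAX = 15          # hypothetical class boundaries, in number of reports
MEDIUM_MAX = 60


def follows_event_pattern(window: List[float]) -> bool:
    """Stand-in for Algorithm 1: does a full window end in a run of slow reports?"""
    if len(window) != WINDOW_SIZE:
        return False
    return all(speed < SLOW_KMH for speed in window[-MIN_SLOW_REPORTS:])


def event_length(speeds: List[float], start: int) -> int:
    """Stand-in for Algorithm 2: count consecutive slow reports from `start`."""
    length = 0
    for speed in speeds[start:]:
        if speed >= SLOW_KMH:
            break
        length += 1
    return length


def classify(length: int) -> str:
    """Stand-in for Algorithm 3: map event length to a class label."""
    if length <= SHORT_MAX:
        return "short"
    if length <= MEDIUM_MAX:
        return "medium"
    return "long"


def detect(speeds: List[float]) -> Optional[Tuple[int, int, str]]:
    """Return (start_index, length, label) of the first event, or None."""
    for end in range(WINDOW_SIZE, len(speeds) + 1):
        if follows_event_pattern(speeds[end - WINDOW_SIZE:end]):
            start = end - MIN_SLOW_REPORTS
            # Walk back in case the slow run began before the checked tail.
            while start > 0 and speeds[start - 1] < SLOW_KMH:
                start -= 1
            length = event_length(speeds, start)
            return start, length, classify(length)
    return None
```

For a made-up series such as `[100.0] * 8 + [30.0] * 20 + [100.0] * 5`, `detect` reports an event starting at index 8 that lasts 20 reports, which these placeholder boundaries label "medium".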
This graph shows how we implemented the algorithms in realtime. Each blue hexagon represents one of the “if” conditions in Algorithm 1.
We show each type of event in a different color. If you click on an event, you see the date, time and location of the event.
The Sipresk platform demo has access to the data for July 2015.