Traffic Prediction Using Big Data Analytics
1. Jongwook Woo
HiPIC
CalStateLA
KSII The 14th Asia Pacific International Conference on Information Science and Technology (APIC-IST), Beijing, June 24, 2019
Dalya (Dalyapraz) Dauletbak, dmanato@calstatela.edu
Jongwook Woo, PhD
Big Data AI Center (BigDAI)
California State University Los Angeles
Traffic Data Analysis and Prediction using Big Data
2. Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
Introduction
H/W Specification
Architecture Chart
Implementation steps
Data structure
Analysis
Prediction
Summary
3. Introduction
About me:
Graduate Computer Information Systems Student at California State University, Los Angeles
– BS (2015): Mathematics at Nazarbayev University
– Previously: Senior Consultant/Data Analyst @ Management consulting at KPMG Central Asia
– Current: Community Manager @ International Data Engineering and Science Association (IDEAS)
Data source:
A GPS navigation mobile application
Provides real-time directions and up-to-date information on:
– Traffic
– Accidents
– Road closures
– Weather hazards
– Lurking police vehicles, etc.
4. Introduction
Data source:
Navigation app traffic data set from LA City Department*
– Information reported by users: Alerts
– Information captured by users' devices: Jams
We are going to find out:
– Areas with high volume of traffic (geography)
– Peak hours
– Density of Alerts and Incidents
– Traffic volume by road types
– Prediction of traffic jams
*We had limited authorization to access the full dataset (100+ GB original), so we used a limited dataset covering 9 days (Dec 31 – Jan 8, 2018), ~2 GB.
6. H/W Specification
Number of nodes: 6
OCPUs: 12
CPU speed: 2195.196 MHz
Memory: 180 GB
Storage: 682 GB
7. Architecture Chart
Source: Hadoop Masterclass, Part 4 of 4: Analyzing Big Data, Lars George, EMEA Chief Architect, Cloudera
8. Implementation steps
Local Computer
– Raw data files (JSON)
– Geo-Spatial Visualization (3D map)
– Dashboard for Analytics
Hadoop/Hive
– Upload dataset to HDFS
– Parse JSON files using Pandas
– Create tables' schema
– Clean data
– Create sample/summary dataset for prediction and visualization
Microsoft Azure ML Studio
– Upload sample dataset
– Apply data transformation
– Split dataset for training and scoring
– Train model(s)
– Evaluate model(s)
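The JSON parsing step above has to turn nested records into flat rows before a Hive table schema can be created. Below is a minimal sketch of that flattening. It uses only Python's standard library instead of the Pandas routine named in the slides, and the field names (`jams`, `location`, `x`, `y`, `level`, `speed`) and the sample record are hypothetical, since the slides do not show the feed's schema.

```python
import json

def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

# Hypothetical raw record; the real feed schema is not shown in the slides.
raw = '{"jams": [{"level": 3, "speed": 12.5, "location": {"x": -118.24, "y": 34.05}}]}'

rows = [flatten(jam) for jam in json.loads(raw)["jams"]]
print(rows)
```

Each resulting row is a flat dictionary with keys such as `location_x` and `location_y`, which maps directly onto table columns.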
9. Data structure
11. Analysis
Information we are using:
Location/Time
Level of traffic intensity
X and Y coordinates (Longitude & Latitude)
Counts of jams/alerts
Tools we are using:
Excel - 3D map
Power BI - Flow map, pie charts, bar charts
What we are predicting:
Level of traffic (1 to 3 – light, medium, heavy)
Based on date, time, location
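The time and count information listed above is enough to find peak hours. As a rough sketch, jam reports can be counted per hour of day; the timestamps below are made up for illustration and the timestamp format is an assumption.

```python
from collections import Counter
from datetime import datetime

# Hypothetical jam report timestamps (Pacific Time); not real data.
reports = [
    "2018-01-03 07:15:00",
    "2018-01-03 16:05:00",
    "2018-01-03 16:40:00",
    "2018-01-03 17:20:00",
    "2018-01-04 16:55:00",
]

# Count reports per hour of day, then pick the busiest hour.
jams_per_hour = Counter(
    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts in reports
)
peak_hour, peak_count = jams_per_hour.most_common(1)[0]
print(peak_hour, peak_count)
```

On the full dataset the same grouping would run in Hive; the Python version only shows the logic.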
12. Traffic in LA (captured from users' devices)
13. Traffic in LA (reported by app users)
14. Video-Simulation of Traffic in LA (captured from users' devices)
15. Video-Simulation of Traffic in LA (reported by app users)
16. Traffic Analysis Dashboard
[Figure: dashboard chart with two traffic peaks marked "Peak"]
17. Traffic Analysis Dashboard
Major areas of traffic are Downtown Los Angeles, Santa Monica, Hollywood, and highways.
19. Prediction of traffic congestion with Machine Learning
Data preparation
– Group label values
– Join additional dataset
– Apply data transformation
– Normalize data
Model building
– Model(s) selection
– Cross Validation
– Train model(s)
Model evaluation
– Score model
– Evaluate model (Accuracy, Recall)
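The cross validation step in this workflow can be pictured as splitting row indices into k folds and holding out one fold at a time. The sketch below is plain Python, not the Azure ML Studio module; the function name `k_fold_indices` and its parameters are hypothetical.

```python
import random

def k_fold_indices(n_rows, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(n_rows=10, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Every row lands in exactly one test fold, so each model is scored on data it was not trained on.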
20. Features/columns in a dataset
location x, location y: X and Y coordinates of the location
date_pst: Pacific Time of the publication of the traffic report
level: jam level, where 1 is almost no jam and 5 is a standstill jam
speed: driver's captured speed in mph
length: length of the traffic ahead on the user's route, in meters
*date_pst is split into month, day, hour, min, sec, and weekday
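Splitting date_pst into its parts can be sketched as follows. The timestamp format and the weekday numbering (Monday = 0, as in Python's `datetime`) are assumptions; the slides do not state either.

```python
from datetime import datetime

def split_timestamp(date_pst):
    """Split a 'YYYY-MM-DD HH:MM:SS' string into calendar parts."""
    dt = datetime.strptime(date_pst, "%Y-%m-%d %H:%M:%S")
    return {
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "min": dt.minute,
        "sec": dt.second,
        "weekday": dt.weekday(),  # Monday = 0 ... Sunday = 6
    }

parts = split_timestamp("2018-01-05 17:30:45")
print(parts)
```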
21. Data transformation
Randomly selected data (~100 MB)
Select relevant features
Group level into 2 classes (label: 0 & 1)
Join holidays dataset
– Add attribute is_holiday (0 or 1)
Change cyclical attributes from polar coordinates to Cartesian
Add is_rush, is_weekend (0 or 1)
Normalize features
Make categorical: is_rush, is_holiday, is_weekend, label
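A few of these transformations can be sketched in plain Python. Several details are assumptions rather than facts from the slides: the cut-off for grouping jam levels (`heavy_from`), the rush-hour windows (7 to 9 am and 3 to 6 pm), min-max scaling as the normalization method, and the one-entry `HOLIDAYS` set standing in for the joined holidays dataset.

```python
from datetime import date

# Hypothetical stand-in for the joined holidays dataset.
HOLIDAYS = {date(2018, 1, 1)}

def add_flags(row, heavy_from=3):
    """Return a copy of `row` with a binary label and calendar flags.

    `heavy_from` is an assumed cut-off for grouping jam levels into two
    classes; the slides do not state the threshold that was used.
    """
    hour, day = row["hour"], row["date"]
    out = dict(row)
    out["label"] = 1 if row["level"] >= heavy_from else 0
    # Assumed rush windows: 7-9 am and 3-6 pm.
    out["is_rush"] = 1 if (7 <= hour < 9 or 15 <= hour < 18) else 0
    out["is_weekend"] = 1 if day.weekday() >= 5 else 0  # Sat = 5, Sun = 6
    out["is_holiday"] = 1 if day in HOLIDAYS else 0
    return out

def min_max(values):
    """Rescale numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

flagged = add_flags({"date": date(2018, 1, 1), "hour": 16, "level": 4})
speeds = min_max([10.0, 20.0, 30.0])
print(flagged, speeds)
```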
22. Data transformation (continued)
-- Encode each cyclical time field as a (sin, cos) pair on the unit circle.
SELECT location_x, location_y,
  SIN(weekday*(2*PI()/7)) as sin_weekday,
  COS(weekday*(2*PI()/7)) as cos_weekday,
  SIN((month-1)*(2*PI()/12)) as sin_month,
  COS((month-1)*(2*PI()/12)) as cos_month,
  SIN((day-1)*(2*PI()/31)) as sin_day,
  COS((day-1)*(2*PI()/31)) as cos_day,
  SIN(hour*(2*PI()/24)) as sin_hour,
  COS(hour*(2*PI()/24)) as cos_hour,
  SIN(min*(2*PI()/60)) as sin_min,
  COS(min*(2*PI()/60)) as cos_min,
  SIN(sec*(2*PI()/60)) as sin_sec,
  COS(sec*(2*PI()/60)) as cos_sec,
…
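The point of the sine/cosine pairs in this query is that values at the two ends of a cycle (for example hour 23 and hour 0) end up close together. A small Python sketch of the same encoding shows this; the helper names `encode_cyclical` and `circle_distance` are hypothetical.

```python
import math

def encode_cyclical(value, period):
    """Map a cyclical value onto the unit circle as (sin, cos)."""
    angle = value * (2 * math.pi / period)
    return math.sin(angle), math.cos(angle)

def circle_distance(a, b, period):
    """Euclidean distance between two encoded values."""
    (sin_a, cos_a) = encode_cyclical(a, period)
    (sin_b, cos_b) = encode_cyclical(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)

near = circle_distance(23, 0, 24)  # one hour apart across midnight
far = circle_distance(12, 0, 24)   # noon vs midnight, opposite points
print(near, far)
```

With the raw hour value, 23 and 0 differ by 23; after encoding they are neighbors, while noon and midnight sit at opposite points of the circle.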
23. Model Evaluation
Model  Accuracy  Precision  Recall  AUC ROC
LR     0.662     0.662      1.0     0.571
BDT    0.805     0.832      0.884   0.868
DF     0.832     0.868      0.880   0.885
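Accuracy, precision, and recall in this table all derive from confusion-matrix counts. The sketch below uses the standard definitions. The LR row, where accuracy equals precision and recall is 1.0, matches what these formulas give for a model that predicts the positive class for every row; the counts used here are hypothetical and only illustrate that pattern.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical counts: every one of 1000 rows is predicted positive,
# and 662 of them are truly positive.
always_positive = classification_metrics(tp=662, fp=338, fn=0, tn=0)
print(always_positive)
```

A recall of 1.0 on its own can therefore be misleading, which is why the table also reports precision and AUC.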
24. Summary of Traffic Prediction with Machine Learning
Model is based on a sampled dataset of ~1M rows (100 MB)
Best model: Decision Forest
– Accuracy: 0.832
– Precision: 0.868
– Recall: 0.880
– Area under the Curve: 0.885
[Figure: Confusion Matrix]
26. Summary
Traffic is denser on Freeways 101, 405, and 10.
The morning rush hours (7 am to 9 am) produce heavy traffic; the heaviest traffic starts at 3 pm and eases after 6 pm.
Major areas of traffic are DTLA, Santa Monica, and Hollywood.
More insights can be found by applying this traffic analysis framework to a bigger dataset.
Such data and platforms also make it possible to predict traffic congestion. A machine learning algorithm, Decision Forest, predicted the heaviest traffic jams with an accuracy of 83%.
27. Questions?