1. Concept Neurons –
Handling Drift Issues for
Real-Time Industrial Data Mining
Luis Moreira-Matias .::. luis.matias@neclab.eu
Intelligent Transport Systems Group
Social Solutions Research division
NEC Laboratories Europe, Heidelberg, DE
Joao Gama, Joao Mendes-Moreira
University of Porto and LIAAD INESC-TEC, Portugal
Riva del Garda, Italy @ ECML/PKDD .::. September, 2016
2. Outline
Problem Overview (Real-Time Industrial DM)
Notes on Concept Drift phenomenon
Concept Neurons
Case Studies
Experiments
Final Remarks
3. 3 © NEC Corporation 20152016
Increasing interest in Analytics/Data Science in recent
years has pushed software engineers into the game!
Problem Overview
4.
Problem Overview
5.
Trends
● Tons of new “Data Scientist” roles filled by people
with a fundamental background in programming;
● Real-Time Data Processing;
● Off-the-shelf libraries -> temptation to use offline
machine learning!
Unrealistic assumptions
lead to suboptimal results!
6.
Notes on Concept Drift phenomenon
$\exists X : p_t(y \mid X) \neq p_{t+1}(y \mid X)$
This model still does a fair
job under drift... but for
how long?
How much are you losing by
relying on an inaccurate
model?
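The definition above says that the conditional distribution $p(y \mid X)$ changes over time even when the inputs look the same. A minimal sketch under illustrative assumptions (a synthetic 1-D stream, a slope that jumps from 2 to 3 between $t$ and $t+1$, and a least-squares fit through the origin; none of this comes from the slides) shows how a model that was accurate before the change keeps being used, and loses accuracy, after it:

```python
import random


def make_stream(n, slope, seed):
    """Synthetic (x, y) pairs from y = slope * x + noise; `slope` defines p(y|x)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)              # p(x) is the same before and after
        y = slope * x + rng.gauss(0.0, 0.05)
        data.append((x, y))
    return data


def fit_slope(data):
    """Least-squares slope through the origin: sum(x*y) / sum(x*x)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / sxx


def mse(slope, data):
    """Mean squared error of the model y_hat = slope * x on `data`."""
    return sum((y - slope * x) ** 2 for x, y in data) / len(data)


before = make_stream(500, slope=2.0, seed=1)  # concept at time t
after = make_stream(500, slope=3.0, seed=2)   # concept at time t+1: p(y|x) changed

w = fit_slope(before)                          # model trained on the old concept only
print(f"fitted slope: {w:.2f}")
print(f"MSE before drift: {mse(w, before):.4f}")
print(f"MSE after drift:  {mse(w, after):.4f}")
```

Only the conditional changes here (the input distribution is identical), so nothing in X alone would reveal the drift; it only shows up in the loss once the feedback y arrives.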
7.
Notes on Concept Drift phenomenon
● Real-Time Data Mining must cope with Concept Drift!
● Adaptive/online learning schemes are not yet
common in off-the-shelf libraries;
● Different types of drift require different drift handling
mechanisms;
Image kindly extracted from: Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift
adaptation. ACM Computing Surveys (CSUR), 46(4), 44.
● How can we adapt existing off-the-shelf
algorithms to resist drift without a
large empirical/fundamental effort?
8.
Business Value of Real-Time DM
Examples of the value of being able to react to drift:
● Transportation
● Highway Congestion Prediction (for traffic control purposes);
● Travel Time Prediction (for navigational purposes);
● Recommendation Systems
● Retail (highly popular new product);
● Media (highly popular new movie);
● Communications
● Security Failures (new Virus signature);
● Fraud Detection (bank transactions / mobile phone carriers);
9.
Concept Neuron – Base Idea
● Base Real-Time DM Learning Schema
Figure: base real-time DM learning schema (Data (X), Feedback (y), Memory, Loss Estimation, Model Learning, Change Detection, Prediction, Alarm; blocks labelled A to F)
Image adapted from: Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift adaptation. ACM
Computing Surveys (CSUR), 46(4), 44.
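The base schema can be sketched as a prequential loop: predict, receive the feedback y, estimate the loss, feed the loss to a change detector, and relearn from memory when an alarm fires. This is a minimal sketch, not the authors' implementation: the Page-Hinkley test is one common detector choice and is assumed here (the slides do not name one), the "model" is simply the mean of a sliding-window memory, X is omitted because the toy series is univariate, and all names and parameter values are hypothetical.

```python
import random
from collections import deque


class PageHinkley:
    """Page-Hinkley test: signals an increase in the mean of a monitored stream."""

    def __init__(self, delta=0.01, threshold=1.0):
        self.delta = delta          # magnitude of change that is tolerated
        self.threshold = threshold  # alarm threshold (often called lambda)
        self.reset()

    def reset(self):
        self.n, self.mean, self.cum, self.min_cum = 0, 0.0, 0.0, 0.0

    def update(self, value):
        """Feed one observation; return True when a change is signalled."""
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cum += value - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def run_stream(stream, window=50):
    """Predict -> receive feedback -> estimate loss -> detect change -> relearn."""
    memory = deque(maxlen=window)  # memory of the most recent labelled samples
    detector = PageHinkley()
    alarms = []
    for t, y in enumerate(stream):
        if not memory:                     # nothing learned yet: just store the sample
            memory.append(y)
            continue
        y_hat = sum(memory) / len(memory)  # model learning + prediction (windowed mean)
        loss = (y - y_hat) ** 2            # loss estimation once the feedback y arrives
        memory.append(y)
        if detector.update(loss):          # change detection raises an alarm
            alarms.append(t)
            memory.clear()                 # forget the old concept...
            memory.append(y)               # ...and relearn from the newest sample
            detector.reset()
    return alarms, sum(memory) / len(memory)


rng = random.Random(0)
stream = [rng.gauss(1.0, 0.1) for _ in range(200)] + [rng.gauss(3.0, 0.1) for _ in range(200)]
alarms, final_estimate = run_stream(stream)
print(alarms, round(final_estimate, 1))
```

Clearing the memory on an alarm is only one possible reaction; the same loop structure accommodates retraining, reweighting or replacing the model instead.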
10.
Asynchronous Concept Neuron (ACN)
● Assumptions
● Our samples are generated by a distribution that is
drifting constantly but slowly;
● An offline learning model is trained periodically on the most
recent (windowed) data;
● Example of possible base learners:
● Multilayer Perceptron (with and without kernels);
● Least Squares (with and without kernels, l1/l2 norms);
● Multivariate Adaptive Regression Splines (MARS);
● Time Series Analysis (e.g. ARIMA, Exponential Smoothing);
11.
Asynchronous Concept Neuron (ACN)
Figure: ACN schema (Data (X), Feedback (y), Windowed Memory, Loss Estimation, Offline Model Learning, Model Update, Prediction; blocks labelled A to E)
12.
Asynchronous Concept Neuron
● Modus Operandi
1. An offline model is periodically trained over the samples that arrived
within a given window T;
2. The loss is estimated for each newly arrived sample;
3. The model is updated in the direction opposite to the gradient of the loss;
● Additional Assumption (implied):
● Convergence is still possible, although at a slower rate, when the
updates are done incrementally with a sufficiently small learning rate
and an adequate learner;
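A minimal sketch of the three ACN steps, assuming a simple linear least-squares learner in place of the ARIMA/GLS learner used in the experiments (least squares is one of the base learners listed for ACN). The function names, the window and period of 100 samples, the learning rate of 0.1 and the synthetic, slowly drifting stream are illustrative assumptions; the sketch only compares periodic offline retraining with and without the incremental gradient steps.

```python
import random


def fit_least_squares(xs, ys):
    """Offline learner: closed-form simple linear regression y = w*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx


def run_learner(stream, period=100, window=100, lr=0.1, online_updates=True):
    """Periodic offline retraining, optionally followed by per-sample SGD steps (ACN-style)."""
    history, sq_errors = [], []
    w = b = None
    for t, (x, y) in enumerate(stream):
        if t >= window and t % period == 0:   # step 1: offline refit on the last window
            recent = history[-window:]
            w, b = fit_least_squares([p[0] for p in recent], [p[1] for p in recent])
        if w is not None:
            err = y - (w * x + b)             # step 2: loss for each arriving sample
            sq_errors.append(err ** 2)
            if online_updates:                # step 3: move against the gradient of 0.5*err**2
                w += lr * err * x
                b += lr * err
        history.append((x, y))
    return sum(sq_errors) / len(sq_errors)


rng = random.Random(42)
stream = []
for t in range(2000):
    x = rng.uniform(0.0, 1.0)
    slope = 1.0 + 0.002 * t                   # slowly but constantly drifting concept
    stream.append((x, slope * x + rng.gauss(0.0, 0.05)))

offline_mse = run_learner(stream, online_updates=False)
acn_mse = run_learner(stream, online_updates=True)
print(f"offline only: {offline_mse:.4f}  with online updates: {acn_mse:.4f}")
```

Between refits, each new sample costs one constant-time gradient step instead of a full refit, which is the kind of saving the ACN design targets; the small learning rate is what keeps the incremental steps from diverging.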
13.
Synchronous Concept Neuron (SCN)
● Assumptions
● Our samples are generated by a distribution that may
drift recurrently (while remaining stationary for some
periods of time);
● These drift events are usually limited in time; if not, the model
needs to be fully retrained;
● An (online) learning model is trained periodically on the most
recent (windowed) data;
● Example of possible base learners:
● Any type!
14.
Synchronous Concept Neuron (SCN)
Figure: SCN schema (Data (X), Feedback (y), Windowed Memory, Loss Estimation, Model Learning, Change Detection, Prediction, Prediction Corrections; blocks labelled A to E)
15.
Synchronous Concept Neuron (SCN)
● Modus Operandi
1. A model is periodically trained over the samples that arrived within a
given window T;
2. If a drift alarm is triggered, the most recent residuals are used to
update the model’s prediction directly (greedy);
3. If a novel drift is detected on the base model outputs, the learning rate is
increased;
4. If no drift is detected for a number of periods (a hyperparameter), the greedy
updates are turned off.
● Additional Assumption (implied):
● As drift typically recurs quickly but is limited in time, we can keep
relying on our model while simply correcting its outputs, which
ensures that it does not diverge.
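The four SCN steps can be sketched as an output corrector wrapped around an arbitrary base forecaster. This is a hedged interpretation, not the authors' code: `GreedyCorrector`, the Page-Hinkley detector on absolute residuals, the mean of the last five residuals as the greedy correction, treating the correction weight as the "learning rate", and every parameter value are assumptions made for illustration. A constant base forecast stands in for the periodically trained model of step 1.

```python
import random
from collections import deque


class PageHinkley:
    """Page-Hinkley test: signals an increase in the mean of a monitored stream."""

    def __init__(self, delta=0.1, threshold=1.0):
        self.delta, self.threshold = delta, threshold
        self.reset()

    def reset(self):
        self.n, self.mean, self.cum, self.min_cum = 0, 0.0, 0.0, 0.0

    def update(self, value):
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cum += value - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


class GreedyCorrector:
    """Hypothetical SCN-style corrector applied to the outputs of any base model."""

    def __init__(self, recent=5, rate=0.8, rate_step=0.1, patience=50):
        self.residuals = deque(maxlen=recent)  # most recent residuals y - base_forecast
        self.detector = PageHinkley()          # drift detector on |residual|
        self.rate, self.rate_step, self.patience = rate, rate_step, patience
        self.active = False                    # greedy corrections switched on?
        self.quiet = 0                         # periods since the last alarm

    def predict(self, base_forecast):
        if self.active and self.residuals:     # step 2: correct the output directly
            bias = sum(self.residuals) / len(self.residuals)
            return base_forecast + self.rate * bias
        return base_forecast

    def update(self, base_forecast, y):
        residual = y - base_forecast
        self.residuals.append(residual)
        if self.detector.update(abs(residual)):
            if self.active:                    # step 3: novel drift, raise the rate
                self.rate = min(1.0, self.rate + self.rate_step)
            self.active, self.quiet = True, 0
            self.detector.reset()
        elif self.active:
            self.quiet += 1
            if self.quiet >= self.patience:    # step 4: no drift for a while, switch off
                self.active = False


rng = random.Random(7)
base_forecast = 10.0                           # step 1 stand-in: model fit on stationary data
corrector = GreedyCorrector()
base_err = corr_err = 0.0
for t in range(300):
    level = 13.0 if 100 <= t < 130 else 10.0   # a burst of drift, limited in time
    y = rng.gauss(level, 0.2)
    base_err += abs(y - base_forecast)
    corr_err += abs(y - corrector.predict(base_forecast))
    corrector.update(base_forecast, y)
print(f"MAE base: {base_err / 300:.3f}  MAE corrected: {corr_err / 300:.3f}")
```

In this run the burst lasts 30 periods, shorter than the 50-period patience, matching the assumption that drift events are limited in time. A persistent shift would eventually switch the corrections off, which is the situation where the slides say the model needs full retraining.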
16.
Case Study A – Taxi Demand Forecasting
17.
Case Study A – Taxi Demand Forecasting
4 Key Variables:
1) The expected price for
a service over time;
2) The distance of each
stand (i.e. cost);
3) Number of taxis
already parked in a
stand;
4) The demand
predicted for a given
stand;
18.
Real-World Case Study A: Portuguese Taxi Operator
Case Study: PORTUGAL (EMEA)
19.
Real-World Case Study A: Portuguese Taxi Operator
Porto is Portugal’s second largest city;
1.3 million inhabitants;
Two taxi fleets totalling 700 taxi vehicles;
Data acquired using one fleet of roughly 450 taxi vehicles;
Floating Car Data (FCD) feed from August 2011 to April 2012;
1 sample per vehicle every 15 s;
Total: 1 million logged trips;
Aggregation: demand counts per 30 minutes;
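The 30-minute aggregation can be sketched with the standard library. The record layout (a stand identifier plus a pickup timestamp), the stand names and the timestamps below are hypothetical; the slides do not describe how trips were assigned to stands. The flooring helper assumes the period length divides 60 minutes.

```python
from collections import Counter
from datetime import datetime, timedelta


def bin_start(ts, minutes=30):
    """Floor a timestamp to the start of its aggregation period."""
    return ts - timedelta(minutes=ts.minute % minutes, seconds=ts.second,
                          microseconds=ts.microsecond)


def demand_counts(pickups, minutes=30):
    """Count pickups per (stand, period); `pickups` holds (stand_id, datetime) pairs."""
    return Counter((stand, bin_start(ts, minutes)) for stand, ts in pickups)


pickups = [  # hypothetical trip records: (stand id, pickup time)
    ("stand_07", datetime(2011, 8, 1, 8, 5)),
    ("stand_07", datetime(2011, 8, 1, 8, 29)),
    ("stand_07", datetime(2011, 8, 1, 8, 31)),
    ("stand_12", datetime(2011, 8, 1, 8, 10)),
]
counts = demand_counts(pickups)
print(counts[("stand_07", datetime(2011, 8, 1, 8, 0))])  # 2
```

Periods with no pickups do not appear as keys (`Counter` returns 0 for them), so a forecasting pipeline would normally materialise the full, gap-free series per stand before feeding it to a learner.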
20.
Real-World Case Study B - Predict Traffic Incidents
▌Is traffic congestion a necessary evil?
21.
Real-World Case Study B - Predict Traffic Incidents
Re-routing Reversible Lanes
Earlier Dispatching of Safety/Clearance Personnel
Dynamic Speed Control
22.
Real-World Case Study B: Japanese Highway Operator
Case Study: JAPAN (APAC)
23.
Real-World Case Study B: Japanese Highway Operator
▌Data collected from 106 sensors deployed along 20 km of freeway;
▌Period studied: 3 non-consecutive weeks;
▌Sample: the number of vehicles (flow) passing each sensor per 15 minutes;
24.
Experimental Setup
▌Statistical independence is assumed both for (A) the
demand at each stand and for (B) the flow at each
sensor;
▌Test sets
A) The last 4 weeks;
B) The last 5 days of each of the 3 weeks;
▌Task: Predict the next term of the series
A) Demand Count for each stand;
B) Flow Count for each sensor;
B) Congestion Prediction;
25.
Experimental Setup
▌A) fully sample-by-sample retrained ARIMA with GLS (ARIGLS) vs.
ACN (base learner: ARIGLS);
Main Idea: reduce the computational complexity of running full GLS for each new
sample;
▌Evaluation: sMAPE
▌In B), we compared fully sample-by-sample retrained ARIMA
and ETS models (w/ GLS) against SCN (base learner: an online weighting
ensemble of ARIMA and ETS);
Main Idea: check how SCN handles bursty/recurring drifts;
▌Evaluation: RMSE, MAE (flow count prediction)
▌Evaluation: Precision, Recall (Congestion Prediction)
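For reference, the evaluation metrics named above can be written in a few lines. sMAPE has several variants in the literature; the one below (absolute error divided by the mean of |actual| and |forecast|, in percent, with 0/0 terms counted as 0) is an assumption, since the slides do not state which variant was used. Precision and recall are computed from binary per-period labels, e.g. congestion vs. no congestion.

```python
import math


def smape(actual, forecast):
    """Symmetric MAPE in percent (one common variant; terms with A = F = 0 count as 0)."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 * total / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)


def precision_recall(true_events, predicted_events):
    """Precision and recall for binary labels (e.g. congestion yes/no per period)."""
    pairs = list(zip(true_events, predicted_events))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


actual = [4, 0, 2, 5]       # e.g. demand counts per period (illustrative values)
forecast = [3, 0, 2, 6]
print(round(smape(actual, forecast), 2), rmse(actual, forecast), mae(actual, forecast))
print(precision_recall([1, 0, 1, 1], [1, 1, 0, 1]))
```

The explicit 0/0 rule matters for demand counts, where periods with zero actual and zero forecast demand are common.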
26.
Results A: Portuguese Taxi Operator
Case Study: PORTUGAL (EMEA)
27.
Results A: Portuguese Taxi Operator
▌Avg. Training/Runtime of ARIGLS per prediction: 99.77s;
▌Avg. Training/Runtime of ACN per prediction: 32.44s
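Taking the two averages at face value, the saving per prediction works out as:

```latex
\frac{99.77\,\mathrm{s}}{32.44\,\mathrm{s}} \approx 3.08,
\qquad
1 - \frac{32.44}{99.77} \approx 0.675
```

i.e. ACN needs roughly 67.5% less training/run time per prediction than ARIGLS, about a 3.1x speedup.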
28.
Real-World Case Study B: Japanese Highway Operator
Case Study: JAPAN (APAC)
29.
Results B: Japanese Highway Operator
▌Results for SCN (Drift3Flow):
RMSE/MAE -5% (flow/occupancy forecasting)
PRECISION +2% / RECALL +30% (incident prediction)
+200 INCIDENTS PREDICTED
30.
Final Remarks
● Huge market demand for Data Science;
● Great real-time data processing abilities;
● Complex DM problems;
Bias towards using the same off-the-shelf technique to solve ALL problems;
● Real-time DM must cope with Concept Drift
● Concept Neurons can operate on top of the most widely
used supervised learning algorithms;
● We demonstrated that they can guarantee convergence,
lower computational effort and better generalization;
● We showed their added business value in two
applications in the transportation domain;
31.
Some Additional References (feel free to ask for more...)
luis.moreira.matias@gmail.com
▌ Moreira-Matias L., Cats O., Gama J., Mendes-Moreira J. and Sousa J.F. “An Online Learning Approach to Eliminate Bus
Bunching in Real-Time.” Applied Soft Computing. Vol. 47. pp. 460-482. 2016
▌ Moreira-Matias L., Gama J., Ferreira M., Mendes-Moreira J. and Damas L. “Time-evolving O-D Matrix Estimation using
high-speed GPS data streams.” Expert Systems with Applications. Vol. 44. pp. 275-268. 2016.
▌ Khiary, J., Moreira-Matias, L., Cerqueira, V., Cats, O. “Automated Setting of Bus Schedule Coverage using Unsupervised
Machine Learning”. Advances in Knowledge Discovery and Data Mining - 20th Pacific-Asia Conference, PAKDD. pp. 552-564.
Springer. (2016)
▌ Moreira-Matias, L., Alesiani, F. “Drift3Flow: Freeway-Incident Prediction using Real-Time Learning.” 18th International IEEE
Conference on Intelligent Transportation Systems (ITSC). pp. 566-571. 2015.
▌ Moreira-Matias, L., Gama, J., Ferreira, M., Mendes-Moreira, J., Damas, L. “Predicting Taxi-Passenger Demand Using Streaming
Data”. IEEE Transactions on Intelligent Transportation Systems. Vol.14, no.3. pp.1393-1402. 2013.
▌ Moreira-Matias L., Mendes-Moreira J., Sousa J.F. and Gama J. “Improving Mass Transit Operations by using AVL based
Systems: A Survey”. IEEE Transactions on Intelligent Transportation Systems. Vol. 16, no. 4. pp. 1636-1653. 2015.
▌ Mendes-Moreira J., Moreira-Matias L., Gama J. and Sousa J.F. “Validating the Coverage of Bus Schedules: A Machine Learning
Approach”. Information Sciences. Vol. 293, no. 1. pp. 299-313. 2015.
▌ Moreira-Matias, L., Gama, J., Ferreira, M., Mendes-Moreira, J., Damas, L. “On Predicting the Taxi-Passenger Demand: A Real-
Time Approach”. Progress in Artificial Intelligence. LNCS 8154. Springer. pp. 54-65. 2013.
▌ Moreira-Matias, L., Gama, J., Mendes-Moreira, J., Freire de Sousa, J. “An Incremental Probabilistic Model to Predict Bus
Bunching in Real-Time”. Advances in Intelligent Data Analysis XIII. LNCS vol. 8819. pp. 227-238. Springer. 2014.
▌ Moreira-Matias, L., Gama, J., Ferreira, M., Mendes-Moreira, J., Damas, L. “Online Predictive Model for Taxi Services”. Advances
in Intelligent Data Analysis XI. LNCS vol. 7619. pp. 230-240. Springer. 2012.
▌ Moreira-Matias, L., Fernandes, R., Gama, J., Ferreira, M., Mendes-Moreira, J., Damas, L., “An Online Recommendation System
for the Taxi Stand choice Problem” (Poster) IEEE Vehicular Network Conference (IEEE VNC). pp. 173-180. 2012.
32.
Thank you for your time!
luis.matias@neclab.eu