Time Series Digger: Automatic time series analysis for data science in R (useR! 2018 poster, 18.07.11)
☺ They result in suitable detection.
Our applications and data
- Network traffic
- Service and data center management
- Customer behavior
In the real world, all of this data is time series.
We need an effective and comprehensive data science process for various problem settings.
ggAutoTimeSeriesPlot(df, …)
- Plots ggplot-based objects for combinations of variables, time intervals, and aggregation functions
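The idea behind ggAutoTimeSeriesPlot() can be sketched in base R: enumerate every combination of variable, time interval, and aggregation function, and aggregate the data for each one. This is not the package's implementation. The helper aggregate_all_intervals() and the columns time, bytes, and packets are hypothetical. The ggplot2 plotting step is omitted, so the sketch only builds the aggregated table that such plots would draw.

```r
# Hypothetical helper, NOT the Time Series Digger API: aggregate every
# combination of variable x time interval x aggregation function.
aggregate_all_intervals <- function(df, time_col, vars, intervals, funs) {
  combos <- expand.grid(var = vars, interval = intervals, fun = names(funs),
                        stringsAsFactors = FALSE)
  pieces <- lapply(seq_len(nrow(combos)), function(i) {
    v <- combos$var[i]
    iv <- combos$interval[i]
    # Bin timestamps into intervals such as "30 min" or "1 hour"
    bucket <- cut(df[[time_col]], breaks = iv)
    agg <- tapply(df[[v]], bucket, funs[[combos$fun[i]]])
    data.frame(bucket = names(agg), value = as.numeric(agg), var = v,
               interval = iv, fun = combos$fun[i], stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

# Usage with made-up per-minute data covering six hours
set.seed(1)
df <- data.frame(
  time = as.POSIXct("2018-07-11 00:00:00", tz = "UTC") + 60 * (0:359),
  bytes = rpois(360, 100),
  packets = rpois(360, 10)
)
res <- aggregate_all_intervals(df, "time", c("bytes", "packets"),
                               c("30 min", "1 hour"), list(mean = mean, max = max))
head(res)
```

Each (variable, interval, function) triple yields one series. A plotting layer could then facet on those columns.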
Motoyuki OKI, Yusuke SAITO, Yuki HIRA, and Yukio UEMATSU; NTT Communications Corp.; E-mail: dstu-td@ntt.com
Introduction
Time Series Digger and Real-World Use Case
Exploratory Data Analysis
Feature Construction for Achieving High Detection Accuracy
Task: Explore useful variables and time intervals to detect anomalies
addDatetimeExpression(df, …)
- Creates multiple time expressions
addBasicStatistics(df, …)
- Creates descriptive-statistics features based on multiple moving-window functions
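For illustration, the two feature-construction steps can be approximated in base R. add_datetime_features() and add_moving_stats() are hypothetical stand-ins, not the signatures of addDatetimeExpression() or addBasicStatistics(). The chosen calendar fields (hour, weekday, weekend flag, month) and the trailing-window convention are assumptions.

```r
# Hypothetical stand-in for datetime expressions: expand one POSIXct column
# into several calendar features.
add_datetime_features <- function(df, time_col) {
  lt <- as.POSIXlt(df[[time_col]])
  df$hour <- lt$hour
  df$wday <- lt$wday                   # 0 = Sunday, ..., 6 = Saturday
  df$weekend <- lt$wday %in% c(0, 6)
  df$month <- lt$mon + 1               # POSIXlt months are 0-based
  df
}

# Hypothetical stand-in for moving statistics: the window for row t covers
# rows (t - w + 1)..t, so no future values are used; early rows stay NA.
add_moving_stats <- function(df, var, windows, funs) {
  x <- df[[var]]
  n <- length(x)
  for (w in windows) {
    for (fname in names(funs)) {
      out <- rep(NA_real_, n)
      if (n >= w) {
        for (t in w:n) out[t] <- funs[[fname]](x[(t - w + 1):t])
      }
      df[[paste(var, fname, w, sep = "_")]] <- out
    }
  }
  df
}

df <- data.frame(
  time = as.POSIXct("2018-07-15 22:00:00", tz = "UTC") + 3600 * (0:5),
  value = c(1, 2, 3, 4, 5, 6)
)
df <- add_datetime_features(df, "time")
df <- add_moving_stats(df, "value", windows = 3, funs = list(mean = mean, sd = sd))
df
```

A trailing window keeps every feature computable from data available at time t. This matters when anomalies at t are to be detected from past values only.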
☹ A great number of series across time intervals and variables
Automatically plots multiple variables, time intervals, and distributions
☺ Finds time intervals useful for detection
[Figure: plots of df over time intervals ranging from short to long]
☹ Time-series-oriented feature extraction
Task: Construct useful time-series-oriented features
Motivation: We develop Time Series Digger to accelerate this process.
Modeling for Anomaly Detection
☹ Various methods and packages with different interfaces
Task: Detect anomalies at time point t from features of past sub-time-series
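The following sketch makes "detect an anomaly at time t from past sub-time-series features" concrete with a deliberately simple baseline: a z-score of x[t] against the window that precedes it. It is not one of the package's methods. past_window_zscore() and the window length w are hypothetical. stats::embed() builds the matrix whose rows pair each x[t] with its w predecessors.

```r
# Hypothetical baseline, NOT the package's AnomalyDetection(): score x[t]
# by how far it lies from the w values strictly before it.
past_window_zscore <- function(x, w) {
  n <- length(x)
  if (n <= w) return(rep(NA_real_, n))
  # embed() row i holds x[i + w], x[i + w - 1], ..., x[i]
  E <- embed(x, w + 1)
  current <- E[, 1]
  past <- E[, -1, drop = FALSE]        # the sub-series before each t
  mu <- rowMeans(past)
  s <- apply(past, 1, sd)
  z <- abs(current - mu) / s
  z[s == 0] <- NA_real_                # undefined for flat windows
  c(rep(NA_real_, w), z)               # the first w points have no full past
}

set.seed(42)
x <- sin(2 * pi * (1:200) / 25) + rnorm(200, sd = 0.1)
x[150] <- x[150] + 3                   # inject a spike
score <- past_window_zscore(x, w = 25)
which.max(score)
```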
[Figure: dfExample with time features and moving-average features]
AnomalyDetection(df, method, …)
- Detects anomalies based on Singular Spectrum Transformation, Robust Principal Component Analysis, and other methods
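As a rough illustration of the first method named above, here is a minimal base-R sketch of the basic Singular Spectrum Transformation change score. It compares the dominant r-dimensional subspace of past subsequences with the main direction of the subsequences that follow. This is a textbook-style formulation, not the package's implementation. hankel(), sst_score(), and the parameters w (subsequence length), k (number of subsequences), and r (past subspace rank) are hypothetical names and defaults.

```r
# Singular Spectrum Transformation (SST) change score: a minimal sketch,
# NOT the Time Series Digger implementation.
# Trajectory (Hankel) matrix: w x k, column j = x[start + j - 1 + 0:(w - 1)]
hankel <- function(x, start, w, k) {
  vapply(seq_len(k), function(j) x[(start + j - 1):(start + j + w - 2)], numeric(w))
}

sst_score <- function(x, w = 10, k = 10, r = 2) {
  n <- length(x)
  score <- rep(NA_real_, n)
  first <- w + k               # past columns must start at index >= 1
  last <- n - w - k + 2        # test columns must end at index <= n
  if (last < first) return(score)
  for (t in first:last) {
    past <- hankel(x, t - w - k + 1, w, k)   # subsequences ending at t-k..t-1
    test <- hankel(x, t, w, k)               # subsequences starting at t..t+k-1
    U <- svd(past, nu = r, nv = 0)$u         # dominant r-dim past subspace
    q <- svd(test, nu = 1, nv = 0)$u         # dominant test direction (unit norm)
    # 1 - squared norm of q's projection onto span(U); near 0 means no change
    score[t] <- 1 - sum(crossprod(U, q)^2)
  }
  score
}

# Usage: a sine whose period drops from 20 to 5 samples after t = 100
i <- 1:200
x <- ifelse(i <= 100, sin(2 * pi * i / 20), sin(2 * pi * i / 5))
s <- sst_score(x, w = 10, k = 10, r = 2)
which.max(s)
```

The score at t uses values up to t + k + w - 2, so in this form it flags a change with a delay rather than in real time.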
Problem Settings
Data Science Process
1-10 million records/day
100 thousand records/day
10-100 million records/day
Exploratory Data Analysis → Feature Construction → Modeling
[Figure labels: network data; Singular Spectrum Transformation; Robust Principal Component Analysis]
Evaluation
Multiple datasets with different purposes
Discussion
Setting suitable evaluation metrics for each purpose
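Which metric is suitable depends on the purpose. As an illustration (an assumption, not the metrics chosen on the poster), point-wise precision, recall, and the F-beta score judge the same detections differently. A larger beta favors catching anomalies, while a smaller beta favors avoiding false alarms. detection_metrics() is a hypothetical helper.

```r
# Hypothetical helper: point-wise precision, recall and F-beta for anomaly flags.
# beta > 1 weights recall (missed anomalies) more; beta < 1 weights precision.
detection_metrics <- function(pred, truth, beta = 1) {
  stopifnot(is.logical(pred), is.logical(truth), length(pred) == length(truth))
  tp <- sum(pred & truth)
  fp <- sum(pred & !truth)
  fn <- sum(!pred & truth)
  precision <- if (tp + fp > 0) tp / (tp + fp) else NA_real_
  recall <- if (tp + fn > 0) tp / (tp + fn) else NA_real_
  f_beta <- if (!is.na(precision) && !is.na(recall) && precision + recall > 0) {
    (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
  } else {
    NA_real_
  }
  c(precision = precision, recall = recall, f_beta = f_beta)
}

pred  <- c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)
truth <- c(TRUE,  TRUE, FALSE, FALSE, TRUE, TRUE)
detection_metrics(pred, truth)            # precision 2/3, recall 1/2, F1 4/7
detection_metrics(pred, truth, beta = 2)  # F2 = 10/19, favoring recall
```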