Time Series Classification
By Data Mutation
안명호
www.deepnumbers.com
Who am I?
• 안명호, MHR Inc
• www.deepnumbers.com
• One day I discovered machine learning and fell for it completely, abandoning the cloud work I had cherished until then and switching over to machine learning. Having decided to pursue machine learning and nothing else, I now struggle through the difficult equations I have to revisit while studying it, but I am happy with the knowledge I gain day by day.
Time Series Data?
• A time series (時系列) is a sequence of data points arranged at regular time intervals. Time series analysis is the field that studies the various methods used to interpret and understand such series.
• For example, its ultimate goal can be described as answering the basic question of what law generates a given time series.
• Time series prediction means building a mathematical model from an observed time series and using it to forecast future values. Such methods are widely used in engineering, scientific computing, and financial settings such as stock price prediction.
From Wikipedia
Time Series Data Example – Stock Prices
Time Series Data Example - ECG
http://grammarviz2.github.io/grammarviz2_site/morea/anomaly/experience-a2.html
TSC(Time Series Classification)?
Analyzing time series data in order to classify it
http://alumni.cs.ucr.edu/~ychen053/
TSC Example - Stock Price Prediction
http://tutorials.topstockresearch.com/ChartPatterns/Triangles/TutorialsOnTriangleChartPattern.html
What if you could recognize an already known stock price pattern faster than everyone else…
Time Series Classification Examples
• Predicting power usage of servers or data centers
• Error detection in electromagnetic signals
• Stock price prediction in the stock market
• Predicting heart disease from ECG data
• Early diagnosis of sepsis in infants
• Classifying types of network traffic
• ….........
Applicable to a very wide range of real-world problems
Problems in TSC
There are many problems, but the two most troublesome are:
Scale / Noise
Problem #1 : Scale
• Irregular time intervals (time scale) in time series data
• For example, given patterns in a server's energy-usage time series, the same patterns do not all occur over the same time span but over different ones. To recognize a pattern, we therefore need either to rescale data showing the same pattern to a suitable common time span, or to extract features that ignore the time span altogether (see the sketch below).
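A minimal sketch of the issue, assuming NumPy and SciPy; the signals and lengths below are made up for illustration. The same shape captured over different time spans yields arrays of different lengths, and resampling to a common length is one way to make them comparable.

```python
import numpy as np
from scipy.signal import resample

# The same underlying pattern observed over two different time scales
# (synthetic data, for illustration only).
t_a = np.linspace(0, 1, 50,  endpoint=False)   # pattern captured in 50 samples
t_b = np.linspace(0, 1, 200, endpoint=False)   # same shape captured in 200 samples
pattern_a = np.sin(2 * np.pi * t_a)
pattern_b = np.sin(2 * np.pi * t_b)

# Element-wise comparison is impossible while the lengths differ.
print(len(pattern_a), len(pattern_b))          # 50 200

# Resampling both to a common length makes them directly comparable.
a_rs = resample(pattern_a, 100)
b_rs = resample(pattern_b, 100)
print(np.max(np.abs(a_rs - b_rs)))             # small residual after resampling
```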
Scale Problem
Same or not? (series A vs. series B)
Problem #2 : Noise
• Irregular noise
• Noise can arise from the measurement equipment or from changes in conditions at measurement time, and the measured subject itself may produce spurious values for various reasons. We therefore need ways to minimize this noise and extract data that preserves the intrinsic characteristics of the time series, or to remove the noise outright.
Noise Problem
http://www.spectraworks.com/Help/prediction.html
TSC, Not so easy…
Key Idea for TSC
What if we train on data augmented with different scales and different noise?
TSC by Data Mutation
• To handle the time-interval problem, the value of Window, a parameter used to generate the mutated data, is varied so that the time scale changes. In addition, up-sampling stretches the scale and down-sampling shrinks it, so the CNN is trained not to be sensitive to changes in the time span of a particular pattern.
• To make the model robust to noise, features that represent the data well must be extracted from the original data; for this, filtering such as upsampling, Hamming, and Blackman is applied.
Training that is robust to Scale and Noise while preserving the characteristics of the original data (see the sketch below)
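A minimal sketch of such a mutation step, assuming NumPy and SciPy; the window sizes, filter width, and filter names below are illustrative defaults, not the presenter's exact parameters.

```python
import numpy as np
from scipy.signal import resample

def mutate(series, window_sizes=(64, 128, 256), filters=("hamming", "blackman")):
    """Generate mutated variants of a 1-D series by re-windowing,
    up/down-sampling, and window-function smoothing.
    Window sizes and filter names are illustrative defaults."""
    variants = []
    for w in window_sizes:
        # Change the time scale: stretch or compress the series to w samples.
        scaled = resample(series, w)
        variants.append(scaled)
        for name in filters:
            # Smooth with a Hamming/Blackman window to suppress noise.
            win = getattr(np, name)(11)          # e.g. np.hamming(11)
            win /= win.sum()                     # unit gain
            variants.append(np.convolve(scaled, win, mode="same"))
    return variants

# Usage: each variant would feed its own weak CNN in the ensemble.
raw = np.sin(np.linspace(0, 4 * np.pi, 300)) + 0.1 * np.random.randn(300)
mutations = mutate(raw)
print(len(mutations), [len(v) for v in mutations[:3]])
```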
TSC by Data Mutation & CNN
Raw Data
Mutated Data #1
Mutated Data #2
Mutated Data #N
CNN Model #1
CNN Model #2
CNN Model #N
Ensemble
CNN Model
The mutated data contain the original data at various time scales, with outliers and anomalies removed, so the CNNs are exposed to many variants of the original data and learn its characteristics from multiple angles.
Key Idea – Data Mutation
Smoothing / Filtering / Sampling
These three methods are used to generate data that is robust to Scale and Noise while preserving the characteristics of the original data
Smoothing
To smooth a data set is to create an approximating function that attempts to capture important patterns in the data, while leaving out noise or other fine-scale structures/rapid phenomena.
http://fedc.wiwi.hu-berlin.de/xplore/tutorials/xegbohtmlnode44.html
ARIMA Model
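The slide points to ARIMA-style model fitting; as a minimal stand-in for the general idea (not the deck's exact method, and assuming NumPy), a simple moving-average smoother already shows how fine-scale noise is traded away for the broad pattern.

```python
import numpy as np

def moving_average(series, window=5):
    """Simple moving-average smoother: each output value is the mean of the
    surrounding `window` samples, damping noise while keeping the broad shape."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="same")

# Synthetic noisy series, for illustration only.
noisy = np.sin(np.linspace(0, 2 * np.pi, 200)) + 0.3 * np.random.randn(200)
smooth = moving_average(noisy, window=9)
```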
Filtering
• A filter is a device or process that removes some unwanted components or features from a signal.
• Filtering is a class of signal processing, the defining feature of filters being the complete or partial suppression of some aspect of the signal.
• Filters such as upsampling, Hamming, and Blackman are applied
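A sketch of the window-function filtering named above (Hamming, Blackman), assuming NumPy; the window width and the random-walk test signal are illustrative choices, and upsampling is shown separately under Sampling.

```python
import numpy as np

def window_filter(series, kind="hamming", width=15):
    """Low-pass filter a series by convolving it with a normalized
    Hamming or Blackman window."""
    win = {"hamming": np.hamming, "blackman": np.blackman}[kind](width)
    win /= win.sum()                      # unit gain, so the signal level is preserved
    return np.convolve(series, win, mode="same")

x = np.random.randn(500).cumsum()         # a noisy random-walk series (illustrative)
hamming_out  = window_filter(x, "hamming")
blackman_out = window_filter(x, "blackman")   # stronger sidelobe suppression
```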
Sampling
• Sampling is the reduction of a continuous signal to a discrete signal. A common example is the conversion of a sound wave (a continuous signal) to a sequence of samples (a discrete-time signal).
• Up-sampling stretches the scale and down-sampling shrinks the time scale, training the CNN not to be sensitive to changes in the time span of a particular pattern
http://fourier.eng.hmc.edu/e161/lectures/filterbank/node1.html
Down & Up Sampling
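A minimal up/down-sampling sketch, assuming SciPy; the lengths are arbitrary examples.

```python
import numpy as np
from scipy.signal import resample

x = np.sin(np.linspace(0, 2 * np.pi, 100))   # a 100-sample pattern

up   = resample(x, 200)   # up-sampling: stretch the time scale (more samples)
down = resample(x, 50)    # down-sampling: compress the time scale (fewer samples)

# Feeding such variants to the CNN makes it less sensitive to how long
# a given pattern happens to last in the raw series.
print(len(up), len(down))  # 200 50
```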
Learning : CNN
What if we modify the standard CNN to build multiple weak CNN models?
https://sites.google.com/site/shahriarinia/home/ai/machine-learning/neural-networks/deep-learning/theano-mnist/3-convolutional-neural-network-lenet
CNN for Data Mutation Process
Data Mutation → Weak Learners → Ensemble Learner
A Softmax ensemble model built on CNNs
CNN for Data Mutation
[Diagram contrasting a general CNN with the CNN for data mutation: the general CNN repeatedly applies the same filter (filter1) to the data, while the data-mutation CNN first produces variants through mutator1–mutator3 and then applies filter1–filter3, yielding multiple streams of filtered data.]
CNN : Weak Learner
[Diagram: three stacked 1-D convolution layers using the same filter, followed by max pooling.]
The same filter is applied throughout, and max pooling is used to help prevent overfitting (see the sketch below)
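A sketch of one such weak learner, assuming TensorFlow/Keras; the number of filters, kernel size, pool size, and layer count are illustrative choices, not values given in the deck. In the full method, one such model would be trained per mutated data stream.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_weak_cnn(series_len, n_classes, kernel_size=5, pool_size=2):
    """One weak learner: stacked 1-D convolutions sharing the same kernel size,
    followed by max pooling (which also helps limit overfitting) and a softmax
    classification head. Sizes are illustrative, not the presenter's."""
    model = models.Sequential([
        layers.Input(shape=(series_len, 1)),
        layers.Conv1D(32, kernel_size, activation="relu", padding="same"),
        layers.Conv1D(32, kernel_size, activation="relu", padding="same"),
        layers.Conv1D(32, kernel_size, activation="relu", padding="same"),
        layers.MaxPooling1D(pool_size),
        layers.Flatten(),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

weak = build_weak_cnn(series_len=128, n_classes=4)
```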
Ensemble by Softmax
Because multiple CNNs are used, a single input yields a pattern-recognition result from each of them. The Softmax stage combines the results computed by the individual CNNs into a final answer: an appropriate weight is computed for each CNN and used to produce the final result (a sketch follows).
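A minimal sketch of the combination step, assuming NumPy; the fixed weighted average shown here is an illustrative placeholder for however the per-CNN weights are actually computed (they could, for instance, be proportional to each weak CNN's validation accuracy).

```python
import numpy as np

def ensemble_predict(prob_list, weights=None):
    """Combine softmax outputs from N weak CNNs into one prediction.
    `prob_list` holds (n_samples, n_classes) probability arrays, one per CNN;
    `weights` are per-model weights (uniform if omitted)."""
    probs = np.stack(prob_list)                      # (N, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list)) / len(prob_list)
    weights = np.asarray(weights).reshape(-1, 1, 1)
    combined = (weights * probs).sum(axis=0)         # weighted average of softmax outputs
    return combined.argmax(axis=1)

# Example: three weak CNNs, four samples, three classes (dummy probabilities).
p1 = np.array([[0.7, 0.2, 0.1]] * 4)
p2 = np.array([[0.1, 0.8, 0.1]] * 4)
p3 = np.array([[0.6, 0.3, 0.1]] * 4)
print(ensemble_predict([p1, p2, p3], weights=[0.5, 0.2, 0.3]))  # -> [0 0 0 0]
```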
To Find Optimal Parameters
• Parameters to be set
• Window Size
• Filter Size
• Pool Size
• Activation Function
• Learning Rate
• Batch Training Count
• And so on…..
Optimum Parameter = (∑ Dataset Accuracy) / (∑ Dataset STD)
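A sketch of how such a search could be scored, assuming NumPy; the grid values and the `evaluate` helper are hypothetical placeholders, and only the scoring formula comes from the slide.

```python
import itertools
import numpy as np

# Hypothetical search grid; the actual ranges are not given in the deck.
grid = {
    "window_size": [64, 128, 256],
    "filter_size": [3, 5, 7],
    "pool_size":   [2, 4],
}

def evaluate(params):
    """Placeholder: in the real pipeline this would train the mutated-data CNN
    ensemble with `params` on every benchmark dataset and return per-dataset
    accuracies and standard deviations. Here it only returns dummy numbers."""
    rng = np.random.default_rng(hash(tuple(params.values())) % 2**32)
    return rng.uniform(0.7, 0.95, size=5), rng.uniform(0.01, 0.1, size=5)

def score(accuracies, stds):
    # The slide's criterion: sum of dataset accuracies over sum of dataset STDs,
    # i.e. prefer parameters that are both accurate and stable across datasets.
    return np.sum(accuracies) / np.sum(stds)

best = None
for combo in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), combo))
    accs, stds = evaluate(params)
    s = score(accs, stds)
    if best is None or s > best[0]:
        best = (s, params)

print("best score %.2f with %s" % best)
```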
TSC by Data Mutation Result
Comparison by Error Rate
Thank you.