SlideShare ist ein Scribd-Unternehmen logo
1 von 16
Mining Time-Series Data
1
Time-Series Database
 Consists of sequences of values or events obtained over
repeated measurements of time (weekly, hourly…)
 Stock market analysis, economic and sales forecasting, scientific
and engineering experiments, medical treatments etc.
 Can also be considered as a Sequence database
 Consists of a sequence of ordered events (time – optional)
 Web page Traversal Sequence
 Time-Series data can be analyzed to:
 Identify correlations
 Similar / Regular patterns, trends, outliers
2
Trend Analysis
Time Series involving a variable Y can be represented as a function
of time t, Y = F(t)
Goals of Time-Series Analysis
 Modeling time series - To gain insight into the mechanism
 Forecasting time series - For prediction
3
Trend Analysis
 Trend Analysis Components
 Long-term or trend movements (trend curve): general direction in
which a time series is moving over a long interval of time
 Cyclic movements or cycle variations: long term oscillations
about a trend line or curve
 e.g., business cycles, may or may not be periodic
 Seasonal movements or seasonal variations
 i.e, almost identical patterns that a time series appears to
follow during corresponding months of successive years.
 Irregular or random movements
 Time series analysis: decomposition of a time series into
these four basic movements
 Additive Model: TS = T + C + S + I
 Multiplicative Model: TS = T × C × S × I
4
Trend Analysis
 Adjusting Seasonal fluctuations
 Given a series of measurements y1, y,2, y3… influences of the data
that are systematic / calendar related must be removed
 Fluctuations conceal true underlying movement of the series and non-seasonal
characteristics
 De-seasonalize the data
 Seasonal Index – set of numbers showing the relative values of a
variable during the months of a year
 Sales during Oct, Nov, Dec – 80%, 120% and 140% of average monthly sales –
Seasonal index – 80, 120, 140
 Dividing original monthly data by seasonal index – De-seasonalizes data
 Auto-Correlation Analysis
 To detect correlations between ith element and (i-k)th element k- lag
 Pearson’s coefficient can be used between <y1, y2,…yN-k> and <yk+1,
yk+2, ..yN>
5
Trend Analysis
 Estimating Trend Curves
 The freehand method
 Fit the curve by looking at the graph
 Costly and barely reliable for large-scaled data mining
 The least-square method
 Find the curve minimizing the sum of the squares of the
deviation di of points yi on the curve from the corresponding
data points - ∑i=1
n
di
2
 The moving-average method
6
Trend Analysis
 Moving Average Method
 Smoothes the data
 Eliminates cyclic, seasonal and irregular movements
 Loses the data at the beginning or end of a series
 Sensitive to outliers (can be reduced by Weighted Moving Average)
 Assigns greater weight to center elements to eliminate smoothing effects
 Ex: 3 7 2 0 4 5 9 7 2
 Moving average of order 3: 4 3 2 3 6 7 6
 Weighted average (1 4 1): 5.5 2.5 1 3.5 5.5 8 6.5
7
Trend Analysis
 Once trends are detected data can be divided by
corresponding trend values
 Cyclic Variations can be handled using Cyclic Indexes
 Time-Series Forecasting
 Long term / Short term predictions
 ARIMA – Auto-Regressive Integrated Moving Average
8
Similarity Search
 Normal database query finds exact match
 Similarity search finds data sequences that differ only slightly
from the given query sequence
 Two categories of similarity queries
 Whole matching: find a sequence that is similar to the query
sequence
 Subsequence matching: find all pairs of similar sequences
 Typical Applications
 Financial market
 Market basket data analysis
 Scientific databases
 Medical diagnosis
9
Data Reduction and Transformation
 Time Series data – high-dimensional data – each point
of time can be viewed as a dimension
 Dimensionality Reduction techniques
 Signal Processing techniques
 Discrete Fourier Transform
 Discrete Wavelet Transform
 Singular Value Decomposition based on PCA
 Random projection-based Sketches
 Time Series data is transformed and strongest coefficients –
features
 Techniques may require values in Frequency domain
 Distance preserving Ortho-normal transformations
 The distance between two signals in the time domain is the same as their
Euclidean distance in the frequency domain
10
Indexing methods for Similarity
Search
 Multi-dimensional index
 Use the index to retrieve the sequences that are at most a certain
small distance away from the query sequence
 Perform post-processing by computing the actual distance
between sequences in the time domain and discard any false
matches
 Sequence is mapped to trails, trail is divided into sub-trails
 Indexing techniques
 R-trees, R*-trees, Suffix trees etc
11
Subsequence Matching
 Break each sequence into a set of pieces of window with length w
 Extract the features of the subsequence inside the window
 Map each sequence to a “trail” in the feature space
 Divide the trail of each sequence into “subtrails” and represent each
of them with minimum bounding rectangle
 Use a multi-piece assembly algorithm to search for longer sequence
matches
 Uses Euclidean distance (Sensitive to outliers)
12
Similarity Search Methods
 Practically there maybe differences in the baseline and
scale
 Distance from one baseline to another – offset
 Data has to be normalized
 Sequence X = <x1, x2, ..xn> can be replaced by X’ = <x1’, x2’, …xn’>
where xi’ = xi - µ / σ
 Two subsequences are considered similar if one lies within an
envelope of ε width around the other, ignoring outliers
 Two sequences are said to be similar if they have enough non-
overlapping time-ordered pairs of similar subsequences
 Parameters specified by a user or expert: sliding window size,
width of an envelope for similarity, maximum gap, and matching
fraction
13
Similarity Search Methods
14
Similarity Search Method
 Atomic matching
 Find all pairs of gap-free windows of a small length that are
similar
 Window stitching
 Stitch similar windows to form pairs of large similar
subsequences allowing gaps between atomic matches
 Subsequence Ordering
 Linearly order the subsequence matches to determine whether
enough similar pieces exist
15
Query Languages for Time Sequence
 Time-sequence query language
 Should be able to specify sophisticated queries like
 Find all of the sequences that are similar to some sequence in class A, but not similar to
any sequence in class B
 Should be able to support various kinds of queries: range queries, all-
pair queries, and nearest neighbor queries
 Shape definition language
 Allows users to define and query the overall shape of time sequences
 Uses human readable series of sequence transitions or macros
 Ignores the specific details
 E.g., the pattern up, Up, UP can be used to describe increasing
degrees of rising slopes
 Macros: spike, valley, etc.
16

Weitere ähnliche Inhalte

Was ist angesagt?

Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Mustafa Sherazi
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingankur bhalla
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reductionKrish_ver2
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsPrashanth Guntal
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysisDataminingTools Inc
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streamshktripathy
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATAGauravBiswas9
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining Sulman Ahmed
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olapData Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olap
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olapSalah Amean
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataSalah Amean
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data MiningValerii Klymchuk
 

Was ist angesagt? (20)

Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reduction
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithms
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATA
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olapData Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olap
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
 
Data Mining: Outlier analysis
Data Mining: Outlier analysisData Mining: Outlier analysis
Data Mining: Outlier analysis
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
Clusters techniques
Clusters techniquesClusters techniques
Clusters techniques
 
2. visualization in data mining
2. visualization in data mining2. visualization in data mining
2. visualization in data mining
 

Ähnlich wie 5.2 mining time series data

Using timeseries extraction the mining.ppt
Using timeseries extraction the mining.pptUsing timeseries extraction the mining.ppt
Using timeseries extraction the mining.pptjohnsaida3101
 
20IT501_DWDM_PPT_Unit_V.ppt
20IT501_DWDM_PPT_Unit_V.ppt20IT501_DWDM_PPT_Unit_V.ppt
20IT501_DWDM_PPT_Unit_V.pptPalaniKumarR2
 
Iso 9001 consultants uk
Iso 9001 consultants ukIso 9001 consultants uk
Iso 9001 consultants ukjondarita
 
Methods of quality management
Methods of quality managementMethods of quality management
Methods of quality managementselinasimpson2001
 
What is quality management system
What is quality management systemWhat is quality management system
What is quality management systemselinasimpson0201
 
Quality manual for iso 9001
Quality manual for iso 9001Quality manual for iso 9001
Quality manual for iso 9001jondarita
 
Quality management conference
Quality management conferenceQuality management conference
Quality management conferenceselinasimpson311
 
Iso 9001 course
Iso 9001 courseIso 9001 course
Iso 9001 courseporikgefus
 
Iso 9001 manual
Iso 9001 manualIso 9001 manual
Iso 9001 manualjomharipe
 
Iso 9001 courses
Iso 9001 coursesIso 9001 courses
Iso 9001 coursesdenritafu
 
Iso 9001 templates
Iso 9001 templatesIso 9001 templates
Iso 9001 templatespogerita
 
Quality assurance and management
Quality assurance and managementQuality assurance and management
Quality assurance and managementselinasimpson2301
 
Quality management presentation
Quality management presentationQuality management presentation
Quality management presentationselinasimpson1501
 
Ts 16949 quality management system
Ts 16949 quality management systemTs 16949 quality management system
Ts 16949 quality management systemselinasimpson381
 
Course in quality management
Course in quality managementCourse in quality management
Course in quality managementselinasimpson361
 
What is iso 9001 standard
What is iso 9001 standardWhat is iso 9001 standard
What is iso 9001 standardjomjintra
 

Ähnlich wie 5.2 mining time series data (20)

Time series.ppt
Time series.pptTime series.ppt
Time series.ppt
 
Using timeseries extraction the mining.ppt
Using timeseries extraction the mining.pptUsing timeseries extraction the mining.ppt
Using timeseries extraction the mining.ppt
 
20IT501_DWDM_PPT_Unit_V.ppt
20IT501_DWDM_PPT_Unit_V.ppt20IT501_DWDM_PPT_Unit_V.ppt
20IT501_DWDM_PPT_Unit_V.ppt
 
17329274.ppt
17329274.ppt17329274.ppt
17329274.ppt
 
Quality management thesis
Quality management thesisQuality management thesis
Quality management thesis
 
Iso 9001 consultants uk
Iso 9001 consultants ukIso 9001 consultants uk
Iso 9001 consultants uk
 
Methods of quality management
Methods of quality managementMethods of quality management
Methods of quality management
 
What is quality management system
What is quality management systemWhat is quality management system
What is quality management system
 
Quality manual for iso 9001
Quality manual for iso 9001Quality manual for iso 9001
Quality manual for iso 9001
 
Quality management conference
Quality management conferenceQuality management conference
Quality management conference
 
Iso 9001 course
Iso 9001 courseIso 9001 course
Iso 9001 course
 
Iso 9001 manual
Iso 9001 manualIso 9001 manual
Iso 9001 manual
 
Iso 9001 courses
Iso 9001 coursesIso 9001 courses
Iso 9001 courses
 
Quality management cycle
Quality management cycleQuality management cycle
Quality management cycle
 
Iso 9001 templates
Iso 9001 templatesIso 9001 templates
Iso 9001 templates
 
Quality assurance and management
Quality assurance and managementQuality assurance and management
Quality assurance and management
 
Quality management presentation
Quality management presentationQuality management presentation
Quality management presentation
 
Ts 16949 quality management system
Ts 16949 quality management systemTs 16949 quality management system
Ts 16949 quality management system
 
Course in quality management
Course in quality managementCourse in quality management
Course in quality management
 
What is iso 9001 standard
What is iso 9001 standardWhat is iso 9001 standard
What is iso 9001 standard
 

Mehr von Krish_ver2

5.5 back tracking
5.5 back tracking5.5 back tracking
5.5 back trackingKrish_ver2
 
5.5 back track
5.5 back track5.5 back track
5.5 back trackKrish_ver2
 
5.5 back tracking 02
5.5 back tracking 025.5 back tracking 02
5.5 back tracking 02Krish_ver2
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructuresKrish_ver2
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructuresKrish_ver2
 
5.4 randamized algorithm
5.4 randamized algorithm5.4 randamized algorithm
5.4 randamized algorithmKrish_ver2
 
5.3 dynamic programming 03
5.3 dynamic programming 035.3 dynamic programming 03
5.3 dynamic programming 03Krish_ver2
 
5.3 dynamic programming
5.3 dynamic programming5.3 dynamic programming
5.3 dynamic programmingKrish_ver2
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-iKrish_ver2
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03Krish_ver2
 
5.2 divide and conquer
5.2 divide and conquer5.2 divide and conquer
5.2 divide and conquerKrish_ver2
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03Krish_ver2
 
5.1 greedyyy 02
5.1 greedyyy 025.1 greedyyy 02
5.1 greedyyy 02Krish_ver2
 
4.4 hashing ext
4.4 hashing  ext4.4 hashing  ext
4.4 hashing extKrish_ver2
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashingKrish_ver2
 

Mehr von Krish_ver2 (20)

5.5 back tracking
5.5 back tracking5.5 back tracking
5.5 back tracking
 
5.5 back track
5.5 back track5.5 back track
5.5 back track
 
5.5 back tracking 02
5.5 back tracking 025.5 back tracking 02
5.5 back tracking 02
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
5.4 randamized algorithm
5.4 randamized algorithm5.4 randamized algorithm
5.4 randamized algorithm
 
5.3 dynamic programming 03
5.3 dynamic programming 035.3 dynamic programming 03
5.3 dynamic programming 03
 
5.3 dynamic programming
5.3 dynamic programming5.3 dynamic programming
5.3 dynamic programming
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-i
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03
 
5.2 divide and conquer
5.2 divide and conquer5.2 divide and conquer
5.2 divide and conquer
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03
 
5.1 greedyyy 02
5.1 greedyyy 025.1 greedyyy 02
5.1 greedyyy 02
 
5.1 greedy
5.1 greedy5.1 greedy
5.1 greedy
 
5.1 greedy 03
5.1 greedy 035.1 greedy 03
5.1 greedy 03
 
4.4 hashing02
4.4 hashing024.4 hashing02
4.4 hashing02
 
4.4 hashing
4.4 hashing4.4 hashing
4.4 hashing
 
4.4 hashing ext
4.4 hashing  ext4.4 hashing  ext
4.4 hashing ext
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashing
 
4.2 bst
4.2 bst4.2 bst
4.2 bst
 

Kürzlich hochgeladen

Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 

Kürzlich hochgeladen (20)

Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 

5.2 mining time series data

  • 2. Time-Series Database  Consists of sequences of values or events obtained over repeated measurements of time (weekly, hourly…)  Stock market analysis, economic and sales forecasting, scientific and engineering experiments, medical treatments etc.  Can also be considered as a Sequence database  Consists of a sequence of ordered events (time – optional)  Web page Traversal Sequence  Time-Series data can be analyzed to:  Identify correlations  Similar / Regular patterns, trends, outliers 2
  • 3. Trend Analysis Time Series involving a variable Y can be represented as a function of time t, Y = F(t) Goals of Time-Series Analysis  Modeling time series - To gain insight into the mechanism  Forecasting time series - For prediction 3
  • 4. Trend Analysis  Trend Analysis Components  Long-term or trend movements (trend curve): general direction in which a time series is moving over a long interval of time  Cyclic movements or cycle variations: long term oscillations about a trend line or curve  e.g., business cycles, may or may not be periodic  Seasonal movements or seasonal variations  i.e, almost identical patterns that a time series appears to follow during corresponding months of successive years.  Irregular or random movements  Time series analysis: decomposition of a time series into these four basic movements  Additive Model: TS = T + C + S + I  Multiplicative Model: TS = T × C × S × I 4
  • 5. Trend Analysis  Adjusting Seasonal fluctuations  Given a series of measurements y1, y,2, y3… influences of the data that are systematic / calendar related must be removed  Fluctuations conceal true underlying movement of the series and non-seasonal characteristics  De-seasonalize the data  Seasonal Index – set of numbers showing the relative values of a variable during the months of a year  Sales during Oct, Nov, Dec – 80%, 120% and 140% of average monthly sales – Seasonal index – 80, 120, 140  Dividing original monthly data by seasonal index – De-seasonalizes data  Auto-Correlation Analysis  To detect correlations between ith element and (i-k)th element k- lag  Pearson’s coefficient can be used between <y1, y2,…yN-k> and <yk+1, yk+2, ..yN> 5
  • 6. Trend Analysis  Estimating Trend Curves  The freehand method  Fit the curve by looking at the graph  Costly and barely reliable for large-scaled data mining  The least-square method  Find the curve minimizing the sum of the squares of the deviation di of points yi on the curve from the corresponding data points - ∑i=1 n di 2  The moving-average method 6
  • 7. Trend Analysis  Moving Average Method  Smoothes the data  Eliminates cyclic, seasonal and irregular movements  Loses the data at the beginning or end of a series  Sensitive to outliers (can be reduced by Weighted Moving Average)  Assigns greater weight to center elements to eliminate smoothing effects  Ex: 3 7 2 0 4 5 9 7 2  Moving average of order 3: 4 3 2 3 6 7 6  Weighted average (1 4 1): 5.5 2.5 1 3.5 5.5 8 6.5 7
  • 8. Trend Analysis  Once trends are detected data can be divided by corresponding trend values  Cyclic Variations can be handled using Cyclic Indexes  Time-Series Forecasting  Long term / Short term predictions  ARIMA – Auto-Regressive Integrated Moving Average 8
  • 9. Similarity Search  Normal database query finds exact match  Similarity search finds data sequences that differ only slightly from the given query sequence  Two categories of similarity queries  Whole matching: find a sequence that is similar to the query sequence  Subsequence matching: find all pairs of similar sequences  Typical Applications  Financial market  Market basket data analysis  Scientific databases  Medical diagnosis 9
  • 10. Data Reduction and Transformation  Time Series data – high-dimensional data – each point of time can be viewed as a dimension  Dimensionality Reduction techniques  Signal Processing techniques  Discrete Fourier Transform  Discrete Wavelet Transform  Singular Value Decomposition based on PCA  Random projection-based Sketches  Time Series data is transformed and strongest coefficients – features  Techniques may require values in Frequency domain  Distance preserving Ortho-normal transformations  The distance between two signals in the time domain is the same as their Euclidean distance in the frequency domain 10
  • 11. Indexing methods for Similarity Search  Multi-dimensional index  Use the index to retrieve the sequences that are at most a certain small distance away from the query sequence  Perform post-processing by computing the actual distance between sequences in the time domain and discard any false matches  Sequence is mapped to trails, trail is divided into sub-trails  Indexing techniques  R-trees, R*-trees, Suffix trees etc 11
  • 12. Subsequence Matching  Break each sequence into a set of pieces of window with length w  Extract the features of the subsequence inside the window  Map each sequence to a “trail” in the feature space  Divide the trail of each sequence into “subtrails” and represent each of them with minimum bounding rectangle  Use a multi-piece assembly algorithm to search for longer sequence matches  Uses Euclidean distance (Sensitive to outliers) 12
  • 13. Similarity Search Methods  Practically there maybe differences in the baseline and scale  Distance from one baseline to another – offset  Data has to be normalized  Sequence X = <x1, x2, ..xn> can be replaced by X’ = <x1’, x2’, …xn’> where xi’ = xi - µ / σ  Two subsequences are considered similar if one lies within an envelope of ε width around the other, ignoring outliers  Two sequences are said to be similar if they have enough non- overlapping time-ordered pairs of similar subsequences  Parameters specified by a user or expert: sliding window size, width of an envelope for similarity, maximum gap, and matching fraction 13
  • 15. Similarity Search Method  Atomic matching  Find all pairs of gap-free windows of a small length that are similar  Window stitching  Stitch similar windows to form pairs of large similar subsequences allowing gaps between atomic matches  Subsequence Ordering  Linearly order the subsequence matches to determine whether enough similar pieces exist 15
  • 16. Query Languages for Time Sequence  Time-sequence query language  Should be able to specify sophisticated queries like  Find all of the sequences that are similar to some sequence in class A, but not similar to any sequence in class B  Should be able to support various kinds of queries: range queries, all- pair queries, and nearest neighbor queries  Shape definition language  Allows users to define and query the overall shape of time sequences  Uses human readable series of sequence transitions or macros  Ignores the specific details  E.g., the pattern up, Up, UP can be used to describe increasing degrees of rising slopes  Macros: spike, valley, etc. 16