This document summarizes research on applying machine learning techniques to analyze categorical time series data from a manufacturing process. It discusses visualizing the process data, representing each batch as an object using transition matrices, calculating distances between batches, and applying classification and clustering algorithms. The goals are to quantify procedures, classify and cluster processes, and optimize the mixing process.
Machine Learning Techniques in Categorical Time Series
1. Machine Learning Techniques in Categorical Time Series Analysis of Manufacturing Process
Haris Michailidis, Isidora Tourni
National Technical University of Athens
School of Electrical and Computer Engineering
Professor: Nectarios Koziris
J&J Responsible: Michalis Avgoulis
Presentation Date: 26/07/2016
4. Problem Motivation
In cooperation with Johnson & Johnson Hellas
Goals:
● Visualisation of Mixing Process
● Quantification of Procedures
● Classification & Clustering of processes
Further Goals:
● Optimization of the Mixing Process
● Comparison with Golden Standard (Evaluation)
● Comparison between different batches of the same Product
5. Process Description
Example Vessel Actions:
● Heating
● Agitation
● Addition of Materials
● Pressure adjustment
PLC logging
● Output to CSV
Process flow: Raw Materials → Mixing → Bottling
16. Machine Learning Introduction
Goal: Explore the possibilities of Machine Learning in the Manufacturing space, in order to produce useful insights for the process.
● Classification
● Clustering
Unit of analysis: batch
Challenges: Represent an object in an N-dimensional space
● Representation of each batch | Object Creation
● Data Cleansing / Creation of training set | Labelling
● “Distance” between batches | Distance Calculation
Inspiration:
● DNA sequence analysis → Markov Models
17. Solving the Challenges
1. Data cleansing
Value - Set-Point Flattening, Typo Correction
2. Labelling of batches
Through 2 files:
● Log file: containing manual entries from operators
● Mapping table: containing information for each product
3. Time-series splitting into batches
Based on business rules derived from experience and observations. Only production chunks are kept.
18. Solving the Challenges
4. Feature selection
Message Number
5. Representation of each batch
6. Unequal length time-series comparison
[Figure: Message Mapping Table]
19. Transition Matrix Concept
Sequence 1: B-B-C-A-B-C-A-C-A-B-C
Transition Matrix 1 (rows: current state, columns: next state):
     A     B     C
A  0.00  0.67  0.33
B  0.00  0.25  0.75
C  1.00  0.00  0.00
Sequence 2: A-A-B-B-A-B-B-C-C-A-B-B-C-A-B-C-A-C
Transition Matrix 2 (rows: current state, columns: next state):
     A     B     C
A  0.17  0.67  0.17
B  0.14  0.43  0.43
C  0.75  0.00  0.25
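The two matrices on this slide can be reproduced by counting consecutive pairs in each sequence and normalising every row. The following is a minimal sketch of that idea, not the authors' code; the function name `transition_matrix` is hypothetical, and leaving the row of a never-visited state at all zeros is an assumption made here.

```python
def transition_matrix(sequence, states):
    """Row-normalised first-order transition matrix of a categorical sequence.

    Entry [i][j] is the fraction of times state i was followed by state j.
    """
    index = {state: i for i, state in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    # Count every consecutive (current, next) pair in the sequence.
    for current, nxt in zip(sequence, sequence[1:]):
        counts[index[current]][index[nxt]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: a state with no outgoing transitions keeps an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


seq1 = "B-B-C-A-B-C-A-C-A-B-C".split("-")
seq2 = "A-A-B-B-A-B-B-C-C-A-B-B-C-A-B-C-A-C".split("-")
m1 = transition_matrix(seq1, "ABC")
m2 = transition_matrix(seq2, "ABC")

for name, matrix in (("Matrix 1", m1), ("Matrix 2", m2)):
    print(name)
    for state, row in zip("ABC", matrix):
        print(state, " ".join(f"{value:.2f}" for value in row))
```

Both sequences have different lengths (11 and 18 symbols) but yield matrices of the same 3x3 shape, which is what makes them directly comparable.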
20. Solving the Challenges
4. Feature selection
Message Number
5. Representation of each batch
6. Unequal length time-series comparison
Chunk Object, containing:
● Transition Matrix (fixed size 45x45)
● Labels
7. Distance calculation method
Great research area
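A chunk object of this kind might look like the sketch below. It is illustrative only: the class name `Chunk`, the helper `build_chunk`, and the label keys are hypothetical, and it assumes that the Message Mapping Table has already translated each message number into an index from 0 to 44.

```python
from dataclasses import dataclass, field

N_MESSAGES = 45  # fixed matrix size stated on the slide


@dataclass
class Chunk:
    matrix: list                                 # 45x45 row-normalised transition matrix
    labels: dict = field(default_factory=dict)   # e.g. product attributes


def build_chunk(message_indices, labels):
    """Build a Chunk from a sequence of message indices in range(45)."""
    counts = [[0] * N_MESSAGES for _ in range(N_MESSAGES)]
    for current, nxt in zip(message_indices, message_indices[1:]):
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: messages that never occur keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return Chunk(matrix=matrix, labels=dict(labels))


# Hypothetical batch: label keys and values are made up for illustration.
chunk = build_chunk([0, 1, 1, 44], {"product_group": "G1"})
print(len(chunk.matrix), len(chunk.matrix[0]), chunk.labels)
```

Because the matrix size depends only on the number of distinct messages, batches of any duration map to objects of identical shape.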
[Figures: Transition Matrix; Message Mapping Table]
22. Distance Evaluation
2. 2D Matrix → Vector: *
A. Append each row to the first
B. Append each row from the diagonal matrix to the first
C. Average of distances between corresponding rows
[Figures: illustrations of methods A and B]
* Not using Space-Filling curves due to unrelated spatial characteristics.
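Methods A and C can be sketched as follows. This is an illustration under assumptions: Euclidean distance is used because later slides name Euclidean variants, the function names are hypothetical, and method B is omitted because the slide does not specify it precisely enough to reconstruct.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def flatten_rows(matrix):
    """Method A: append each row to the first, giving one long vector."""
    return [value for row in matrix for value in row]


def distance_flattened(m1, m2):
    """Distance between two matrices after flattening them row by row."""
    return euclidean(flatten_rows(m1), flatten_rows(m2))


def distance_row_average(m1, m2):
    """Method C: average of the distances between corresponding rows."""
    return sum(euclidean(r1, r2) for r1, r2 in zip(m1, m2)) / len(m1)


# Small 3x3 example matrices (rounded transition probabilities).
m1 = [[0.00, 0.67, 0.33], [0.00, 0.25, 0.75], [1.00, 0.00, 0.00]]
m2 = [[0.17, 0.67, 0.17], [0.14, 0.43, 0.43], [0.75, 0.00, 0.25]]

print(distance_flattened(m1, m2))
print(distance_row_average(m1, m2))
```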
23. Classification (supervised)
The process of assigning objects to predefined classes according to shared attributes.
Algorithms used:
● Nearest Centroid
● k-Nearest Neighbors
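Both algorithms can be sketched in a few lines of plain Python. The sketch assumes each batch has already been turned into a numeric vector and that Euclidean distance is used; the function names and the toy data are hypothetical and do not come from the study.

```python
import math
from collections import Counter, defaultdict


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_centroid_fit(X, y):
    """Compute one centroid (the mean vector) per class label."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in groups.items()
    }


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: euclidean(centroids[label], x))


def knn_predict(X, y, x, k=3):
    """Assign x to the majority class among its k nearest training points."""
    neighbours = sorted(zip(X, y), key=lambda pair: euclidean(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy training set: two well-separated classes.
X_train = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
y_train = ["P1", "P1", "P2", "P2"]
centroids = nearest_centroid_fit(X_train, y_train)

print(nearest_centroid_predict(centroids, [0.2, 0.1]))
print(knn_predict(X_train, y_train, [0.8, 0.9], k=3))
```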
Evaluation Methods:
● Accuracy
● Cohen’s Kappa (Kappa coefficient)
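For reference, the two evaluation measures can be computed as in the sketch below. Cohen's Kappa corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The function names and example labels are illustrative, and returning 1.0 in the degenerate single-class case is a convention chosen for this sketch.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Agreement between true and predicted labels, corrected for chance."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # only one label on both sides; convention chosen for this sketch
    return (p_o - p_e) / (1 - p_e)


y_true = ["A", "A", "A", "B", "B", "B"]
y_pred = ["A", "A", "B", "B", "B", "A"]
print(accuracy(y_true, y_pred), cohens_kappa(y_true, y_pred))
```

Kappa is lower than accuracy whenever part of the agreement can be explained by the class distribution alone, which is why the two are reported together.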
[Figure: data split into train and test sets]
24. Clustering (unsupervised)
The task of grouping objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other groups.
Algorithms used:
● k-Means
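A minimal sketch of k-Means (Lloyd's algorithm) with caller-supplied initial centroids is given below. It assumes vectorised batches and Euclidean distance; the function name `k_means`, the `max_iter` parameter, and the toy points are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until assignments stop changing."""
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


points = [[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]]
assignment, centroids = k_means(points, [points[0], points[3]])
print(assignment, centroids)
```

The result depends on the initial centroids, which is why the choice of starting sets is worth studying separately.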
Evaluation Methods:
● V-Measure
● Rand-Index
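Both measures compare a clustering against known labels. The sketch below computes the plain (unadjusted) Rand index and the V-Measure as the harmonic mean of homogeneity and completeness; whether the study used the adjusted variant of the Rand index is not stated, so the plain form is an assumption, as are the function names.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(labels, given):
    """H(labels | given)."""
    n = len(labels)
    joint = Counter(zip(given, labels))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(labels_true, labels_pred):
    """Harmonic mean of homogeneity and completeness."""
    h_true, h_pred = _entropy(labels_true), _entropy(labels_pred)
    homogeneity = 1.0 if h_true == 0 else 1 - _conditional_entropy(labels_true, labels_pred) / h_true
    completeness = 1.0 if h_pred == 0 else 1 - _conditional_entropy(labels_pred, labels_true) / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]
print(rand_index(truth, found), v_measure(truth, found))
```

Neither measure depends on how the clusters are numbered, so a clustering that only permutes the label names still scores 1.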
31. Conclusions
1) Visualization
a) Visual Production Overview
b) Enabling Comparison between batches
2) Machine Learning
a) Valid Representation of Categorical Time-Series
b) Quantification of Production Processes
c) Application of Machine Learning Techniques
32. Future Work | Academic
● Research on 2D-specific Distance Metrics
● Clustering Algorithms, based on Markov Models
● Classification using Transition Matrices of different Dimensions (Markov-0,2,...,N)
● Different Feature Selection (temperature, pressure, etc)
33. Future Work | Business
● Data Gathering Automation
● Creation of Golden Standard for each Product
● Scoring of Production Process
● Distribution of Batches compared to the Average Batch
● Clustering to more efficient clusters based on the process
39. Distance Comparison | Clustering
Setup:
● Algorithm:
○ Baseline
○ k-Means
● Attributes:
○ Product Cleaning Group
○ Product Group
● Initial Centroid Sets Type:
○ All centroids of each set belong to different clusters (Alldiff)
Average of 20 sets
○ All centroids of each set belong to the same cluster (Allsame)
Average of 20 sets
● Distances:
○ All
40. Impact of Initial Centroids
Setup:
● Algorithm:
○ Baseline
○ k-Means
● Attributes:
○ Product Cleaning Group
○ Product Group
● Initial Centroid Sets Type:
○ All centroids of each set belong to different clusters (Alldiff)
Average of 100 sets
○ All centroids of each set belong to the same cluster (Allsame)
Average of 100 sets
○ All centroids of each set belong to a random cluster (Allrandom)
Average of 100 sets
● Distances (Average of):
○ Euclidean Total
○ Euclidean Row
○ Euclidean Column
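One way to generate the three kinds of initial centroid sets is sketched below. It is an interpretation, not the authors' procedure: "Alldiff" draws one point from each of k different true clusters, "Allsame" draws k points from a single true cluster, and "Allrandom" is read here as drawing k points regardless of cluster. The function name and toy data are hypothetical.

```python
import random


def initial_centroid_set(points, labels, k, kind, rng):
    """Pick k initial centroids from labelled points according to `kind`."""
    by_cluster = {}
    for point, label in zip(points, labels):
        by_cluster.setdefault(label, []).append(point)
    if kind == "alldiff":
        # One point from each of k distinct true clusters.
        chosen = rng.sample(sorted(by_cluster), k)
        return [rng.choice(by_cluster[c]) for c in chosen]
    if kind == "allsame":
        # All k points from one true cluster that is large enough.
        candidates = [c for c in sorted(by_cluster) if len(by_cluster[c]) >= k]
        return rng.sample(by_cluster[rng.choice(candidates)], k)
    if kind == "allrandom":
        # Assumption: points drawn without regard to their true cluster.
        return rng.sample(points, k)
    raise ValueError(f"unknown kind: {kind}")


points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
labels = ["G1", "G1", "G1", "G2", "G2", "G2"]
rng = random.Random(0)  # fixed seed so repeated runs draw the same sets
for kind in ("alldiff", "allsame", "allrandom"):
    print(kind, initial_centroid_set(points, labels, 2, kind, rng))
```

Averaging the clustering scores over many such sets, as the setup describes, separates the effect of the starting points from the effect of the distance measure.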
41. Determining k in k-Nearest Neighbors [1/2]
Accuracy: Average: 0.727 Deviation: <1%
Kappa: Average: 0.531 Deviation: ~2%
42. Determining k in k-Nearest Neighbors [2/2]
Accuracy: Average: 0.560 Deviation: <1%
Kappa: Average: 0.391 Deviation: ~1%