Machine Learning Techniques
In Categorical Time Series Analysis
Of Manufacturing Process
Haris Michailidis, Isidora Tourni
National Technical University of Athens
School of Electrical and Computer Engineering
Professor: Nectarios Koziris
J&J Responsible: Michalis Avgoulis
Presentation Date: 26/07/2016
Contents
● Problem Motivation
● Visualization
● Machine Learning
● Results
● Future Work
Introduction
Problem Motivation
In cooperation with Johnson & Johnson Hellas
Goals:
● Visualisation of Mixing Process
● Quantification of Procedures
● Classification & Clustering of processes
Further Goals:
● Optimization of the Mixing Process
● Comparison with Golden Standard (Evaluation)
● Comparison between different batches of the same Product
Process Description
Example Vessel Actions:
● Heating
● Agitation
● Addition of Materials
● Pressure adjustment
PLC logging
● Output to CSV
Raw Materials → Mixing → Bottling
Product Categories
● Product Cleaning Group (figure label: Emulsion)
● Product Group (figure labels: Picsou C, Apple Cream)
Data-Set Description
~130,000 rows/year
45 message code sets (values, set-points)
Categorical Data
Visualization
Visualization Tool
Goals:
● Visualization of Mixing Process
● Selective representation of variables
● Overview with flexible Timeframe
● Accessible from multiple terminals (web interface)
The human brain processes visuals 60,000 times faster than text. *
* Forrester CSO Insights 2012
Visualization Tool (interface)
● Initial page of the visualization tool
● 4-day overview
● 1-day overview
● Detailed box in complex visualization
Machine Learning
Machine Learning Introduction
Goal: Explore the possibilities of Machine Learning in the manufacturing space, in
order to produce useful insights for the process.
● Classification
● Clustering
Unit of analysis: batch
Challenges: Represent an object in an N-dimensional space
● Representation of each batch | Object Creation
● Data Cleansing / Creation of training set | Labelling
● “Distance” between batches | Distance Calculation
Inspiration:
● DNA sequence analysis → Markov Models
Solving the Challenges
1. Data cleansing
Value / Set-Point flattening, typo correction
2. Labelling of batches
Through 2 files:
● Log file: contains manual entries from operators
● Mapping table: contains information for each product
3. Time-series splitting into batches
Based on business rules derived from experience and
observations. Only production chunks are kept.
Solving the Challenges
4. Feature selection
Message Number
5. Representation of each batch
6. Unequal length time-series comparison
[Figure: Message Mapping Table]
Transition Matrix Concept
Sequence 1: B-B-C-A-B-C-A-C-A-B-C
Transition Matrix 1 (rows: current state, columns: next state):
      A     B     C
A  0.00  0.67  0.33
B  0.00  0.25  0.75
C  1.00  0.00  0.00
Sequence 2: A-A-B-B-A-B-B-C-C-A-B-B-C-A-B-C-A-C
Transition Matrix 2 (rows: current state, columns: next state):
      A     B     C
A  0.17  0.67  0.17
B  0.14  0.43  0.43
C  0.75  0.00  0.25
Solving the Challenges
4. Feature selection
Message Number
5. Representation of each batch
6. Unequal length time-series comparison
Chunk Object, containing:
● Transition Matrix (fixed size 45x45)
● Labels
7. Distance calculation method
A broad area of research
[Figures: Transition Matrix, Message Mapping Table]
Distance Evaluation
Goal
● Distance {batch - batch} → Distance between 2D Transition Matrices
Problems:
1. Choosing the proper Vector Distance Metric
2. Converting 2D Transition Matrix → Vector
Solutions:
1. Distance between Vectors:
● Euclidean Distance
● Cosine Distance
● Kullback-Leibler Divergence
● Kolmogorov-Smirnov Test
● Infinity Norm
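The vector distances listed above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the implementation used in the project: the function names and the `eps` smoothing in the KL divergence (added to avoid taking the logarithm of zero) are assumptions. The Kolmogorov-Smirnov test is left out because the slides do not say how it was applied to the vectors. The example vectors are row A of the two example transition matrices, written as exact fractions.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_distance(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    # Assumption: the distance to an all-zero vector is defined as 1.
    return 1.0 - dot / norm if norm else 1.0


def kl_divergence(p, q, eps=1e-9):
    # Assumption: eps smoothing keeps the value finite when q has zeros.
    # Note that KL divergence is not symmetric: KL(p, q) != KL(q, p) in general.
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def infinity_norm(p, q):
    # Largest absolute difference over all coordinates.
    return max(abs(a - b) for a, b in zip(p, q))


row_a_1 = [0.0, 2 / 3, 1 / 3]    # row A of Transition Matrix 1
row_a_2 = [1 / 6, 4 / 6, 1 / 6]  # row A of Transition Matrix 2

for dist in (euclidean, cosine_distance, kl_divergence, infinity_norm):
    print(dist.__name__, round(dist(row_a_1, row_a_2), 4))
```

KL divergence is defined between probability distributions, so it applies most naturally to single rows of a transition matrix, each of which sums to 1.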
Distance Evaluation
2. 2D Matrix → Vector: *
A. Append each row to the first
B. Append each row from the diagonal matrix to the first
C. Average of distances between corresponding rows
* Space-filling curves are not used, because the matrix cells have no meaningful spatial relationship.
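Two of the matrix-to-vector options can be sketched as follows: option A (append every row to the first, giving one long vector) and option C (average the distances between corresponding rows). Option B is not shown because the slides do not define the diagonal ordering precisely. The function names are hypothetical, Euclidean distance is used only as an example metric, and the two matrices are the rounded example matrices from the Transition Matrix Concept slide.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def flatten_rows(matrix):
    """Option A: append every row to the first, giving one long vector."""
    return [value for row in matrix for value in row]


def flattened_distance(m1, m2, dist=euclidean):
    return dist(flatten_rows(m1), flatten_rows(m2))


def mean_row_distance(m1, m2, dist=euclidean):
    """Option C: average the distances between corresponding rows."""
    return sum(dist(r1, r2) for r1, r2 in zip(m1, m2)) / len(m1)


tm1 = [[0.00, 0.67, 0.33], [0.00, 0.25, 0.75], [1.00, 0.00, 0.00]]
tm2 = [[0.17, 0.67, 0.17], [0.14, 0.43, 0.43], [0.75, 0.00, 0.25]]

print("flattened:", round(flattened_distance(tm1, tm2), 4))
print("row mean: ", round(mean_row_distance(tm1, tm2), 4))
```

Any vector distance with the same two-argument signature can be passed as `dist`, which makes it easy to compare metric and flattening combinations side by side.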
Classification (supervised)
The process of classifying objects according
to shared attributes.
Algorithms used:
● Nearest Centroid
● k-Nearest Neighbors
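A minimal sketch of both classifiers, assuming each batch has already been turned into a flattened transition matrix. The tiny 2x2 training matrices, the labels `P1` and `P2`, the choice of Euclidean distance and the function names are all hypothetical and only serve to make the example runnable.

```python
import math
from collections import Counter, defaultdict


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fit_centroids(vectors, labels):
    """Nearest Centroid training: one mean vector per label."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()}


def predict_nearest_centroid(centroids, vec):
    # Pick the label whose centroid is closest to the query vector.
    return min(centroids, key=lambda label: euclidean(centroids[label], vec))


def predict_knn(vectors, labels, vec, k=3):
    # Majority vote among the k closest training vectors.
    nearest = sorted(zip(vectors, labels),
                     key=lambda pair: euclidean(pair[0], vec))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical flattened 2x2 transition matrices and product labels.
train_vectors = [[0.9, 0.1, 0.2, 0.8], [0.8, 0.2, 0.1, 0.9],
                 [0.1, 0.9, 0.7, 0.3], [0.2, 0.8, 0.9, 0.1]]
train_labels = ["P1", "P1", "P2", "P2"]
query = [0.7, 0.3, 0.3, 0.7]

centroids = fit_centroids(train_vectors, train_labels)
print(predict_nearest_centroid(centroids, query))
print(predict_knn(train_vectors, train_labels, query, k=3))
```

The mean of several row-normalised transition matrices is itself row-normalised, so a centroid can still be read as a transition matrix.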
Evaluation Methods:
● Accuracy
● Cohen’s Kappa (Kappa coefficient)
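Both evaluation measures can be computed directly from the true and predicted labels. The sketch below uses hypothetical labels; the convention of returning 0.0 when chance agreement is already 1 (where kappa is undefined) is an assumption of this sketch.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Agreement corrected for the agreement expected by chance."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # assumption: kappa is undefined here, report 0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six batches.
y_true = ["x", "x", "x", "y", "y", "z"]
y_pred = ["x", "x", "y", "y", "y", "z"]
majority_only = ["x"] * 6  # a classifier that always predicts the majority class

print(round(accuracy(y_true, y_pred), 3), round(cohens_kappa(y_true, y_pred), 3))
print(round(accuracy(y_true, majority_only), 3), round(cohens_kappa(y_true, majority_only), 3))
```

A classifier that always predicts the majority class still reaches an accuracy of 0.5 here, but its kappa is 0, which is why kappa is a useful complement to accuracy on imbalanced labels.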
[Figure: data split into train and test sets]
Clustering (unsupervised)
The task of grouping objects in such a way that objects
in the same group (cluster) are more similar
to each other than to those in other groups.
Algorithms used:
● k-Means
Evaluation Methods:
● V-Measure
● Rand-Index
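Both clustering measures compare a cluster assignment against known labels. The sketch below implements the plain (unadjusted) Rand index and the V-measure as the harmonic mean of homogeneity and completeness; whether the slides refer to the plain or the adjusted Rand index is not stated, so the plain variant is an assumption. The label lists are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Share of object pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum((labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
                for i, j in pairs)
    return agree / len(pairs)


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given)."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(c / n * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(labels_true, labels_pred):
    h_c, h_k = entropy(labels_true), entropy(labels_pred)
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(labels_true, labels_pred) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(labels_pred, labels_true) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


truth = [0, 0, 1, 1]
same_partition = [1, 1, 0, 0]   # identical grouping, cluster ids swapped
crossed = [0, 1, 0, 1]          # every cluster mixes both true groups

print(rand_index(truth, same_partition), v_measure(truth, same_partition))
print(round(rand_index(truth, crossed), 3), round(v_measure(truth, crossed), 3))
```

Both measures ignore the cluster ids themselves, so a clustering that recovers the true groups under different ids still gets a perfect score.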
Classification Results
Distance Comparison | Classification
Nearest Centroid Classifier
Train - Test Split Evaluation [1/2]
Classification Baseline (ZeroR):
Product Cleaning Group Accuracy: 0.520
Product Group Accuracy: 0.377
Chart callouts: 83%, 65%
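The ZeroR baseline quoted on this slide ignores all features and always predicts the most frequent label of the training set, so its accuracy equals the share of that label in the test set. A minimal sketch with hypothetical labels:

```python
from collections import Counter


def zero_r_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(label == majority for label in test_labels) / len(test_labels)


# Hypothetical label lists, not the project's data.
train_labels = ["x"] * 5 + ["y"] * 3 + ["z"] * 2
test_labels = ["x", "x", "y", "z"]

baseline = zero_r_accuracy(train_labels, test_labels)
print(baseline)
```

Any classifier worth keeping should clearly exceed this number on the same split.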
k-Nearest Neighbors Classifier
Train - Test Split Evaluation [2/2]
Classification Baseline (ZeroR):
Product Cleaning Group Accuracy: 0.520
Product Group Accuracy: 0.377
Chart callouts: 73%, 55%
Clustering Results
Distance Comparison | Clustering
Chart callout: 33%
Conclusions
1) Visualization
a) Visual Production Overview
b) Enabling Comparison between batches
2) Machine Learning
a) Valid Representation of Categorical Time-Series
b) Quantification of Production Processes
c) Application of Machine Learning Techniques
Future Work | Academic
● Research on 2D-specific Distance Metrics
● Clustering Algorithms, based on Markov Models
● Classification using Transition Matrices of different Dimensions (Markov-0,2,...,N)
● Different Feature Selection (temperature, pressure, etc.)
Future Work | Business
● Data Gathering Automation
● Creation of Golden Standard for each Product
● Scoring of Production Process
● Distribution of Batches compared to the Average Batch
● Clustering to more efficient clusters based on the process
Thank you!
Questions?
Appendix
Distance Comparison | Classification
Setup:
● Algorithm:
○ Nearest Centroid Classifier
● Attributes:
○ Product Cleaning Group
○ Product Group
● Split:
○ 80% training set, 20% test set
● Distances:
○ All
Determining k in k-Nearest Neighbors
Setup:
● Algorithm:
○ k-Nearest Neighbors
● Attributes:
○ Product Cleaning Group
○ Product Group
● Split:
○ 80% training set, 20% test set
● Distances (Average of):
○ Euclidean total
○ Cosine vector
○ KL - Divergence diagonal
Train - Test Split Evaluation
Setup:
● Algorithm:
○ Nearest Centroid Classifier
○ k-Nearest Neighbors
● Attributes:
○ Product Cleaning Group
○ Product Group
● Split (train-test):
○ 80% - 20%
○ 65% - 35%
○ 50% - 50%
● Distances (Average of):
○ Euclidean total
○ Cosine vector
○ KL - Divergence diagonal
Distance Comparison | Clustering
Setup:
● Algorithm:
○ Baseline
○ k-Means
● Attributes:
○ Product Cleaning Group
○ Product Group
● Initial Centroid Sets Type:
○ All centroids of each set belong to different clusters (Alldiff)
Average of 20 sets
○ All centroids of each set belong to the same cluster (Allsame)
Average of 20 sets
● Distances:
○ All
Impact of Initial Centroids
Setup:
● Algorithm:
○ Baseline
○ k-Means
● Attributes:
○ Product Cleaning Group
○ Product Group
● Initial Centroid Sets Type:
○ All centroids of each set belong to different clusters (Alldiff)
Average of 100 sets
○ All centroids of each set belong to the same cluster (Allsame)
Average of 100 sets
○ All centroids of each set belong to a random cluster (Allrandom)
Average of 100 sets
● Distances (Average of):
○ Euclidean Total
○ Euclidean Row
○ Euclidean Column
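One way to read this setup is sketched below: the initial centroids for k-Means are drawn from labelled batches so that they come from k different true groups (Alldiff), from a single true group (Allsame), or from anywhere (Allrandom). This reading of the three set types, the function names, the fixed iteration count and the toy 2-dimensional vectors are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pick_initial(vectors, labels, k, mode, rng):
    """Choose k initial centroids according to the assumed set type."""
    by_label = defaultdict(list)
    for vec, label in zip(vectors, labels):
        by_label[label].append(vec)
    if mode == "alldiff":    # one seed from each of k different true groups
        chosen = rng.sample(sorted(by_label), k)
        return [rng.choice(by_label[label]) for label in chosen]
    if mode == "allsame":    # all k seeds from one true group
        candidates = [l for l in sorted(by_label) if len(by_label[l]) >= k]
        return rng.sample(by_label[rng.choice(candidates)], k)
    return rng.sample(vectors, k)  # "allrandom"


def nearest(centroids, vec):
    return min(range(len(centroids)), key=lambda i: euclidean(centroids[i], vec))


def k_means(vectors, centroids, iterations=20):
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for vec in vectors:
            clusters[nearest(centroids, vec)].append(vec)
        # An empty cluster keeps its previous centroid.
        centroids = [[sum(col) / len(c) for col in zip(*c)] if c else old
                     for c, old in zip(clusters, centroids)]
    return [nearest(centroids, vec) for vec in vectors]


# Hypothetical toy data: two well-separated groups.
vectors = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
labels = ["a", "a", "b", "b"]

diff_init = pick_initial(vectors, labels, 2, "alldiff", random.Random(0))
same_init = pick_initial(vectors, labels, 2, "allsame", random.Random(1))
assignments = k_means(vectors, diff_init)
print(assignments)
```

Averaging the evaluation score over many such initial sets, as the setup describes, separates the effect of the starting points from the effect of the distance metric.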
Determining k in k-Nearest Neighbors [1/2]
Accuracy: Average: 0.727 Deviation: <1%
Kappa: Average: 0.531 Deviation: ~2%
Determining k in k-Nearest Neighbors [2/2]
Accuracy: Average: 0.560 Deviation: <1%
Kappa: Average: 0.391 Deviation: ~1%
Distance Comparison | Classification [2/2]
Distance Comparison | Clustering [2/2]
Impact of Initial Centroids [2/2]
Challenges in ML
Labelling
1. Data cleansing
2. Labelling of batches
Object Creation
3. Time-series splitting into batches
4. Representation of each batch (chunk)
5. Feature selection
6. Unequal length time-series comparison
Distance Calculation
7. Distance calculation method
Impact of Initial Centroids
