Bridging the gap between data and knowledge
with The Unscrambler X

Discover how data mining can benefit you.

Marion Cuny
CAMO Software AS
www.camo.com

Contents

1. Improve your work time efficiency
2. Combine data from many sources for enhanced
understanding of complex systems
3. Understand the structure of your data and locate the root
cause of process/product deviations
4. Design more efficient processes and products
5. Predict quality at an early stage and classify raw
material/batch attributes
6. Conclusions

Improve your work time efficiency

Organized and annotated projects and audit trail

Follow the progression of a project by looking at:
• the project organization (Project Navigator),
• the audit trail, and
• the information and notes displayed for each object (Info and Notes boxes).

Preview the results of your pretreatment

Save time when optimizing the parameters of your pretreatments by previewing the results before applying them.

Conclusion

• Organized data save you a lot of time:
– What did I (or my colleague) do with this dataset last month?
– Which plot was showing the results?

• Preview of results: don’t apply treatments that don’t give good results.

Combine data from many sources for enhanced understanding of complex systems

Import data from various sources

• Unscrambler matrices
• ASCII text
• Excel (copy-paste and drag-and-drop are also possible)
• Matlab
• Spectral formats
• Databases (Oracle, SQL, ...)

Our Instrument Partners

System Integration Partners

• Integration for online monitoring and control:
– Siemens SiPAT
– Optimal SynTQ
– Symbion
– ABB XPAT & FTSW integration
– GE Fanuc

OPC import menu

Imported data

Combine them in the analysis

• X and Y matrices can be in separate datasets
• Matrices can be aggregated
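As an illustration of what using X and Y from separate datasets involves, the sketch below aligns two tables on a shared sample ID before they are analysed together. It is plain Python with made-up sample IDs and values; it does not reproduce how The Unscrambler X aggregates matrices internally.

```python
# Hypothetical data: process measurements (X) and lab results (Y) kept in
# separate datasets, each keyed by a sample ID.
process_data = {
    "S1": [72.1, 1.8],
    "S2": [70.4, 2.1],
    "S3": [73.0, 1.6],
}
lab_data = {
    "S2": [0.91],
    "S3": [0.88],
    "S1": [0.95],
    "S4": [0.80],  # no matching process row, so it is left out
}


def aggregate(x_rows, y_rows):
    """Inner-join two datasets on sample ID, keeping the row order of X."""
    ids = [sid for sid in x_rows if sid in y_rows]
    X = [x_rows[sid] for sid in ids]
    Y = [y_rows[sid] for sid in ids]
    return ids, X, Y


ids, X, Y = aggregate(process_data, lab_data)
print(ids)  # ['S1', 'S2', 'S3']
print(Y)    # [[0.95], [0.91], [0.88]]
```

An inner join keeps only samples present in both tables, so X and Y rows stay paired sample by sample.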

Conclusion

• See relationships and create models between any kind of data:
– different types of data,
– different stages of the process,
and get a clear understanding of what is going on.

Understand the structure of your data and locate the root cause of process/product deviations

Fundamentals of Multivariate Statistical Process Control

• The ellipse is known as Hotelling's T² ellipse and represents a 95% confidence region.
• There are regions in the multivariate control chart that are forbidden in the univariate charts.
• There are also regions that are acceptable in the univariate sense but out of control in a multivariate sense.

[Figure: Hotelling's T² ellipse and univariate control limits; axes: Variable 1, Variable 2]
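The ellipse can be made concrete with a small two-variable calculation: Hotelling's T² is the distance of a point from the mean, scaled by the covariance matrix, and the 95% region is where T² stays below a limit. The sketch below is plain Python with invented, strongly correlated reference data. It uses the chi-square limit with 2 degrees of freedom, which is a large-sample approximation (when the mean and covariance are estimated from few samples, an F-based limit is the exact choice).

```python
import math


def fit_t2(x1, x2):
    """Fit mean and covariance of two variables; return a T^2 function for new points."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    # Sample covariance matrix [[s11, s12], [s12, s22]]
    s11 = sum((a - m1) ** 2 for a in x1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in x2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)
    # Inverse of the 2x2 covariance matrix
    det = s11 * s22 - s12 ** 2
    i11, i22, i12 = s22 / det, s11 / det, -s12 / det

    def t2(a, b):
        d1, d2 = a - m1, b - m2
        return d1 * d1 * i11 + 2 * d1 * d2 * i12 + d2 * d2 * i22

    return t2


# 95% limit from chi-square with 2 degrees of freedom:
# P(chi2 > c) = exp(-c / 2), so c = -2 ln(0.05), about 5.99
LIMIT_95 = -2 * math.log(0.05)

# Hypothetical in-control reference data: the two variables move together
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [1.2, 1.9, 3.1, 4.0, 4.8, 6.2, 7.1, 7.9, 9.0, 10.1]
t2 = fit_t2(x1, x2)

print(t2(5.5, 5.5) < LIMIT_95)  # True: near the center of the ellipse
# (3, 8) lies inside both univariate ranges but breaks the correlation
print(t2(3, 8) > LIMIT_95)      # True: out of control in a multivariate sense
```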

Design Space: As defined by ICH Q8

"The multidimensional combination and interaction of input variables and process parameters that have been demonstrated to provide assurance of quality."

[Figure: Design Space, Desired State and Undesired State]

NIR Spectroscopy for monitoring the granulation process

• Acquire NIR spectra during the process
• Goal: understand batch behavior and follow process trajectories with PCA

[Figure: High Shear Granulator (Glatt TMG) with a diffuse reflectance probe and an NIR spectrometer collecting spectra at 2-second intervals]

High Shear Wet Granulation

• The granulation process is important to:
– increase particle size
– enhance compressibility
– improve hydrophilicity
– improve product homogeneity
• The process has three stages:
– Dry mix phase: lactose & starch (2 minutes)
– Liquid addition phase: PVP and water (1-2 minutes)
– Granulation (3-5 minutes)

Granulation batches studied

• Diffuse reflectance NIR spectra collected at 2-3 second intervals for 15 batches, giving 130-180 spectra per batch
• Each spectrum covers 1100-2200 nm (1101 variables)
• The first three batches were run at target conditions
– The other batches included process changes in addition rate, impeller speed and granulation time
• A PCA model is used to find patterns and groupings, and to model the granulation process
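The PCA step can be sketched as a singular value decomposition of the mean-centered spectra: all batches are stacked into one matrix, and each batch's scores, read in time order, form its process trajectory. This is a generic NumPy sketch, not The Unscrambler X's implementation. The spectra are synthetic (a single hypothetical OH band growing over time, plus noise) and only mimic the dimensions quoted above.

```python
import numpy as np


def pca(X, n_components=3):
    """PCA via SVD of the mean-centered matrix (rows = spectra, columns = wavelengths)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained


# Synthetic stand-in: 3 batches x 150 spectra, 1101 wavelengths (1100-2200 nm)
rng = np.random.default_rng(0)
wavelengths = np.linspace(1100, 2200, 1101)
oh_band = np.exp(-((wavelengths - 1940) / 40) ** 2)  # hypothetical water band
time = np.linspace(0, 1, 150)[:, None]               # process time within a batch
X = np.vstack([time * oh_band + 0.01 * rng.standard_normal((150, 1101))
               for _ in range(3)])
batch_id = np.repeat(np.arange(3), 150)

scores, loadings, explained = pca(X)
trajectory = scores[batch_id == 0, 0]  # PC1 scores of batch 0, in time order
print(scores.shape, round(float(explained[0]), 2))
```

The sign of a principal component is arbitrary, so a trajectory may appear mirrored between software packages without any difference in interpretation.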

First-derivative NIR spectra of the HSG process

Color coded to highlight the stages of the process:
• Mixing of lactose & starch
• Liquid addition: water & PVP
• Granulation

OH peaks increase on liquid addition.
CH bands change due to the binders.
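A first derivative is a common pretreatment for NIR spectra because it removes additive baseline offsets and sharpens overlapping bands. The sketch below uses NumPy's central differences (`np.gradient`) on a synthetic spectrum. The slide does not say which derivative method was used; in practice a Savitzky-Golay derivative (for example `scipy.signal.savgol_filter` with `deriv=1`) is often preferred because it also smooths noise.

```python
import numpy as np


def first_derivative(spectra, wavelengths):
    """First derivative of each spectrum (one per row) with respect to wavelength."""
    return np.gradient(spectra, wavelengths, axis=1)


wavelengths = np.linspace(1100, 2200, 1101)
band = np.exp(-((wavelengths - 1450) / 30) ** 2)  # hypothetical absorption band
spectra = np.vstack([band, band + 0.3])           # second spectrum has a baseline offset

d = first_derivative(spectra, wavelengths)
# The constant offset disappears: both derivative spectra are identical
print(np.allclose(d[0], d[1]))  # True
```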

PCA: line plot of the PC1 scores

• Batches 4 & 5 differ: no PVP was added during the liquid addition phase
• Batch 6: target conditions with a longer granulation time

PCA score plots of 3 batches run under target conditions

[Figure: score plot with the dry mixing phase, the liquid addition phase and the granulation end point labeled]

Granulation trajectory from the 3-D scores plot

[Figure: trajectory through the dry mix, liquid addition and granulation end stages]

Conclusion

• PCA reveals the structure of a data set.
• Note: sometimes you need pretreatment to reveal the structure accurately.

Design more efficient processes and products

Principle of DoE

• Perform the smallest number of experiments needed to cover the design space efficiently.

[Figure: two panels in the X1-X2 plane, each factor ranging from min to max]
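The simplest way to cover a rectangular design space with few runs is a two-level full factorial: every combination of each factor's min and max. The sketch below generates such a design in plain Python; the factor names and ranges are hypothetical. A 2² design needs 4 runs, and center points are often added to check for curvature.

```python
from itertools import product


def full_factorial(levels):
    """Two-level full factorial; `levels` maps factor name -> (min, max)."""
    names = list(levels)
    runs = []
    for coded in product((-1, 1), repeat=len(names)):
        run = {}
        for name, c in zip(names, coded):
            low, high = levels[name]
            run[name] = low if c == -1 else high
        runs.append(run)
    return runs


# Hypothetical ranges for two factors X1 and X2
design = full_factorial({"X1": (10, 20), "X2": (150, 180)})
for run in design:
    print(run)
# {'X1': 10, 'X2': 150}
# {'X1': 10, 'X2': 180}
# {'X1': 20, 'X2': 150}
# {'X1': 20, 'X2': 180}
```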

Why do we use DoE rather than the “scientific approach”?

• One-variable-at-a-time approach: to establish a relationship between cause and effect, each cause is investigated separately, all other conditions being fixed.
• The limit of the one-variable-at-a-time approach: the actual optimum can be missed.

[Figure: two panels in the X1-X2 plane; one marks the actual optimum]
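The limitation is easiest to see when two factors interact. In the toy response below (a hypothetical function, not data from this webinar) the best results lie on a diagonal ridge. Optimizing X1 first and then X2 stops far from the true optimum, while varying both factors together finds it. The exhaustive grid search only stands in for that idea; a real DoE reaches the same conclusion with far fewer runs by fitting a model.

```python
def response(x1, x2):
    """Hypothetical response with an X1-X2 interaction: a diagonal ridge peaking at (5, 5)."""
    return -(x1 - x2) ** 2 - 0.1 * (x1 + x2 - 10) ** 2


levels = range(11)  # candidate settings 0, 1, ..., 10 for each factor

# One variable at a time: fix X2 at its starting value and optimize X1,
# then fix that X1 and optimize X2.
x2_start = 0
best_x1 = max(levels, key=lambda x1: response(x1, x2_start))
best_x2 = max(levels, key=lambda x2: response(best_x1, x2))
ovat = (best_x1, best_x2)

# Varying both factors together (here: every grid combination)
grid_best = max(((a, b) for a in levels for b in levels),
                key=lambda p: response(*p))

print(ovat, response(*ovat))            # (1, 2) with a response of about -5.9
print(grid_best, response(*grid_best))  # (5, 5) with a response of 0.0
```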

The logical approach

1. Set the goal of the experimentation (model type)
   Ex: maximize the quality of our cookies: quadratic model
2. Select the variables to include in the design (X)
   Ex: cooking time, temperature, chocolate content
3. Select the response variables (Y)
   Ex: stability, preference, cost
4. Select the appropriate design
   Ex: Box-Behnken design (BBD), central composite design (CCD)
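For a quadratic model such as the cookie example, a central composite design (CCD) combines a two-level factorial, axial (star) points and replicated center points. The plain-Python sketch below builds one in coded units and converts coded values to real units. The three factor ranges are invented for illustration. The rotatable axial distance alpha = (2^k)^(1/4) is one common choice, not necessarily the one The Unscrambler X selects.

```python
from itertools import product


def central_composite(k, n_center=3, alpha=None):
    """Coded-unit CCD: 2^k factorial points, 2k axial points, n_center center points."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # rotatable axial distance
    runs = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            runs.append(point)
    runs += [[0.0] * k for _ in range(n_center)]
    return runs


def decode(coded, low, high):
    """Convert a coded value (-1 = low, +1 = high) to real units."""
    return (low + high) / 2 + coded * (high - low) / 2


# Hypothetical ranges: cooking time (min), temperature (deg C), chocolate content (%)
ranges = [(8, 12), (170, 200), (10, 30)]
design = central_composite(k=3)
real = [[decode(c, lo, hi) for c, (lo, hi) in zip(run, ranges)] for run in design]
print(len(design))  # 17 runs: 8 factorial + 6 axial + 3 center
```

Because alpha is about 1.68 for three factors, the axial runs fall outside the low/high ranges. If those settings are not feasible, a face-centered CCD (alpha = 1) or a Box-Behnken design keeps every run inside the ranges.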

Start tab

Define variables tab

All the variables are defined in the same table.
Easy definition thanks to the tick box menu and radio buttons.

Choose the design tab

• Automatic selection of the most suitable design
• Designs stated as actions
• Information on the selected design

Design details

Select the resolution of the design depending on your goal and the number of experiments you can run.
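Resolution describes which effects are confounded (aliased) in a fractional factorial. As a sketch, the plain-Python code below builds a 2^(4-1) half fraction by setting the fourth factor to D = ABC. This gives the defining relation I = ABCD and therefore resolution IV: main effects are clear of two-factor interactions, but two-factor interactions are aliased in pairs (for example AB with CD). The generator is a textbook choice used for illustration.

```python
from itertools import product


def half_fraction_4_factors():
    """2^(4-1) design in coded units: full factorial in A, B, C, with D = A*B*C."""
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]


design = half_fraction_4_factors()
print(len(design))  # 8 runs instead of 16 for the full 2^4 design

# Defining relation I = ABCD: the product of all four columns is always +1
print(all(a * b * c * d == 1 for a, b, c, d in design))  # True
# Consequence: the AB and CD interaction columns are identical (aliased)
print(all(a * b == c * d for a, b, c, d in design))      # True
```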

Additional experiments

Randomization
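Randomizing the run order protects the effect estimates from drifts that build up over time (instrument warm-up, raw material ageing, operator fatigue). Below is a minimal sketch in plain Python that uses a fixed seed so the order can be reproduced and documented. The function name and the 17-run example are assumptions for illustration.

```python
import random


def randomized_order(n_runs, seed=None):
    """Return the standard-order run numbers 1..n_runs in a random execution order."""
    order = list(range(1, n_runs + 1))
    random.Random(seed).shuffle(order)
    return order


order = randomized_order(17, seed=42)
print(order)  # a permutation of 1..17; the same seed always gives the same order
```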

Summary

The power calculation for the two response variables shows that this design is not appropriate for detecting a difference of 0.6 in preference, as the power is below 0.8.

Instead, we can look at the least significant difference (LSD) that this design can detect.
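The reasoning behind this summary can be approximated by hand. For a two-level factor with N runs split evenly, the estimated effect (mean at the high level minus mean at the low level) has standard error 2σ/√N. Power is the probability that this estimate clears the significance threshold. The sketch below uses a normal approximation with the standard library's `statistics.NormalDist`. The noise level σ = 1.0 and N = 17 are hypothetical, and software such as The Unscrambler X may use t-based calculations, so its numbers will differ.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def approx_power(delta, sigma, n_runs, alpha=0.05):
    """Normal-approximation power to detect a difference `delta` between the
    low and high levels of a two-level factor (n_runs split evenly)."""
    se = 2 * sigma / n_runs ** 0.5           # standard error of the estimated effect
    z_crit = Z.inv_cdf(1 - alpha / 2)        # two-sided significance threshold
    return Z.cdf(abs(delta) / se - z_crit)   # opposite tail ignored (negligible)


def detectable_difference(sigma, n_runs, power=0.8, alpha=0.05):
    """Smallest difference detectable with the requested power (same approximation)."""
    se = 2 * sigma / n_runs ** 0.5
    return (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) * se


# Hypothetical values: response noise sigma = 1.0, 17 runs
print(round(approx_power(0.6, sigma=1.0, n_runs=17), 2))  # well below 0.8
print(round(detectable_difference(sigma=1.0, n_runs=17), 2))
```

The second function turns the question around, as the slide suggests. Instead of fixing the difference and asking for the power, it fixes the power and asks which difference the design can detect. This is analogous to, but not necessarily identical with, the LSD reported by the software.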

Tables in X

Analysis

Results: Effect summary

Results: Diagnostics

Probable curvature effect

Results: Residuals

Or maybe a bias at the end of the experimentation.

Extension of the design

Extension of the design

Results: Response surface

Conclusion

• DoE helps you create or improve a process or product.

Predict quality at an early stage and classify raw material/batch attributes

Visualizing groups

• PCA score plot
• Clustering

Make a model to predict the group: SIMCA, PLS-DA, SVM and LDA

SIMCA Classification

• Soft Independent Modeling of Class Analogies:
– Make a PCA model for each class;
– Project new samples onto the models.

[Figure: PC1/PC2 models for samples from groups A, B and C, showing each model's center, the maximum distance to the model (Si) and the maximum leverage for the model (Hi)]
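The two SIMCA steps (one PCA model per class, then projection of new samples) can be sketched in NumPy. This simplified version judges class membership only by the residual (orthogonal) distance to each class model. It uses a hypothetical acceptance limit of 1.5 times the largest training distance. Full SIMCA implementations use statistical limits and also check leverage (the Si and Hi limits in the figure). The two classes are synthetic.

```python
import numpy as np


class ClassModel:
    """PCA model of a single class (simplified SIMCA)."""

    def __init__(self, X, n_components, tolerance=1.5):
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.P = Vt[:n_components].T  # loadings: variables x components
        # Hypothetical limit: tolerance x largest residual distance in the training set
        self.limit = tolerance * self.distance(X).max()

    def distance(self, X):
        """Residual (orthogonal) distance of each row of X to the class model."""
        Xc = np.atleast_2d(X) - self.mean
        residual = Xc - Xc @ self.P @ self.P.T
        return np.sqrt((residual ** 2).sum(axis=1))


def simca_classify(models, X):
    """List, for each sample, the classes whose model accepts it (can be none or several)."""
    X = np.atleast_2d(X)
    accepted = {name: m.distance(X) <= m.limit for name, m in models.items()}
    return [[name for name in models if accepted[name][i]] for i in range(len(X))]


# Synthetic training data: class A varies along variable 1, class B along variable 2
rng = np.random.default_rng(1)
t = rng.uniform(-3, 3, size=(30, 1))
A = t * np.array([1.0, 0, 0, 0, 0]) + 0.05 * rng.standard_normal((30, 5))
B = 5 + t * np.array([0, 1.0, 0, 0, 0]) + 0.05 * rng.standard_normal((30, 5))
models = {"A": ClassModel(A, n_components=1), "B": ClassModel(B, n_components=1)}

new = np.array([[1.0, 0, 0, 0, 0],      # looks like class A
                [10.0, -10, 3, 0, 0]])  # foreign sample
print(simca_classify(models, new))  # [['A'], []]
```

Because each class has its own model, a sample can be accepted by one class, several classes or none. A foreign sample rejected by every model is the expected outcome, as in the example dataset.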

Example dataset

NIR data of:
• 83 samples: 67 calibration and 16 test
• 2600 variables
• 5 groups, but only 4 are used to create the models

Overview: PCA scores plot of the training samples from 4 classes

Classification

• A separate PCA model for each class

Classification of the new samples

• All the foreign samples are rejected by all models.
• The MCC samples are not recognized by the MCC model.

The MCC sample is detected as an outlier because its leverage is too high.
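Leverage measures how far a sample lies from the center of the model within the score space. For a PCA model with A components, a common definition is h_i = 1/n + Σ_a t_ia² / (t_a' t_a), where t_a is the calibration score vector of component a. The NumPy sketch below applies this definition to calibration and new samples. The score values are invented, and comparing against the largest calibration leverage is only an illustrative criterion, not the limit used in The Unscrambler X.

```python
import numpy as np


def leverage(T_cal, T_new=None):
    """Leverage h = 1/n + sum over components of t^2 / (t_a' t_a).

    T_cal: calibration scores (n samples x A components).
    T_new: scores of new samples; if None, calibration leverages are returned.
    """
    n = T_cal.shape[0]
    ss = (T_cal ** 2).sum(axis=0)  # t_a' t_a for each component
    T = T_cal if T_new is None else np.atleast_2d(T_new)
    return 1.0 / n + ((T ** 2) / ss).sum(axis=1)


# Hypothetical mean-centered scores for 20 calibration samples and 2 components
rng = np.random.default_rng(2)
T_cal = rng.standard_normal((20, 2))
T_cal -= T_cal.mean(axis=0)

h_cal = leverage(T_cal)
h_new = leverage(T_cal, [[8.0, -6.0]])  # a sample far from the calibration center
print(h_new[0] > h_cal.max())  # True: unusually high leverage
```

For the calibration samples the leverages always sum to 1 + A. This is why the average leverage, (1 + A)/n, is a natural reference point when judging whether a value is too high.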

PLS Discriminant Analysis

• Each class is represented by a 0/1 variable:
– Build a regression model with those variables as responses (PLS1 for 1 or 2 classes, else PLS2);
– Make predictions for new samples: close to 1 means “member”, close to 0 means “non-member”.

Class coding (one row per sample):
                      A  B  C
Samples from group A  1  0  0
Samples from group B  0  1  0
Samples from group C  0  0  1

[Figure: predicted vs. measured (0/1) plots for Model A, Model B and Model C, used for classification]
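The 0/1 coding and the decision rule around the regression are easy to express in code. The plain-Python sketch below builds the class-indicator matrix Y and turns predicted values into class assignments using a 0.5 cut-off. This cut-off is a common but hypothetical choice here. The predicted values are invented, and the PLS regression itself is not shown (scikit-learn's `PLSRegression` is one option for fitting it).

```python
def dummy_code(labels, classes):
    """One 0/1 column per class: 1 if the sample belongs to that class, else 0."""
    return [[1 if label == c else 0 for c in classes] for label in labels]


def assign(predicted_row, classes, threshold=0.5):
    """Classes whose predicted value exceeds the threshold (may be none or several)."""
    return [c for c, y in zip(classes, predicted_row) if y > threshold]


classes = ["A", "B", "C"]
Y = dummy_code(["A", "A", "B", "B", "C", "C"], classes)
print(Y)  # [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]

# Hypothetical PLS predictions for two new samples
print(assign([0.92, 0.05, 0.03], classes))  # ['A']
print(assign([0.45, 0.40, 0.15], classes))  # []: no clear membership
```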

Example data set

• X: spectra
• Y: category variables with 2 values, 0 and 1

Good models for all groups

Prediction

Prediction on the AciDiSol model

High prediction uncertainty for the foreign samples.

Prediction on the MCC model

High prediction uncertainty for the foreign samples.

MCC is well classified

Inlier vs. Hotelling's T²

MCC20 is an inlier

Conclusions

• MVA can be used for classification/characterization as well as for quantification.
• Whether samples are assigned to a group or given a specific predicted value, you get diagnostic tools to understand the results.
• Diagnostics made at an early stage enable you to correct deviations and decrease the cost of waste and reprocessing.

Conclusions

Objectives and Tools

Objectives:
• Process understanding
• Identification and understanding of raw materials
• Product and process development
• Root cause analysis
• Prediction of quality

Tools in The Unscrambler X:
• Design of Experiments (DoE)
• Statistical hypothesis tests
• Exploratory data analysis
• Regression modelling
• Classification
• Prediction

Define, Design, Analyze, Implement, Improve

General Conclusions

• Multivariate analysis:
– gives you a global picture;
– is a tool for understanding;
– is a tool for improvement.

Benefits

• Multivariate analysis in The Unscrambler X benefits:
– teamwork (project architecture, notes, info)
– reporting (informative plots, report generator)

Archived webinars
www.camo.com/training/webinars‐seminar.html

Global Presence
Head office: Oslo, Norway
Sales offices: Japan; Sydney, AU; Woodbridge, NJ
R&D: Bangalore, India
Resellers / Distributors

Questions

Marion Cuny: marion@camo.no
