Bridging the gap between data and knowledge 
            with The Unscrambler X


    Discover how data mining can benefit you.


                  Marion Cuny
                CAMO Software AS
                                    www.camo.com
2
                        Content


1. Improve your work time efficiency
2. Combine data from many sources for enhanced
   understanding of complex systems
3. Understand the structure of your data and locate the root
   cause of process/product deviations
4. Design more efficient processes and products
5. Predict quality at an early stage and classify raw
   material/batch attributes
6. Conclusions



Improve your work time efficiency
4
       Organized and annotated projects
                and audit trail




Know the project progression by looking at the:
• Project organization,
• Audit trail, and
• Information and notes displayed for each object.

[Screenshot labels: Project Navigator; Info and Notes Boxes]
5
Preview the results of your pretreatment




                   Save time by optimizing the
                   parameters of your pretreatments
                   before performing them.
6
                     Conclusion


• Organized data saves you a lot of time!
      What did I/my colleague do last month with this
  dataset?
      What was the plot that showed the results?


• Preview of results: don't do things that don't give
  good results.
Combine data from many sources for enhanced
     understanding of complex systems
8
Import data from various sources

                Unscrambler matrices
                ASCII text
                Excel (copy-paste and drag and drop also possible)
                Matlab
                Spectral formats
                Database (Oracle, SQL, ...)
9
Our Instrument Partners




10
            System Integration Partners


• Integration for online monitoring and control:
   –   Siemens SiPAT
   –   Optimal SynTQ
   –   Symbion
   –   ABB XPAT & FTSW integration
   –   GE Fanuc
11
OPC import menu




12
Imported data




13
         Combine them in the analysis

• X and Y matrices can be in separate datasets
• Aggregate matrices
14
                     Conclusion


• See relationships and create models between
  any kind of data:
  – Different types
  – Different stages of the process
  and get a clear understanding of what is going on.
Understand the structure of your data and locate the
     root cause of process/product deviations
Fundamentals of Multivariate Statistical Process Control


• The ellipse is known as Hotelling's T2 ellipse and
  represents a 95% confidence region.
• There are regions in the multivariate control chart
  that are forbidden in the univariate charts.
• There are also regions that are in control in the
  univariate sense but out of control in a
  multivariate sense.

[Figure: Variable 2 plotted against Variable 1, with the Hotelling's T2 ellipse]
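The two cases described on this slide can be checked numerically. The sketch below is only an illustration under stated assumptions: the mean, the covariance matrix (correlation 0.9) and the two test points are hypothetical, the univariate limits are taken as ±1.96 standard deviations, and the 95% T2 limit uses the large-sample chi-square value for 2 degrees of freedom, which may differ from the limit The Unscrambler X computes.

```python
import math

MEAN = (0.0, 0.0)                      # hypothetical process mean
COV = ((1.0, 0.9), (0.9, 1.0))         # hypothetical covariance, correlation 0.9
T2_LIMIT_95 = -2.0 * math.log(0.05)    # chi-square 95% quantile for 2 dof (about 5.99)

def hotelling_t2(x, mean=MEAN, cov=COV):
    """T2 statistic (squared Mahalanobis distance) of a 2-D point."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form with the inverse of the 2x2 covariance matrix.
    return (d * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det

def in_control(x):
    """Return (univariate verdict, multivariate verdict) for one point."""
    univariate = all(
        abs(x[i] - MEAN[i]) <= 1.96 * math.sqrt(COV[i][i]) for i in range(2)
    )
    multivariate = hotelling_t2(x) <= T2_LIMIT_95
    return univariate, multivariate

# Inside both univariate limits, but against the correlation: outside the ellipse.
print(in_control((1.5, -1.5)))
# Outside both univariate limits, but along the correlation: inside the ellipse.
print(in_control((2.2, 2.2)))
```

The first point breaks the correlation structure even though each variable looks normal on its own, which is the situation univariate charts cannot detect.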
17
     Design Space: As defined by ICH Q8

The multidimensional combination and interaction of input
variables and process parameters that have been demonstrated to
provide assurance of quality.

[Figure labels: Design Space; Desired State; Undesired State]
18
     NIR Spectroscopy for monitoring the
            granulation process

• Acquire NIR spectra during the process
• Goal: Understand batch behavior, and follow process
  trajectories with PCA

                                High Shear Granulator (Glatt TMG)
                                with diffuse reflectance probe and
                                NIR spectrometer collecting spectra
                                at 2-second intervals
19
            High Shear Wet Granulation


• Granulation process is important to:
  •   increase particle size
  •   enhance compressibility
  •   improve hydrophilicity
  •   improve product homogeneity
• The process has three stages:
  •   Dry mix phase - lactose & starch (2 minutes)
  •   Liquid addition phase – PVP and water (1-2 minutes)
  •   Granulation (3-5 minutes)
20
           Granulation batches studied


• Diffuse reflection NIR spectra collected at 2-3 second
  intervals for 15 batches, giving 130-180 spectra per batch
• Each spectrum 1100-2200 nm (1101 variables)
• First three batches run at target conditions
   – Some process changes in terms of addition rates,
     impeller speeds, granulation time in other batches
• PCA model to find patterns and groupings, and model the
  granulation process
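As a rough illustration of how a PCA score can follow the process stages, the sketch below builds synthetic "spectra" and extracts the first principal component with a plain power iteration. Everything in it is an assumption made for the example: the band shape near 1940 nm, the three amplitudes, the noise level and the coarse 50 nm grid are hypothetical, and it is not the algorithm or the data used in the study.

```python
import math
import random

# 1100-2200 nm at 1 nm steps gives the 1101 variables quoted above.
assert len(range(1100, 2201)) == 1101

def pc1_scores(rows, n_iter=200):
    """Scores on the first principal component, found by power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    v = [1.0 / math.sqrt(p)] * p                      # start vector
    for _ in range(n_iter):
        t = [sum(a * b for a, b in zip(x, v)) for x in centred]              # X v
        w = [sum(centred[i][j] * t[i] for i in range(n)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(x, v)) for x in centred]

random.seed(0)
wavelengths = list(range(1100, 2201, 50))             # coarse grid to keep the sketch fast
band = [math.exp(-((w - 1940) / 100.0) ** 2) for w in wavelengths]      # hypothetical band
phases = {"dry mix": 0.1, "liquid addition": 0.5, "granulation": 0.9}   # hypothetical amplitudes

labels, spectra = [], []
for name, amplitude in phases.items():
    for _ in range(10):
        labels.append(name)
        spectra.append([amplitude * b + random.gauss(0.0, 0.01) for b in band])

scores = pc1_scores(spectra)
mean_score = {
    name: sum(s for s, lab in zip(scores, labels) if lab == name) / 10
    for name in phases
}
print(mean_score)
```

Plotting `scores` against acquisition order would give a line plot of PC score 1 in which the three synthetic phases appear as distinct levels.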




21
First derivative NIR spectra of HSG process

 Color coded to highlight the stages of the process:
 Mixing of lactose & starch
 Liquid Addition – water & PVP
 Granulation

                     OH peaks increase on addition



                                     Change in CH bands due to binders




22
PCA analysis: line plot of PC score 1
 Batches 4 & 5 differ: no PVP was added during the liquid
 addition phase
 Batch 6: target conditions with longer granulation time
23
PCA score plots of 3 batches run under
          target conditions
                                           Granulation – end point
Dry mixing phase




                   Liquid addition phase



24
Granulation trajectory from 3-D Scores plot



[3-D scores plot labels: Dry mix; Liquid addition; Granulation end]
25
                  Conclusion


• The structure of a data set is revealed by PCA.
• Note: sometimes you need pre-treatment to reveal
  the structure accurately.
Design more efficient processes and products
27
                  Principle of DoE

  • Perform the smallest number of experiments that
    covers the design space efficiently.

[Figure: two panels, each plotting X2 (min to max) against X1 (min to max)]
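One common way to cover a two-variable design space with few runs is a two-level full factorial design, which places one experiment at each corner of the min/max square. The sketch below is an assumption about what the figure illustrates, not a description of how The Unscrambler X builds its designs; the function name `full_factorial` is hypothetical.

```python
from itertools import product

def full_factorial(levels):
    """All combinations of the given levels, one dict per experiment."""
    names = list(levels)
    return [
        dict(zip(names, combo))
        for combo in product(*(levels[name] for name in names))
    ]

# Two variables at two levels each: the four corners of the design space.
design = full_factorial({"X1": ("min", "max"), "X2": ("min", "max")})
for run in design:
    print(run)
```

With k variables at two levels the design has 2^k runs, which is why fractional designs become attractive as k grows.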


28
        Why do we use DoE compared to the
              “scientific approach”?
• One-variable-at-a-time approach:
     In order to establish a relationship between cause and effect,
     each cause must be investigated separately, all other
     conditions being fixed.
• The limit of the one-variable-at-a-time approach:

[Figure: two panels plotting X2 against X1; the second panel marks the actual optimum]
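The limit of the one-variable-at-a-time approach shows up as soon as the variables interact. The sketch below uses a hypothetical response surface with an X1*X2 interaction and an optimum at (3, 3); the surface, the levels 0 to 6 and the starting point are all assumptions chosen for illustration. A single pass over each variable in turn stops away from the optimum, while evaluating combinations of both variables finds it. The exhaustive grid here only stands in for a designed experiment, which would reach the same region with far fewer runs.

```python
def response(x1, x2):
    """Hypothetical response with an interaction term; true optimum at (3, 3)."""
    return 10 - (x1 - 3) ** 2 - (x2 - 3) ** 2 - 1.6 * (x1 - 3) * (x2 - 3)

LEVELS = range(7)  # settings 0..6 for both variables

def one_variable_at_a_time(start_x2=0):
    """Optimize X1 with X2 fixed, then X2 with X1 fixed at its best value."""
    best_x1 = max(LEVELS, key=lambda x1: response(x1, start_x2))
    best_x2 = max(LEVELS, key=lambda x2: response(best_x1, x2))
    return best_x1, best_x2

def grid_search():
    """Evaluate every combination of both variables."""
    candidates = [(x1, x2) for x1 in LEVELS for x2 in LEVELS]
    return max(candidates, key=lambda point: response(*point))

ovat = one_variable_at_a_time()
best = grid_search()
print("one variable at a time:", ovat, response(*ovat))
print("both variables together:", best, response(*best))
```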

29
                          The logical approach

         1. Set the goal of the experimentation (model type)
            Ex: Maximize the quality of our cookies: Quadratic model
         2. Select the variables to include in the design (X)
            Ex: Cooking time, temperature, chocolate content
         3. Select the response variables (Y)
            Ex: Stability, preference, cost
         4. Select the appropriate design
            Ex: BBD, CCD
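For the cookie example, a central composite design (CCD) in three coded factors supports a quadratic model. The sketch below generates the textbook rotatable CCD: factorial corners at ±1, axial points at ±alpha with alpha = (2^k)^(1/4), and centre points. It is a generic construction, not the exact design table produced by The Unscrambler X; the function name and the number of centre points are assumptions.

```python
from itertools import product

def central_composite(k, n_center=1):
    """Rotatable central composite design in coded units for k factors."""
    alpha = (2 ** k) ** 0.25                       # axial distance for rotatability
    cube = [list(point) for point in product((-1.0, 1.0), repeat=k)]
    star = []
    for i in range(k):
        for sign in (-alpha, alpha):
            point = [0.0] * k
            point[i] = sign                        # one factor at +/-alpha, others at centre
            star.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return cube + star + center

# Coded factors: cooking time, temperature, chocolate content.
design = central_composite(3, n_center=3)
print(len(design), "runs")
for run in design:
    print([round(value, 3) for value in run])
```

Each coded value would then be mapped to real units, for example -1 and +1 to the lowest and highest cooking time of interest.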

30
Start tab




31
                     Define variables tab




All the variables are defined in the same table.
Easy definition thanks to the tick box menu and radio buttons.
32
Choose the design tab



      Auto-selection of the best-suited design

      Designs stated as actions

      Information on the selected design
33
                             Design details




Select the resolution of the design depending on your goal and the number of
experiments to run.
34
Additional experiments




35
Randomization




36
                              Summary




The power calculation for the two response
variables shows that this design is not
appropriate for detecting a difference of 0.6
in preference, as the power is below 0.8.

We can look for the LSD that can be found.
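The reasoning on this slide can be mimicked with a simple normal-approximation power calculation. This is only a sketch under stated assumptions: the standard deviation (1.0) and the number of runs per level (8) are hypothetical, a two-sided z-test for a difference between two means stands in for the effect test, and reading "LSD" as the smallest difference detectable at the target power is an interpretation. The figures reported by The Unscrambler X come from its own calculation and will differ.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_two_groups(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test to detect a mean difference delta."""
    se = sigma * sqrt(2.0 / n_per_group)           # standard error of the difference
    z_crit = Z.inv_cdf(1.0 - alpha / 2.0)
    return Z.cdf(delta / se - z_crit) + Z.cdf(-delta / se - z_crit)

def smallest_detectable_difference(sigma, n_per_group, alpha=0.05, power=0.8):
    """Difference that the same test detects with the requested power."""
    se = sigma * sqrt(2.0 / n_per_group)
    return (Z.inv_cdf(1.0 - alpha / 2.0) + Z.inv_cdf(power)) * se

# Hypothetical numbers: sigma = 1.0, 8 runs at each level.
print(round(power_two_groups(0.6, 1.0, 8), 3))            # well below 0.8
print(round(smallest_detectable_difference(1.0, 8), 3))   # difference reachable at power 0.8
```

If the power for the difference of interest is too low, the options are to add runs, reduce noise, or accept that only larger differences can be detected.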




37
Tables in X




38
Analysis




39
Results: Effect summary




40
       Results: Diagnostics




Probable curvature effect


41
Results: Residuals




            Or maybe a bias at the end of experimentation.
42
Extension of the design




43
Extension of the design




44
Results: Response surface




45
                   Conclusion


• DoE helps you to:
  – Create
  – Improve
  a process or product.
Predict quality at an early stage and
classify raw material/batch attributes
47
              Visualizing groups


• PCA score plot
• Clustering




    Make a model to predict the group:
       SIMCA, PLS-DA, SVM and LDA
48
                    SIMCA Classification

  • Soft Independent Modeling of Class Analogies:
        – Make a PCA model for each class;
        – Project new samples onto the model.

[Figure: samples from groups A, B and C in the PC1/PC2 plane, each group with its own PC1 axis. Annotations: center of model; maximum distance to the model (Si); maximum leverage for the model (Hi)]
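The two SIMCA steps can be sketched in a few lines. This is a heavily simplified illustration, not the procedure implemented in The Unscrambler X: each class gets a one-component PCA model found by power iteration, membership is decided only from the distance to the model, the cut-off (three times the largest training residual) is a hypothetical choice, and the leverage limit (Hi) is left out. The two-dimensional training data are invented.

```python
import math

def residual_distance(x, mean, loading):
    """Distance from x to the one-component PCA model (mean + score * loading)."""
    d = [a - m for a, m in zip(x, mean)]
    score = sum(a * b for a, b in zip(d, loading))
    return math.sqrt(sum((a - score * b) ** 2 for a, b in zip(d, loading)))

def fit_class_model(rows, n_iter=100, margin=3.0):
    """One-component PCA model of a single class: (mean, loading, residual limit)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - mean[j] for j in range(p)] for r in rows]
    v = [1.0] + [0.5] * (p - 1)                    # arbitrary start vector
    for _ in range(n_iter):
        t = [sum(a * b for a, b in zip(x, v)) for x in centred]
        w = [sum(centred[i][j] * t[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Hypothetical cut-off: a multiple of the largest training residual.
    limit = margin * max(residual_distance(r, mean, v) for r in rows)
    return mean, v, limit

def classify(x, models):
    """Names of all classes whose model accepts x (possibly none, possibly several)."""
    return [
        name for name, (mean, loading, limit) in models.items()
        if residual_distance(x, mean, loading) <= limit
    ]

noise = [0.1 if i % 2 == 0 else -0.1 for i in range(10)]
models = {
    "A": fit_class_model([(t, t + e) for t, e in zip(range(10), noise)]),
    "B": fit_class_model([(t, 10 - t + e) for t, e in zip(range(10), noise)]),
}
print(classify((2, 2.05), models))   # close to class A only
print(classify((5, 5), models))      # where the two class models cross
print(classify((5, 9), models))      # far from both models
```

The "soft" part of the method is visible in the last two calls: a sample may be accepted by several class models or rejected by all of them.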
49
                  SIMCA Classification

• Soft Independent Modeling of Class Analogies:
      – Make a PCA model for each class;
      – Project new samples onto the model.

[Figure: samples from groups A, B and C in the PC1/PC2 plane, each group with its own PC1 axis]
50
                  Example dataset




NIR data of:
• 83 samples: 67 calibration and 16 test
• 2600 variables
• 5 groups but only 4 for creating the models


51
Overview PCA scores plot of training
      samples from 4 classes




52
                  Classification


• PCA model on independent classes




53
Classification of the new samples



                        All the foreign samples are
                        rejected by all models.
                        The MCC sample is not
                        recognized by its model.
54
The MCC sample is detected as an outlier because its
        leverage is too high
55
                   PLS Discriminant Analysis

 • Each class is represented by a 0/1 variable:
       – Build a regression model with those variables as
         responses (PLS1 for 1 or 2 classes, else PLS2);
       – Make predictions for new samples:
         close to 1 means “member”, close to 0 “non-member”.

                 A B C
 Samples from    1 0 0
   group A       1 0 0
 Samples from    0 1 0
   group B       0 1 0
                 0 1 0
 Samples from    0 0 1
   group C       0 0 1
                 0 0 1

 [Figure: predicted vs. measured plots (axes 0 to 1) for Model A, Model B and Model C, used for classification]
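The 0/1 coding and the "close to 1 / close to 0" reading can be sketched as below. The PLS regression itself is not shown; the predicted values passed to `assign` are made up, and the 0.5 threshold is an assumption about what "close to" means rather than a rule taken from the slides.

```python
def dummy_code(labels):
    """One 0/1 response column per class, in sorted class order."""
    classes = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in classes] for label in labels]
    return classes, rows

def assign(predicted, classes, threshold=0.5):
    """Classes whose predicted response is closer to 1 than to 0."""
    return [c for c, y in zip(classes, predicted) if y >= threshold]

# Same layout as the table on the slide: 2 samples of A, 3 of B, 3 of C.
classes, Y = dummy_code(["A", "A", "B", "B", "B", "C", "C", "C"])
for row in Y:
    print(row)

# Hypothetical predictions from the three class models for two new samples.
print(assign([0.93, 0.10, -0.04], classes))   # member of A
print(assign([0.20, 0.30, 0.10], classes))    # member of no class
```

Predictions that fall between the two ends, or that carry a large uncertainty, are the cases where the diagnostic plots matter most.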
56
     Example data set




                        Spectra

Category variables:
2 values: 0 & 1
57
Good models for all groups




58
Prediction




Prediction on the AciDiSol model


   A lot of uncertainty on the foreign samples.




60
Prediction on the MCC model


 A lot of uncertainty on the foreign samples.


                                                     MCC is well classified




61
Inlier vs Hotelling T2




  MCC20 is an inlier




62
                   Conclusions


• MVA can be used for classification/
  characterization as well as quantification
  purposes.
• Samples are assigned to a group (or to none) or
  given a specific predicted value, and you get
  diagnostic tools to understand the results.
• Diagnostics made at an early stage enable you
  to correct for deviations and decrease the cost of
  waste and rework.
Conclusions




64
                         Objectives and Tools



             Objective                              The Unscrambler X
 • Process Understanding                   •   Design of Experiments (DoE)
 • Identification and understanding of     •   Statistical Hypothesis Tests
   raw materials                           •   Exploratory Data Analysis
 • Product and Process Development         •   Regression modelling
 • Root Cause Analysis                     •   Classification
 • Prediction of Quality                   •   Prediction

Define          Design           Analyze   Implement                     Improve
65
                General Conclusions


• Multivariate analysis:
   – gives you a global picture.
   – is an understanding tool.
   – is an improving tool.
66
                         Benefits


• Multivariate analysis in The Unscrambler X benefits:
  – Team work (project architecture, notes, info)
  – Reporting work (informative plots, report generator)
67
      Archived webinars
www.camo.com/training/webinars‐seminar.html




68
                       Global Presence
Head office: Oslo, Norway
Sales Office: Japan
Sales Office: Sydney, AU
Sales Office: Woodbridge, NJ
R&D: Bangalore, India
Resellers / Distributors
69
        Questions




Marion Cuny: marion@camo.no

Bridging The Gap Between Data Knowledge

  • 1. Bridging the gap between data and knowledge  Bridging the gap between data and knowledge with The Unscrambler X Discover how data mining can benefit you. Discover how data mining can benefit you. Marion Cuny CAMO Software AS CAMO Software AS www.camo.com
  • 2. 2 Content 1. Improve your work time efficiency 2. Combine data from many sources for enhanced understanding of complex systems 3. Understand the structure of your data and locate the root cause of process/product deviations 4. Design more efficient processes and products 5. Predict quality at an early stage and classify raw material/batch attributes 6. Conclusions 6 C l i www.camo.com
  • 4. 4 Organized and annotated projects and audit trail Project Navigator Know the project progression by  looking at the: looking at the: • Project organization,  • Audit trail and  • Information and notes displayed for Information and notes displayed for  Info and Notes Boxes each object. www.camo.com
  • 5. 5 Preview the results of your pretreatment Save time in optimizing the  Save time in optimizing the parameters of your pretreatments  before performing them. before performing them. www.camo.com
  • 6. 6 Conclusion • Organized data save you a lot of time! What did I/my colleague do last month with this dataset? What was the plot that was showing the results? • Preview of results: don’t do things that don’t give don t don t good results. www.camo.com
  • 7. Combine data from many sources for enhanced  Combine data from many sources for enhanced understanding of complex systems www.camo.com
  • 8. 8 Import data for various sources Unscrambler matrices U bl ti ASCII Text Excel. Also possible to use copy‐paste  and drag and drop Matlab Spectral formats Database (Oracle, SQL,..) D b (O l SQL ) www.camo.com
  • 10. 10 System Integration Partners • Integration for online monitoring and control: – Siemens SiPAT – Optimal SynTQ – Symbion y – ABB XPAT & FTSW integration – GE Fanuc GE Fanuc www.camo.com
  • 11. 11 OPC import menu www.camo.com
  • 12. 12 Imported data www.camo.com
  • 13. 13 Combine them in the analysis • X and Y matrices can be in separated datasets p • Aggregate matrices www.camo.com
  • 14. 14 Conclusion • See relationships and create models between any kind of data: y – Different type – Different stages of the p g process and get a clear understanding of what is going on. www.camo.com
  • 15. Understand the structure of your data and locate the  y root cause of process/product deviations www.camo.com
  • 16. Fundamentals of Multivariate Statistical Process Control • Th Ellipse i k The Elli is known as Hotellings T2 Ellipse and represents a 95% confidence region. • There are regions in the multivariate Variable 2 control chart that are forbidden in the i i t th univariate charts. • There are also regions in the univariate sense that are out of Variable V i bl 1 control in a multivariate sense www.camo.com
  • 17. 17 Design Space: As defined by ICH Q8 The multidimensional combination and interaction of input  p variables and process parameters that have been demonstrated to  provide assurance of quality Design Space Desired State Undesired State www.camo.com
  • 18. 18 NIR Spectroscopy for monitoring the granulation process • Acquire NIR spectra during the process • Goal: Understand batch behavior, and follow process trajectories with PCA High Shear Granulator (Glatt  g S ea a ua o ( a TMG) with diffuse reflectance  probe and NIR spectrometer  collecting spectra at 2 second  collecting spectra at 2 second interval www.camo.com
  • 19. 19 High Shear Wet Granulation • Granulation process is important to: • increase particle size • enhance compressibility • improve hydrophilicity • improve product h i d t homogeneity it • The process has three stages: • Dry mix phase - lactose & starch ( minutes) (2 ) • Liquid addition phase – PVP and water (1-2 minutes) • Granulation (3-5 minutes) www.camo.com
  • 20. Granulation batches studied • Diffuse reflection NIR spectra collected at 2-3 second intervals for 15 batches, giving 130-180 spectra per batch • Each spectrum 1100-2200 nm (1101 variables) • First three batches run at target conditions – Some process changes in terms of addition rates, impeller speeds, granulation time in other batches • PCA model to find patterns and groupings, and model the granulation process
  • 21. First derivative NIR spectra of HSG process Color coded to highlight the stages of the process: Mixing of lactose & starch; Liquid addition (water & PVP); Granulation. OH peaks increase on addition. Change in CH bands due to binders.
  • 22. PCA analysis: line plot of PC score 1 Batches 4 & 5 differ: no PVP was added during the liquid addition phase. Batch 6: target conditions with longer granulation time
  • 23. PCA score plots of 3 batches run under target conditions (Plot labels: Dry mixing phase, Liquid addition phase, Granulation end point)
  • 24. Granulation trajectory from 3-D Scores plot (Plot labels: Dry mix, Liquid addition, Granulation end)
  • 25. Conclusion • The structure of a data set is revealed by PCA. • Note: sometimes you need pre-treatment to reveal the structure accurately.
  • 27. Principle of DoE • Perform the smallest number of experiments that covers the design space in an efficient way. (Figure axes: X1 and X2, each running from min to max)
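One common way to cover a min/max design space with few runs is a two-level full factorial design with an added centre point. The sketch below is a generic illustration under stated assumptions, not the design generator of The Unscrambler X: the function names are hypothetical, levels are coded as -1 (min), 0 (centre) and +1 (max), and the fixed random seed is only there to make the run order repeatable.

```python
import random
from itertools import product

def full_factorial(factor_names, levels=(-1, 1)):
    """All combinations of the coded levels, one dict per experiment."""
    return [dict(zip(factor_names, combo))
            for combo in product(levels, repeat=len(factor_names))]

def with_centre_points(design, factor_names, n_centre=1):
    """Append n_centre runs with every factor at its coded centre (0)."""
    centre = {name: 0 for name in factor_names}
    return design + [dict(centre) for _ in range(n_centre)]

factors = ["X1", "X2"]
design = with_centre_points(full_factorial(factors), factors, n_centre=1)

# Randomize the run order so that time trends do not mimic factor effects.
rng = random.Random(0)  # assumed seed, for a repeatable example
run_order = design[:]
rng.shuffle(run_order)

for run, settings in enumerate(run_order, start=1):
    print(run, settings)
```

With two factors this gives the four corners of the X1/X2 square plus one centre run, five experiments in total. Each extra two-level factor doubles the number of corner runs, which is why fractional designs are used when there are many factors.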
  • 28. Why do we use DoE compared to the “scientific approach”? • One variable at a time approach: In order to establish a relationship between cause and effect, each cause must be investigated separately, all other conditions being fixed. • The limit of the one variable at a time approach: (Figure axes: X1, X2; label: Actual optimum)
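The limit of the one variable at a time approach shows up as soon as the variables interact. The sketch below uses a made-up response function with an X1*X2 interaction; the function, the integer levels 0 to 10 and the starting point are all assumptions chosen for illustration. The exhaustive grid is not a designed experiment, it only stands in for the true response surface so the two answers can be compared.

```python
def response(x1, x2):
    """Hypothetical response with an X1*X2 interaction; maximum at (5, 5)."""
    return -(x1 - x2) ** 2 - 0.1 * (x1 + x2 - 10) ** 2

LEVELS = range(0, 11)  # assumed integer settings for both variables

def one_variable_at_a_time(start_x2=0):
    """Optimize X1 with X2 fixed, then X2 with X1 fixed (one pass)."""
    best_x1 = max(LEVELS, key=lambda x1: response(x1, start_x2))
    best_x2 = max(LEVELS, key=lambda x2: response(best_x1, x2))
    return best_x1, best_x2

def true_optimum():
    """Exhaustive search over the grid, used here only as the reference."""
    return max(((x1, x2) for x1 in LEVELS for x2 in LEVELS),
               key=lambda point: response(*point))

ovat = one_variable_at_a_time()
best = true_optimum()
print("one variable at a time:", ovat, response(*ovat))
print("actual optimum:", best, response(*best))
```

In this example a single pass of one-variable-at-a-time optimization stops well short of the actual optimum, because the best setting of X1 depends on the value of X2. A design that varies both variables together lets a model estimate that interaction.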
  • 29. The logical approach • Set the goal of the experimentation (model type). Ex: Maximize the quality of our cookies: Quadratic model • Select the variables to include in the design (X). Ex: Cooking time, temperature, chocolate content • Select the response variables (Y). Ex: Stability, preference, cost • Select the appropriate design. Ex: BBD, CCD
  • 30. Start tab
  • 31. Define variables tab All the variables are defined in the same table. Easy definition thanks to the tick box menu and radio buttons.
  • 32. Choose the design tab Auto-selection of the best-suited design. Designs stated as actions. Information on the selected design.
  • 33. Design details Select the resolution of the design depending on your goal and the number of experiments to run.
  • 35. Randomization
  • 36. Summary The calculation of the power for the two response variables shows that this design is not appropriate for detecting a difference of 0.6 in the preference, as the power is below 0.8. We can instead look for the least significant difference (LSD) that can be detected.
  • 37. Tables in X
  • 38. Analysis
  • 40. Results: Diagnostics Probable curvature effect
  • 41. Results: Residuals Or maybe a bias at the end of experimentation.
  • 42. Extension of the design
  • 43. Extension of the design
  • 45. Conclusion • DoE helps you to: – Create – Improve a process or product.
  • 46. Predict quality at an early stage and classify raw material/batch attributes
  • 47. Visualizing groups • PCA score plot • Clustering Make a model to predict the group: SIMCA, PLS-DA, SVM and LDA
  • 48. SIMCA Classification • Soft Independent Modeling of Class Analogies: – Make a PCA model for each class; – Project new samples onto the model. (Figure labels: samples from groups A, B and C, each with its own PC axes; center of the model; maximum distance to the model (Si); maximum leverage for the model (Hi))
  • 49. SIMCA Classification • Soft Independent Modeling of Class Analogies: – Make a PCA model for each class; – Project new samples onto the model. (Figure labels: samples from groups A, B and C, each with its own PC axes)
  • 50. Example dataset NIR data of: • 83 samples: 67 calibration and 16 test • 2600 variables • 5 groups but only 4 for creating the models
  • 51. Overview PCA scores plot of training samples from 4 classes
  • 52. Classification • PCA model on independent classes
  • 53. Classification of the new samples All the foreign samples are rejected by all models. An MCC sample is not recognized by its model.
  • 54. The MCC sample is detected as an outlier because its leverage is too high
  • 55. PLS Discriminant Analysis • Each class is represented by a 0 / 1 variable: – Build a regression model with those variables as responses (PLS1 for 1 or 2 classes, else PLS2); – Make predictions for new samples: close to 1 means “member”, close to 0 “non member”. Class variables (A B C): samples from group A = 1 0 0; samples from group B = 0 1 0; samples from group C = 0 0 1. (Figure: predicted vs. measured plots on a 0 to 1 scale for Model A, Model B and Model C, leading to the classification)
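The 0 / 1 coding and the "close to 1 means member" rule can be sketched as follows. This is only an illustration of the bookkeeping around PLS-DA: it does not fit a PLS model, the function names are hypothetical, the predicted values are made up, and the 0.5 cut-off is an assumption rather than a rule taken from the slides.

```python
def dummy_encode(labels):
    """Turn class labels into one 0/1 response column per class."""
    classes = sorted(set(labels))
    matrix = [[1 if label == c else 0 for c in classes] for label in labels]
    return classes, matrix

def assign_class(predicted, classes, threshold=0.5):
    """Pick the class whose predicted value is highest.

    Returns None when no prediction reaches the (assumed) threshold,
    i.e. the sample is a "non member" of every class.
    """
    best_value, best_class = max(zip(predicted, classes))
    return best_class if best_value >= threshold else None

labels = ["A", "A", "B", "C"]
classes, y_matrix = dummy_encode(labels)
print(classes)    # column order of the response matrix
print(y_matrix)   # one row per sample, used as Y in the regression

# Made-up predicted Y values for two new samples, in the same column order.
print(assign_class([0.92, 0.10, -0.03], classes))
print(assign_class([0.20, 0.30, 0.10], classes))
```

In practice the predicted values come with an uncertainty, so a sample whose prediction is near the cut-off is better reported as uncertain than forced into a class.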
  • 56. Example data set Spectra. Category variables: 2 values: 0 & 1
  • 57. Good models for all groups
  • 58. Prediction
  • 59. Prediction on the AciDiSol model A lot of uncertainty on the foreign samples.
  • 60. Prediction on the MCC model A lot of uncertainty on the foreign samples. MCC is well classified
  • 61. Inlier vs Hotelling T² MCC20 is an inlier
  • 62. Conclusions • MVA can be used for classification / characterization as well as quantification purposes • Samples are assigned to a group or not, or get a specific predicted value, and you get diagnostic tools to understand the results • Diagnostics made at an early stage enable you to correct for deviations and decrease the cost of waste/rework.
  • 63. Conclusions
  • 64. Objectives and Tools Objective: • Process Understanding • Identification and understanding of raw materials • Product and Process Development • Root Cause Analysis • Prediction of Quality. The Unscrambler X: • Design of Experiments (DoE) • Statistical Hypothesis Tests • Exploratory Data Analysis • Regression modelling • Classification • Prediction. (Workflow: Define, Design, Analyze, Implement, Improve)
  • 65. General Conclusions • Multivariate analysis: – gives you a global picture. – is an understanding tool. – is an improving tool.
  • 66. Benefits • Multivariate analysis in The Unscrambler X benefits: – Team work (project architecture, notes, info) – Reporting work (informative plots, report generator)
  • 67. Archived webinars www.camo.com/training/webinars‐seminar.html
  • 68. Global Presence Head office: Oslo, Norway. Sales offices: Japan; Sydney, AU; Woodbridge, NJ. R&D: Bangalore, India. Resellers / Distributors
  • 69. Questions Marion Cuny: marion@camo.no