1. 4th International Summer School
Achievements and Applications of Contemporary
Informatics, Mathematics and Physics
National Technical University of Ukraine
Kiev, Ukraine, August 5-16, 2009
Quality Control and Improvement
in Manufacturing
Gülser Köksal , Sinan Kayalıgil
Department of Industrial Engineering, METU, Ankara, Turkey
Gerhard-Wilhelm Weber, Başak Akteke-Öztürk
IAM, METU, Ankara, Turkey
2. Project Team
Gülser Köksal (IE)
Nur Evin Özdemirel (IE)
Sinan Kayalıgil (IE)
Bülent Karasözen (MATH, IAM)
Gerhard Wilhelm Weber (IAM)
İnci Batmaz (STAT)
Murat Caner Testik (IE)
İlker Arif İpekçi (IE)
Berna Bakır (IS)
Fatma Güntürkün (STAT)
Başak Öztürk (IAM)
Fatma Yerlikaya (IAM)
Other Collaborators:
Esra Karasakal (IE)
Zeev Volkovich (CS - Israel)
Adil Bagirov (AOpt - Australia)
Özge Uncu (IE- Canada)
Pakize Taylan (IAM)
Süreyya Özöğür (IAM)
Elçin Kartal (STAT)
Selcan Cansız (STAT&IE)
3. OUTLINE
Project Objectives
Quality Improvement (QI)
Data Mining (DM)
DM Applications in QI in Literature
DM Applications in the Project
Casting QI Problem (Decision Trees, Neural Nets,
Clustering)
Driver Seat Design Problem (Decision Trees)
PCB QI Problem (Association)
Other approaches
Nonlinear/Robust Regression
Conclusion
4. Project Objectives
Determine which DM approaches can
effectively be used in QI
Test performance of DM approaches on
selected quality design and improvement
problems with especially voluminous data
and multiple input and quality characteristics
Develop more effective approaches to solve
such problems
5. Project Scope
Manufacturing industries keeping records
of various input and quality characteristics
QI problems for which traditional analysis
and solution approaches are ineffective
due to too many variables and complicated
relationships
“Parameter design optimization” and
“quality analysis” type of quality problems
6. The Approach
Collect appropriate data from different industries for
different quality problems
Apply appropriate DM techniques in solving those
problems
Compare performances of DM techniques
Determine which DM techniques can effectively be
used for which type of QI problems
Develop new / improved algorithms
8. Quality Control and Improvement Activities
Product development stage | Quality control and improvement activity
Product design | Concept design; Parameter design (design optimization); Tolerance design
Manufacturing process design | Concept design; Parameter design (design optimization); Tolerance design
Manufacturing | Quality monitoring; Process control; Inspection / Screening; Quality analysis
Customer usage | Warranty and repair / replacement
9. Parameter Design Optimization
Static problem: find settings of the manipulated input for a fixed output target and minimum variability.
Dynamic problem: find settings of the manipulated input for changing output targets and minimum variability.
[Diagram: a PRODUCT/PROCESS block producing OUTPUT, with a disturbance input (measured / unmeasured) and a manipulated input (measured / unmeasured).]
10. Dynamic Manufacturing Environment
Goal: to have the process output within target specifications with the smallest amount of variation around the target.
Statistical process control is used to detect assignable causes (quality monitoring); engineering process control adjusts the manipulated input.
[Diagram: a PROCESS block producing OUTPUT, with a disturbance input (assignable causes, noise; measured / unmeasured) and a manipulated input (measured / unmeasured).]
11. Static Manufacturing Environment
Goal: to have the process output within target specifications with the smallest amount of variation around the target.
Quality analysis relates the measured / manipulated input to the output.
[Diagram: a PROCESS block producing OUTPUT, with a disturbance input (assignable causes, noise; measured / unmeasured) and a manipulated input (measured / unmeasured).]
12. Quality Control and Improvement Activities:
Quality Analysis
Quality Analysis consists of
- Finding characteristics critical-to-quality (CTQ)
- Finding input variables that significantly affect quality output
- Predicting quality
- quality output is a real-valued variable
- finding empirical models that relate input characteristics to output quality characteristics
- using such models to predict what the resulting quality characteristics will be for a given set of input parameters
- Classification of quality
- For nominal, binary or ordinal outputs
- For a given set of input parameters, predicting the class of the quality output
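As a minimal illustration of the "predicting quality" step above, the sketch below fits an empirical least-squares line relating one input characteristic to a real-valued quality output. The data and the single-input model are hypothetical stand-ins, not the project's; a real study would use many inputs and richer models (trees, neural nets, MARS, ...).

```python
# Hedged sketch: a one-input empirical model for quality prediction.
# The data points below are made up for illustration only.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]   # a process input parameter (hypothetical)
ys = [2.1, 3.9, 6.1, 8.0]   # observed real-valued quality output

a, b = fit_line(xs, ys)

def predict(x):
    """Predict the quality output for a given input setting."""
    return a + b * x
```

The fitted model can then be queried at candidate parameter settings, which is exactly the "using such models to predict" step.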
14. Data Mining
Data mining (knowledge discovery in
databases) :
Extraction of interesting (non-trivial, implicit, previously unknown
and potentially useful) information or patterns in large databases
What is not data mining?
(Deductive) query processing
Expert systems or small ML/statistical programs
15. Data Mining – A KDD Process
Data mining is the core of the KDD process:
Databases → data cleaning / data integration → data warehouse → data selection / preprocessing → task-relevant data → data mining → pattern evaluation
16. Data Mining Techniques
Supervised Learning
Classification and regression
Decision trees
Neural networks
Support vector machines
Bayesian belief networks
Non-linear robust regression
Rule induction
Association rules
Rough set theory
17. Data Mining Techniques
Unsupervised Learning
Clustering
K-means, Fuzzy C-means, Hierarchical, Mixture of
Gaussians
Neural Networks (Self Organizing Maps)
Outlier and deviation detection
Trend analysis and change detection
18. Some Applications
Market research and customer
relationship management
Risk analysis and management
Fraud detection
Text and web analysis
Intelligent inquiry
Process modelling
Supply chain management
19. Supply Chain Management Applications
Reducing risk of accepting bad credit cards in
payments through e-commerce
Controlling inventory by analyzing past
business, monitoring present transactions, and
predicting future sales
Controlling inventory by predicting customer’s
behavior patterns (e-commerce)
CRM (clustering customers, understanding their
needs and behaviors, etc.)
Source: Kusiak, A. “Data Mining in Design of Products and Production Systems”, Proceedings in INCOM 2006,
Vol.1, 49-53.
20. SOME DM APPLICATIONS on QI PROBLEMS
Predicting quality for given process parameter levels
Finding optimal process parameter levels for quality
Determining effects of equipment on quality
Determining factors / parameters effects on quality
Tolerancing
Identifying relationships among several quality
characteristics
Detecting, in a timely manner, assignable causes that
make a process go out of control (unstable)
21. Some Applications in Literature
Integrated circuit manufacturing
Fountain et al. (2000), Kusiak (2000)
Packaging manufacturing
Abajo et al. (2004)
Semiconductor wafer manufacturing
Gardner (2000), Kusiak (2000), Bae (2005),
Chen (2004), Braha (2002), Hu (2004),
Dabbas (2001), Fan (2001), Mieno (1999)
Skinner (2002)
Sheet metal assembly
Lian et al. (2002)
22. Some Applications in Literature
Steel production
Cser et al. (2001)
Chemical manufacturing
Shi et al. (2004), Gillblad (2001)
Sun (2003)
Ultra-precision manufacturing
Huang & Wu (2005)
Conveyor belts manufacturing
Hou et al. (2003), Hou (2004)
Plastic manufacturing
Ribeiro (2005)
24. Literature Survey (cont.d)
[Charts: numbers of surveyed studies using each DM technique, by problem type. Finding CTQs: techniques include RBF-NN, BA, CC, BN, GA, RSM, AHC, KW, ANN, ANN-BN, SVM and DT. Classification of quality: techniques include DT, ANN, ANN-SOM, GA, FST, RST, ANOVA and regression (R); DT and ANN appear most often.]
25. Literature Survey (cont.d)
[Charts: numbers of surveyed studies using each DM technique, by problem type. Predicting quality: techniques include TM, ANN, ANN-BN, ANN-RBF, FST, GA, DT and regression (R). Parameter optimization: ANN (38 studies) and regression (13) dominate, with GA, DT and FST also used.]
26. QI Problems – Examples from the Project
Casting manufacturing
Driver seat design
Circuit board manufacturing
27. CASTING QUALITY IMPROVEMENT PROBLEM
– The Company
RKN is a casting company with two
factories located in Ankara
It manufactures intermediate goods for the
automotive, agricultural tractor and motor
industries
RKN applies 6σ methodologies in
improving its processes
29. CASTING QUALITY IMPROVEMENT PROBLEM
– Some Research Questions
Is there any relation between defect types
and process parameters?
Do the important factors for different
defect types interact?
Which process parameter levels are better
in reducing the defects?
30. DRIVER SEAT DESIGN OPTIMIZATION PROBLEM
– The Company
TFD is one of the largest automobile
manufacturers in Turkey located in Bursa.
They would like to improve the design of
the driver seat of a commercial vehicle for
more customer satisfaction.
The driver seat is a critical part of an
automobile that affects the buying
decision.
32. DRIVER SEAT DESIGN OPTIMIZATION PROBLEM
– Some Research Questions
Which customer features affect overall
satisfaction with the seat?
What are the characteristics of customers who
are highly satisfied / dissatisfied with the
seat?
Which features of the seat affect overall
satisfaction with the seat?
33. CIRCUIT BOARD QUALITY IMPROVEMENT PROBLEM
– The Company
VPC is one of the largest electronic
equipment manufacturers in Turkey.
They produce approximately 35-40
thousand PCBs per day, and 1.5-2 million
PCBs per month.
70-80 thousand PCBs are scrapped every
month.
They would like to minimize PCB failures.
34. CIRCUIT BOARD QUALITY IMPROVEMENT PROBLEM
– The Products
Final products:
DVD player/recorder, DivX player, AV receiver, digital
satellite receiver, digital TV receiver, digital media
adapter
Component of interest:
Various PCBs (Printed Circuit
Boards) = board + integrated
circuits + resistors + capacitors +
diodes
35. CIRCUIT BOARD QUALITY IMPROVEMENT PROBLEM
– Some Research Questions
Which defect types occur together?
What are the root causes of the defects?
Do suppliers affect the defects?
Do defects occur at certain locations on
the board?
36. Data Mining Software Used in the Project
SPSS Clementine
Matlab
Statistica QC Miner
MARS
39. RKN’s Quality Objectives
Decrease percentage of defective items by
choice of process parameters
Priorities:
products suffering from high percentage of
defects
products of larger share in the total tonnage
although with lower percent defectives
Decrease the percentage of product returns
due to defects found by customers
40. Objectives
Decrease the proportion of defective items (to a certain
target value)
Identify the most important process parameters affecting
quality
Find the ranges of these parameters to operate in
(future direction)
Optimize the proportion of defective items (future
consideration)
41. Perkins 021 Cylinder Head
The Perkins 021 cylinder head is
one of the two products
chosen for the analysis from
the second casting plant
Reasons:
Having problems with the Perkins cylinder head
Availability of the data
Volume of the data
42. Data Collection
Data in RKN come from several processes
and different time periods.
Weekly
Daily
Hourly
Most of the data come from
Core shop
Molding
Melting
43. Data Collection (Cont...)
Lot: total production in a day (one or more shifts)
Daily records consist of the total volume of
production, total count of defective products and
the distribution of defect types
Response variables recorded are:
total number of defective products
number of defective products for 19 defect types
number of defective products returned by the customer
(newly added)
44. Data of Core Shop
Cores are produced according to a
weekly production plan
Cores used for a product are ready one
or two days before use
Specific core usage in a shift cannot be
identified accurately
Production may stop for a while, and even
cores from 3 or more days in the past
can be put to use arbitrarily
45. The Data
5 month’s production data
Number of records : 95 (averages of 95 days)
Input : real (47)
Output : discrete (8)
Can be transformed to binary, nominal or ordinal variables if
needed
Some missing data
AFTER PREPROCESSING
6 real uncorrelated response variables (proportions of
defect types) + 1 total response (proportion of defective
items)
36 real feature (predictor) variables
92 observations
47. Univariate Decision Tree Methodology –
CART (Continuous data)
DECISION TREE MODEL
(Least) square deviation: R(t) = (1/N(t)) Σ_{i∈t} (y_i − ȳ(t))²
Impurity measure: Φ(s,t) = R(t) − p_L R(t_L) − p_R R(t_R)
A typical rule generated:
IF X22 > 13.275 AND X9 > 3.095 THEN Y6% = 0.006 (Support = 48/92)
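The least square deviation R(t) and the impurity reduction Φ(s,t) can be computed directly. The sketch below does so for a hypothetical one-feature split; the data and the threshold are made up for illustration, not taken from the casting study.

```python
# Sketch of CART's least-square-deviation criterion:
# R(t) = (1/N(t)) * sum_i (y_i - ybar(t))^2 and
# Phi(s,t) = R(t) - pL*R(tL) - pR*R(tR).  Data are hypothetical.

def R(ys):
    """Within-node least square deviation."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / n

def split_gain(xs, ys, threshold):
    """Impurity reduction Phi of the split x <= threshold at this node."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    pl, pr = len(left) / len(ys), len(right) / len(ys)
    return R(ys) - pl * R(left) - pr * R(right)

xs = [1, 2, 3, 10, 11, 12]            # one input feature
ys = [0.1, 0.2, 0.1, 0.9, 1.0, 0.8]   # response (e.g., a defect rate)
gain = split_gain(xs, ys, 3)          # split separating the two y regimes
```

CART greedily picks, at each node, the split s maximizing this gain.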
48. Research Questions
Can we reduce problem dimension by extracting
important features only?
Is there any relation between defect types and process
parameters?
Do the important factors for different defect types
interact?
Are there significant changes in process parameters when
a defect rate is high or low?
Which process parameter levels are better in reducing
the defects?
Is there any period when high defect rates occur
specifically?
Is there any pattern in the sequence of defect type
occurrences?
50. Univariate Decision Tree Methodology – Nominal data
Number of records: 748
Analysis accuracy: 93.45%
Inputs: x32, x12, x22, x13, x2, x19, x10, x9, x36, x8, x28
Tree depth: 9

Results for output field y (comparing predicted $C-y with y):

Partition   1_Training         2_Testing
Correct     699   93.45%       294   92.74%
Wrong        49    6.55%        23    7.26%
Total       748                317

Coincidence matrix for $C-y (rows show actuals):

Training   pred. 0   pred. 1   pred. 2
actual 0      49        0         3     94.2%
actual 1       0      224        19     92.1%
actual 2       0       27       426     94%

Testing    pred. 0   pred. 1   pred. 2
actual 0      18        0         2
actual 1       0      115         4
actual 2       0       17       161
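A coincidence (confusion) matrix in this layout, rows for actual classes and columns for predicted ones, can be tallied as follows. The labels below are hypothetical, not the casting data.

```python
# Sketch: accuracy and a coincidence matrix (rows = actual class,
# columns = predicted class).  Labels are made up for illustration.
from collections import Counter

def coincidence_matrix(actual, predicted, classes):
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

actual    = [0, 0, 1, 1, 1, 2, 2, 2, 2]
predicted = [0, 2, 1, 1, 2, 2, 2, 2, 1]
classes = [0, 1, 2]

matrix = coincidence_matrix(actual, predicted, classes)
correct = sum(matrix[i][i] for i in range(len(classes)))  # diagonal
accuracy = correct / len(actual)
```

The diagonal holds the correctly classified counts; per-class accuracies (like the 94.2% / 92.1% / 94% above) are each diagonal entry divided by its row sum.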
52. Conclusion of the Casting Work
DT induced rules were instrumental in
planning new controlled experiments
Process optimization may be sought based
upon these field experiments
DT induced rules may also be used to set
tolerance levels for the uncontrollable
features (variables)
53. Suggested Factor Levels
Factor | Controllable? | Adjusted Setting | Observed Range | Suggested Trial Range | Pertinent Defect Types | Suggested Mean Setting
x2  | No  | [15, 30] | [20, 28] | [23, 28] | (y2),(y3),(y6),(y8) | if possible [23, 28]
x3  | No  | [15, 30] | [30, 40] | [31, 37.5] | y1, y3 | if possible [31, 37.5]
x4  | Yes | [13, 15] | [12.171, 13.678] | [12.295, 13.678] | y1 | fixed [12.295, 13.678]
x5  | Yes | [14, 16] | [12.27, 13.66] | [12.27, 13.165] | y8 | fixed [12.27, 13.165]
x6  | Yes | [7.5, 9.5] | [7.585, 8.25] | [7.917, 8.25] | y8 | fixed [7.917, 8.25]
x8  | Yes | [35, 42] | [21.75, 42] | [21.75, 35] | y3, (y2) | fixed [21.75, 35]
x9  | Yes | [3, 3.5] | [2.98, 3.387] | none | y2, y3, y6, y8 | 3 levels [3.183, 3.216], [3.216, 3.26], [3.26, 3.387]
x11 | Yes | [18, 23] | [19.8, 22.9] | [20.339, 22.9] | y3 | fixed [20.339, 22.9]
x12 | Yes | [250, 400] | [290, 360] | [350, 360] | y2 | fixed [350, 360], otherwise [305, 360]
x14 | Yes | [3.5, 5.5] | [4.7, 5.2] | [4.724, 5.2] | y2 | fixed [4.724, 5.2]
x16 | No  | [11, 23] | [13.2, 30] | [15.86, 30] | y1, (y2) | if possible [15.86, 30]
x17 | No  | [11, 23] | [15.9, 31.5] | [26.55, 31.5] | y1 | if possible [26.55, 31.5]
x19 | No  | [11, 23] | [14.1, 24.9] | none | y2 | left to run its own course
x20 | Yes | 40 | [38.992, 42.85] | [38.992, 41.32] | y3 | fixed [38.992, 41.32]
x21 | Yes | 50 | [48.68, 52.71] | [49.181, 52.71] | y9 | fixed [49.181, 52.71]
x22 | Yes | until March 28: 12; after March 31: 22 | until March 28: [10.85, 14.35]; after March 31: [20.05, 33.428] | none | y1, y2, y3, y6 | 4 levels [10.85, 13.125], [12.275, 14.35], [14.35, 17.2], [17.2, 33.42]
x25 | No  | no range | [2.5, 6.9] | [2.5, 6.533] | y8 | if possible [2.5, 6.533]
x26 | Yes | [1420, 1430] | [1367.59, 1428.23] | [1367.59, 1425.98] | y8, y9 | fixed [1367.59, 1425.98]
x27 | No  | no range | [2.259, 4.95] | [2.259, 4.2] | y2, (y3) | if possible [2.259, 4.2]
x28 | No  | no range | [11.7, 16.9] | none | y3, y6 | left to run its own course
x29 | Yes | [3.2, 3.35] | [3.208, 3.41] | none | y1, y3, y6, y8 | 3 levels [3.208, 3.304], [3.304, 3.325], [3.355, 3.41]
x30 | Yes | [1.85, 2] | [1.823, 2] | none | y1, y2, y3 | 2 levels [1.823, 1.88], [1.88, 2]
x32 | Yes | [0.2, 0.3] | [0.171, 0.283] | none | y1, y2 | 2 levels [0.171, 0.184], [0.184, 0.283]
x33 | Yes | maximum 0.3 | [0.0767, 0.552] | [0.174, 0.552] | y2 | fixed [0.174, 0.552]
x35 | Yes | [0.08, 0.12] | [0.0762, 0.1122] | [0.088, 0.1122] | y1 | fixed [0.088, 0.1122]
(June 2007, METU-IE and TU/e-OPAC Workshop)
54. DRIVER SEAT DESIGN OPTIMIZATION PROBLEM
Questionnaire data
80 observations/subjects
28-88 input variables (age, sex, distance
travelled, anthropometric measures, ease of use,
attractiveness, etc.)
1-53 output variables (back comfort, thigh comfort,
overall satisfaction, ease of use, attractiveness,
etc.)
55. Rules for customer satisfaction
Rule for 7 / 7 (very satisfied) (support=4; confidence=1.0)
If
Lumbar ache after driving for a long time = 0 and
Video gray as a seat cover design = 1 and
Accept to pay more for the seat belt sensor = 0 and
Adequate support by the seat cushion = 1 then
7.0 (very satisfied)
Rule for 6 / 7 (satisfied) (support=10; confidence=1.0)
If
Lumbar ache after driving for a long time = 0 and
Video gray as a seat cover design = 1 and
Accept to pay more for the seat belt sensor = 0 then
6.0 (satisfied)
Rule for 4 / 7 (normal) (support=8; confidence=0.75)
If
Lumbar ache after driving for a long time = 0 and
Easy reach to the lumbar support adjustment =0 then
4.0 (normal)
57. Neural Network Modeling - General
A neural network (NN) is an interconnected group of artificial neurons that uses
a mathematical or computational model for information processing based on a
connectionist approach to computation.
Incorporates learning rather than programming and parallel rather than
sequential processing.
Neural networks resemble the human brain in two respects:
The network acquires knowledge from its environment using a learning process
(algorithm)
Synaptic weights, which are inter-neuron connection strengths, are used to store the
learned information.
59. Inside the Node
A node:
Receives n inputs
Computes the net input with a base function (summing unit)
Applies an activation function to the net input
Outputs the result
Components: weights, base function, activation function, bias.
[Diagram: inputs x1 … xm with weights w1 … wm and bias b feed the base function Σ; the net input passes through the activation function f(net) to produce the output y.]
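The node above computes net = Σ w_i x_i + b and then applies the activation function. A one-line sketch with a sigmoid activation; the weights, bias and inputs are hypothetical:

```python
# Sketch of a single node: base (summing) function then activation.
# Weight, bias and input values are made up for illustration.
import math

def node(inputs, weights, bias):
    net = sum(w * x for w, x in zip(weights, inputs)) + bias  # base function
    return 1.0 / (1.0 + math.exp(-net))                       # sigmoid activation

y = node(inputs=[1.0, 0.5], weights=[0.8, -0.4], bias=0.1)    # net = 0.7
```

Here net = 0.8·1.0 − 0.4·0.5 + 0.1 = 0.7, and the sigmoid squashes it into (0, 1).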
60. Properties
Capabilities
Fault tolerance
Robustness
Non-linear mapping
Learning and generalization
Optimization
Issues
Number of source nodes
Number of hidden layers
Number of hidden nodes per hidden layer
Training data (too much: overfitting; too little: inaccurate
classification)
Number of classes (sink)
Interconnections
Activation function
Learning technique
Stopping criteria
61. Application 1:
Classification of quality in Casting
Data:
36 input variables (continuous)
1 output variable (categorical with 3 levels – 1: first defect type exists, 2:
second defect type exists, 0: none of these two defect types exist)
Partition: Training -> 70%, Testing -> 30%
Learning rule: Back-propagation
Network Topology
Input layer (36 neurons)
Hidden layer (6 neurons)
Output layer (1 neuron)
To prevent overfitting, the training set was divided again into training and
testing sets (partitioning the partition); the network was trained on the
training part, and the error was evaluated on the test part at each cycle
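The "partitioning the partition" guard against overfitting can be sketched as follows: hold out part of the training data, train by gradient descent, evaluate the held-out error every cycle, and keep the weights that did best on it. The single-neuron model, data and learning rate below are hypothetical stand-ins for the project's 36-6-1 network.

```python
# Hedged sketch of validation-based early stopping.
# Model, data and hyperparameters are made up for illustration.
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
xs = [random.uniform(-2, 2) for _ in range(40)]
data = [([x], 1 if x > 0 else 0) for x in xs]   # class = sign of x
fit, val = data[:28], data[28:]                 # "partitioning the partition"

w, b, lr = 0.0, 0.0, 0.5
best_val, best_params = float("inf"), (w, b)
for cycle in range(200):
    for (x,), t in fit:                         # one gradient pass per record
        y = sigmoid(w * x + b)
        g = (y - t) * y * (1 - y)               # gradient of squared error
        w -= lr * g * x
        b -= lr * g
    val_err = sum((sigmoid(w * x + b) - t) ** 2 for (x,), t in val)
    if val_err < best_val:                      # keep weights best on held-out set
        best_val, best_params = val_err, (w, b)

w, b = best_params
acc = sum((sigmoid(w * x + b) > 0.5) == (t == 1)
          for (x,), t in val) / len(val)
```

Training would stop (or roll back) once the held-out error starts rising even though the fit-set error keeps falling.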
63. Application 2: Prediction of quality in
Casting
Data:
36 input variables (continuous)
1 output variable (percentage of defectives for a certain defect type)
Partition: Training -> 70%, Testing -> 30%
Learning rule: Back-propagation
Method: Exhaustive prune (finds the best topology)
Final Network Topology
Input layer (36 neurons)
First hidden layer (25 neurons)
Second hidden layer (17 neurons)
Output layer (1 neuron)
64. Results
Estimated accuracy: 99.95%
Training results are slightly better than
testing results (overfitting)
Statistics
65. Conclusion
Neural networks can be used for both
classification and prediction
Unlike decision trees, neural networks are
black-box models
To decide on the best production regions,
further study may be needed (simulation,
DOE, etc.)
67. CLUSTERING - General
Clustering of data is a method by which a large set of data
is grouped into clusters of smaller sets of similar data.
[Figure: an example clustering of balls.]
In short, clustering is grouping data, or dividing a
large data set into smaller data sets that share some similarity.
68. Clustering Algorithms
A clustering algorithm attempts to find natural groups of
components (or data) based on some similarity
Clustering algorithms find k clusters so that the objects of
one cluster are similar to each other whereas objects of
different clusters are dissimilar.
70. Hierarchical vs. Partitional
A hierarchical algorithm partitions the data set in a nested
manner into clusters which are either disjoint or nested
one inside another. These algorithms are either
agglomerative or divisive according to the algorithmic
structure and the operation they carry out.
A partitional method assumes that the number of clusters
to be found is already given and then it looks for the
optimal partition based on the objective function.
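The partitional idea above can be sketched minimally: with k given, alternate assigning each object to its nearest centre and recomputing centres until stable. The 1-D toy data and naive initialisation below are for illustration only, not the project's implementation.

```python
# Minimal k-means sketch (a partitional method): k is given, and the
# algorithm seeks centres minimising within-cluster squared distance.
# Points and initialisation are hypothetical.

def kmeans(points, k, iters=20):
    centers = points[:k]                 # naive initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[j].append(p)        # assign to nearest centre
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, clusters = kmeans(points, k=2)
```

On these points the two centres converge to roughly 1.0 and 9.0, the two natural groups.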
71. Nonsmooth Optimization
Most clustering problems reduce to solving
nonsmooth optimization problems.
Nonsmooth optimization problem:
minimize f(x)
subject to x ∈ S,
where the objective f is nonsmooth at many points of interest, i.e.,
it does not have a conventional derivative at these points.
A less restrictive class of assumptions for f than
smoothness: convexity and Lipschitz continuity.
72. Cluster Analysis via Nonsmooth Opt.
Given instances a_1, …, a_m ∈ R^n.
Problem: find cluster centers c_1, …, c_k and association weights w_ij
minimizing Σ_{i=1,…,m} Σ_{j=1,…,k} w_ij ||a_i − c_j||²,
subject to Σ_{j=1,…,k} w_ij = 1 and w_ij ∈ {0, 1}.
This is a clustering problem with the partitioning method. We will
reformulate it as a nonsmooth optimization problem.
73. Cluster Analysis via Nonsmooth Opt. Cont'd
k is the number of clusters (given),
m is the number of instances (given),
c_j is the j-th cluster's center (to be found),
w_ij is the association weight of instance a_i with cluster j (to be found),
(w_ij) is an m×k matrix,
the objective function has many local minima.
74. Cluster Analysis via Nonsmooth Opt. Cont’d
If k is not given a priori:
Start from a small number of clusters k and
gradually increase the number of clusters for the
analysis until a certain stopping criterion is met.
This means: if the solution of the corresponding
optimization problem is not satisfactory, the decision
maker needs to consider a problem with k + 1 clusters,
etc.
This implies: one needs to solve repeatedly arising
optimization problems with different values of k - an
even more challenging task.
75. Cluster Analysis via Nonsmooth Opt. Cont'd
Reformulated problem:
minimize f(c_1, …, c_k) = (1/m) Σ_{i=1,…,m} min_{j=1,…,k} ||a_i − c_j||²
• A complicated objective function: nonsmooth and nonconvex.
• The number of variables in the reformulated nonsmooth
optimization problem above is k×n; before, it was (m+n)×k.
• This problem can be solved by related nonsmooth methods
(e.g., semidefinite programming, the discrete gradient method).
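The reformulated objective, assuming the standard minimum-sum-of-squared-distances form f(c) = (1/m) Σ_i min_j ||a_i − c_j||² (consistent with the k×n variable count above), is easy to evaluate even though the inner min makes it nonsmooth in the centres. The 1-D data below are hypothetical.

```python
# Sketch: evaluating the nonsmooth clustering objective
# f(c) = (1/m) * sum_i min_j ||a_i - c_j||^2.
# The min over j is what makes f nondifferentiable at points where
# an instance is equidistant from two centres.  Data are hypothetical.

def cluster_objective(points, centers):
    m = len(points)
    return sum(min((p - c) ** 2 for c in centers) for p in points) / m

points = [0.0, 1.0, 10.0, 11.0]
f_good = cluster_objective(points, centers=[0.5, 10.5])  # centres near the groups
f_bad = cluster_objective(points, centers=[5.0, 6.0])    # centres in the gap
```

A nonsmooth solver searches over the k×n centre coordinates for the lowest such value; well-placed centres give a much smaller objective than poorly placed ones.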
76. Clustering Analysis on RKN Casting Data
We used k-means, PAM (Partitioning Around Medoids) and k-
means improved by nonsmooth optimization to identify
homogeneous groups in the data.
k-Means: The grouping is done by minimizing the sum of squares
of distances between data and the corresponding cluster centroid.
PAM: A medoid is an object of the cluster, whose average
distance to all the objects in the cluster is minimal.
k-Means improved by Nonsmooth Optimization: k-means
algorithm that solves a nonsmooth optimization subproblem for
calculating the starting point for the k-th cluster center.
79. Results
The tables above show the relations between the
different clustering results. Optimal partitioning with PAM is
obtained for k=4; for the others, k=2 gives the best
results. For k=3 and k=4 with k-means, the clusters of 2
and 6 objects are artificial.
These results match our preprocessing studies
(Catherine Sugar's "jump method" and PCA), which
suggested that k is 2 or 4 for our data.
82. Association Analysis
Association rule mining searches for interesting
relationships among the features in a given data
set.
A typical example of association rule mining is
“market basket analysis”.
This process analyzes customer buying habits by
finding associations between the different items
that customers place in their “shopping baskets”
83. Support and Confidence
• Association rules are statements in the form of
IF antecedent(s) THEN consequent(s)
where antecedent(s) and consequent(s) are disjoint
conjunctions of feature-value pairs.
• Two common measures, support and confidence, are used
to evaluate extracted rules
• For a rule defined as X=>Y
• The support of the rule is the joint probability of X and Y,
Pr(X and Y).
• The confidence of the rule is the conditional probability of Y given
X, Pr(Y|X)
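Both measures can be computed directly from a transaction list: support is the fraction of transactions containing X and Y together, and confidence divides that by the fraction containing X. The items below are hypothetical failure labels, not VPC's data.

```python
# Sketch: support and confidence for a rule X => Y over transactions.
# Item names are made up for illustration.

def support(transactions, items):
    """Fraction of transactions containing all the given items."""
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, X, Y):
    """Pr(Y | X) = support(X and Y) / support(X)."""
    return support(transactions, X | Y) / support(transactions, X)

transactions = [
    {"display error", "no solder"},
    {"display error", "no solder", "short circuit"},
    {"no solder"},
    {"short circuit"},
]
s = support(transactions, {"no solder", "display error"})       # Pr(X and Y)
c = confidence(transactions, {"no solder"}, {"display error"})  # Pr(Y | X)
```

Here the rule "no solder" => "display error" has support 2/4 and confidence (2/4)/(3/4) = 2/3; rule miners keep only rules above chosen support and confidence thresholds.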
86. PCB Manufacturing Data in
Transactional Format
In this format, a single board can appear in more than one
row, each of which represents a different operation performed
on this product
Serial number can be used as the transaction ID which
distinguishes different products
Attributes (variables) of the boards:
Product type
Description of the failure (failure observed during the final
electrical test)
Root cause (cause of the failure identified during the repair)
Location of the root cause
Board type
Supplier
Operation line where the failure is detected
Date and time
87. Attributes
11 types of PCB
38 possible failures (e.g., display error, software
error, no audio, etc.)
13 possible root causes (e.g., chip without solder,
resistance is upright, short circuit, etc.)
Location of the root cause on the board
9 board types
6 different suppliers
88. Application: PCB Manufacturing
Sample records from PCB manufacturing data:
Board Type | Serial | Supplier | Failure | Reason of Failure | Location
1 | 2459 | GOODBOARD | display error | no solder | U45 6.PIN
1 | 736 | TATCHUN-GIA TZOONG | AUX1 error | short circuit | U8 2.PIN
4 | 990 | GIA TZOONG | device-not-work | sw | L71
3 | 700 | TATCHUN-GIA TZOONG | display error | short circuit | R407
6 | 712 | ÜNAL ELEKTRONİK | rgb-cvbs error | flash error | R412
2 | 1411 | GOODBOARD | sw error | upright | K23
2 | 663 | GOODBOARD-TATCHUN | AUX1 error | no solder | C130
7 | 627 | UNIWELL ELECTRONIC | audio error | upside-down | B353
4 | 1169 | GOODBOARD | sw error | sw | U6
89. Possible Applications of Association Analysis
Identifying failure types occurring together on the
same board.
Association of failures with root causes.
Association of failures with suppliers.
Identifying failures occurring in sequence.
Association of failures with the location of
the root cause on the board.
90. Identifying failure types occurring together on the same board
"device-not-functioning" => "flash-not-loading" (support 25%, confidence 73%)
"flash-not-loading" => "display error" (support 36%, confidence 86%)
"AUX1 error" AND "feed error" => "audio error" (support 32%, confidence 61%)
91. Association of failures with root causes
"upright" AND Location = "chip" => "audio error" (support 46%, confidence 82%)
"no solder" => "device-not-functioning" (support 18%, confidence 100%)
92. Association of failures with suppliers
"GOODBOARD" => "display error" (support 23%, confidence 57%)
"UNIWELL" AND "GOODBOARD" => "feed error" (support 18%, confidence 53%)
93. Identifying failures dependent on the sequence of operations
Line 1 = "AUX1 error" => Line 5 = "feed error" (support 22%, confidence 48%)
94. Association of failures with the location of the root cause on the board
"device-not-functioning" => Location = "resistance" (support 56%, confidence 76%)
"flash-not-loading" => Location = "U8 2.PIN" (support 43%, confidence 66%)
96. Regression Approaches
MULTIPLE LINEAR REGRESSION (MLR)
NONLINEAR REGRESSION (NLR)
GENERAL LINEAR MODELS (GLM)
GENERALIZED LINEAR MODELS (GLZ)
ADDITIVE MODELS
GENERALIZED ADDITIVE MODELS (GAM)
ROBUST REGRESSION
97. CONCLUSION
Tough QI problems with several input and output
variables can be handled effectively with DM
approaches.
Observational or experimental data, preferably
voluminous, are needed.
Online data collection systems might need to be
installed
Data quality and pre-processing are crucial
Many tools seem difficult for industry practitioners to
apply in practice (advanced training might be necessary)
Results in the form of rules are found useful and
interesting by the industry
98. FUTURE WORK
Continue collecting different data sets for different
QI problems, and applications on them
Also apply other DM approaches such as linear /
robust regression, fuzzy clustering / regression and
rough set theory.
Compare performances.
Develop new / improved DM algorithms for solving
the QI problems.
Multi-response decision tree modeling
Non-smooth optimization for categorical quality
responses
Improved MARS with Tikhonov regularization
99. PAPERS AND PRESENTATIONS
FROM THE PROJECT
Bakır, B., Batmaz, İ., Güntürkün, F.A., İpekçi, İ.A., Köksal, G., and Özdemirel,
N.E., Defect Cause Modeling with Decision Tree and Regression Analysis,
Proceedings of the XVII. International Conference on Computer and
Information Science and Engineering, Cairo, Egypt, December 08-10,
2006, Volume 17, pp. 266-269, ISBN 975-00803-7-8.
İpekçi, A.İ., Bakır, B., Batmaz, İ., Testik, M.C., and Özdemirel, N.E., Defect
Cause Modeling with Data Mining: Decision Trees and Neural Networks, to
appear in Proceedings of the 56th Session of the International Statistical
Institute, Lisbon, Portugal, August 22-29, 2007.
Akteke-Öztürk, B. and Weber, G.W., "A Survey and Results on Semidefinite
and Nonsmooth Optimization for Minimum Sum of Squared Distances
Problem", Technical Report, 2007.
Öztürk-Akteke, B., Weber, G.W., Kayalıgil, S., Kalite İyileştirmede Veri
Kümeleme: Döküm Endüstrisinde Bir Uygulama (Data Clustering in Quality
Improvement: An Application in the Casting Industry), Yöneylem Araştırması
ve Endüstri Mühendisliği 27. Ulusal Kongresi (YA/EM 2007), İzmir, Turkey,
July 02-04, 2007.
100. PAPERS AND PRESENTATIONS
FROM THE PROJECT (cont.d)
Session TC-38: Tutorial Session: Data Mining
Applications in Quality Improvement
22nd European Conference on Operational
Research, Prague, July 7-11, 2007
Köksal, G., Testik, M.C., Güntürkün, F.A., Batmaz, İ.,
Data Mining Applications in Quality Improvement: A
Tutorial and a Literature Review
İpekçi, A.İ., Köksal, G., Karasakal, E., Özdemirel, N.E.,
Testik, M.C., Multi Response Decision Tree Approach
Applied To A Discrete Manufacturing Quality
Improvement Problem
101. PAPERS AND PRESENTATIONS
FROM THE PROJECT (cont.d)
Köksal, G., Testik, M.C., Güntürkün, F.A., Batmaz, İ.,
Kalite İyileştirmede Veri Madenciliği Yaklaşımları ve Bir
Uygulama (Data Mining Approaches in Quality Improvement and an
Application), 16th National Quality Congress, November
12, 2007, İstanbul.