19th International Conference on Production Research
AN INTELLIGENT REASONING MODEL FOR YARN MANUFACTURE
Jian-Guo Yang, Fu Zhou, Jing-Zhu Pang, Zhi-Jun Lv
College of Mechanical Engineering, University of DongHua, Ren Min Bei Road 2999, Song Jiang Zone,
Shanghai, P R China
Abstract
Although much work has been done to construct prediction models of yarn processing quality, the
relation between spinning variables and yarn properties has not been conclusively established so far.
Support vector machines (SVMs), based on statistical learning theory, are gaining applications in the
areas of machine learning and pattern recognition because of their high accuracy and good generalization
capability. This study briefly introduces SVM regression algorithms and presents an SVM-based
system architecture for predicting yarn properties. Model selection, which amounts to a search in
hyper-parameter space, is performed with the grid-search method to find suitable parameters.
Experimental results have been compared with those of ANN models. The investigation indicates that
for small data sets and real-life production, SVM models maintain stable predictive accuracy and are
more suitable for the noisy and dynamic spinning process.
Keywords:
Support vector machines, Structural risk minimization, Predictive model, Kernel function, Yarn quality
1 INTRODUCTION
Changing economic and political conditions and the increasing globalisation of the market mean that
the textile sector faces ever greater challenges. To stay competitive, there is an increasing need for
companies to invest in new products. Along the textile chain, innovative technologies and solutions are
required to continuously optimize the production process. High quality standards and extensive
technical and trade know-how are thus prerequisites for keeping abreast of the growing dynamics of
the sector [1]. Although much work has been done to construct prediction models of yarn processing
quality, the relation between spinning variables and yarn properties has not been conclusively
established so far. The increasing quality demands from the spinners make clear the need to explore
innovative ways of quality prediction. The widespread use of artificial intelligence (AI) has created a
revolution in the domain of quality prediction, for example the application of artificial neural networks
(ANN) in textile engineering [2]. This study presents a support vector machines based intelligent
predictive model for yarn process quality. The relevant algorithm, model selection and experiments
are presented in detail.

2 SVM REGRESSION ALGORITHMS
2.1 The regression problem
The main objective of regression is to approximate a function g(x) from a given noisy set of samples
G = {(x_i, y_i)}_{i=1}^N obtained from the function g. The basic idea of support vector machines (SVM)
for regression is to map the data x into a high-dimensional feature space via a nonlinear mapping and
to perform a linear regression in this feature space:

    f(x) = Σ_{i=1}^{D} w_i φ_i(x) + b        (1)

where w denotes the weight vector, b is a constant known as the "bias", and {φ_i(x)}_{i=1}^{D} are
called features. Thus, the problem of nonlinear regression in the lower-dimensional input space is
transformed into a linear regression in the high-dimensional feature space. The unknown parameters
w and b in Equation (1) are estimated using the training set G. To avoid over-fitting and thereby
improve the generalization capability, the following regularized functional, the sum of the empirical
risk and a complexity term ‖w‖², is minimized [3]:

    R_reg = R_emp + λ‖w‖² = (1/M) Σ_{i=1}^{M} |f(x_i) − y_i|_ε + λ‖w‖²        (2)

where λ is a regularization constant and the cost function is defined by

    |f(x) − y|_ε = |f(x) − y| − ε    if |f(x) − y| ≥ ε
                 = 0                 otherwise        (3)

which is called Vapnik's "ε-insensitive loss function". It can be shown that the minimizing function has
the following form:

    f(x, α, α*) = Σ_{i=1}^{M} (α_i − α_i*) k(x_i, x) + b        (4)

with α_i α_i* = 0 and α_i, α_i* ≥ 0, where the kernel function k(x_i, x) describes the dot product in the
D-dimensional feature space:

    k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩        (5)

It is important to note that the features φ_j need not be computed; all that is needed is the kernel
function, which is very simple and has a known analytical form. The only condition required is that the
kernel function satisfy Mercer's condition. Some of the most used kernels include the linear,
polynomial, radial basis function and sigmoid kernels. Note also that for Vapnik's ε-insensitive loss
function the Lagrange multipliers α_i, α_i* are sparse, i.e. they take nonzero values after the
optimization (2) only for points on the boundary, which means that they satisfy the
Karush–Kuhn–Tucker conditions.
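The piecewise loss in Equation (3) is easy to state in code. Below is a minimal sketch (the function name and the sample numbers are illustrative only, not from the paper):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Vapnik's eps-insensitive loss |f(x) - y|_eps of Equation (3):
    residuals inside the eps-tube cost nothing; larger residuals are
    charged only for the part exceeding eps."""
    return np.maximum(0.0, np.abs(y_pred - y_true) - eps)

# Residuals 0.05 and -0.08 fall inside the tube; 0.30 exceeds it by 0.2.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 1.92, 3.30])
print(eps_insensitive_loss(y_true, y_pred, eps=0.1))  # -> [0.  0.  0.2]
```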
The coefficients α_i, α_i* are obtained by maximizing the following form:

    Max R(α*, α) = −(1/2) Σ_{i,j=1}^{M} (α_i* − α_i)(α_j* − α_j) K(x_i, x_j)
                   − ε Σ_{i=1}^{M} (α_i* + α_i) + Σ_{i=1}^{M} y_i (α_i* − α_i)        (6)

    s.t.  Σ_{i=1}^{M} (α_i* − α_i) = 0,    0 ≤ α_i, α_i* ≤ C        (7)

Only some of the coefficients α_i, α_i* will be different from zero, and the data points associated with
them are called support vectors. The parameters C and ε are free and have to be decided by the user.
Computing b requires a more direct use of the Karush–Kuhn–Tucker conditions that lead to the
quadratic programming problem stated above. The key idea is to pick those values for a point x_k on
the margin, i.e. with α_k or α_k* in the open interval (0, C). One x_k would be sufficient, but for
stability purposes it is recommended to take the average over all points on the margin. A more
detailed description of SVM for regression can be found in Refs. [3-6].

3 SVM BASED YARN PREDICTIVE MODEL
3.1 Model Architecture
Considering some salient features of SVM, such as the absence of local minima, the sparseness of the
solution and the improved generalization, an SVM-based yarn quality prediction system is proposed
(shown in Fig. 1). The system architecture mainly consists of three modules, i.e. data acquisition,
reasoning machine and user interface. Among them, the user interface provides friendly interactive
operation of the model, including data cleaning, model training, parameter selection and so on. The
data acquisition module collects and transforms the various data from the yarn production process
into the textile engineering database. The reasoning machines are in essence an SVM-based yarn
process simulator, which is used to train the predictive models and then make real-world process
decisions for the different raw material inputs.

[Fig. 1 Yarn Quality Predictive Model Architecture: raw material data and yarn properties pass through
a user interface for yarn quality prediction, SVM-based reasoning machines (a process simulator), a
textile engineering database, and data acquisition from the yarn production process.]

3.2 Model Selection
In the yarn predictive learning task, an appropriate model and parameter estimation method should be
selected to obtain a high level of performance of the learning machine. Lacking a priori information
about the accuracy of the y-values, it can be difficult to come up with a reasonable value of ε a priori.
Instead, one would rather specify the degree of sparseness and let the algorithm compute ε from the
data automatically. This is the idea of ν-SVM, a modification of the original ε-SVM introduced by
Schölkopf, Smola, Williamson et al. [6], which was used to construct the yarn predictive model in our
study. Under this approach, the parameters usually to be chosen are the following:
- the penalty term C, which determines the trade-off between the complexity of the decision function
  and the number of training examples misclassified;
- the sparsity parameter ν, set in accordance with the noise in the output values in order to get the
  highest generalization accuracy;
- the kernel function K(x, y).

According to reference [7], the sparsity parameter ν may usually be chosen in the interval [0.3, 0.6];
here ν = 0.583. A radial basis function (RBF) kernel, given by Equation (8), is used:

    K(x, y) = exp(−‖x − y‖² / 2σ²)        (8)

where σ is the width parameter of the RBF kernel. The RBF kernel nonlinearly maps samples into a
higher-dimensional space, so, unlike the linear kernel, it can handle the case where the relation
between inputs and outputs is nonlinear. In addition, the sigmoid kernel behaves like the RBF kernel
for certain parameters.
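As a concrete illustration of this set-up, the sketch below fits a ν-SVR with an RBF kernel using scikit-learn's NuSVR, assumed here as a stand-in for the authors' implementation. ν = 0.583 is the value quoted above; σ = 0.973 and C = 1606 are taken from the CV% row of Table 1; the training data are synthetic placeholders, not the worsted-spinning samples:

```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 3))         # stand-in for fibre/process inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)  # noisy synthetic target

sigma = 0.973                                    # RBF width (Table 1, CV% model)
model = NuSVR(nu=0.583,                          # sparsity parameter, Section 3.2
              C=1606.0,                          # penalty term (Table 1, CV% model)
              kernel="rbf",
              gamma=1.0 / (2.0 * sigma ** 2))    # K(x,y) = exp(-||x-y||^2 / 2 sigma^2)
model.fit(X, y)
print(model.predict(X[:3]))                      # predictions for three samples
```

Note that ε is not set by the user here; the ν-parameterization derives it from the data, which is exactly the point of ν-SVM.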
The reason for using the RBF kernel is the small number of hyper-parameters, which influences the
complexity of model selection; the polynomial kernel has more hyper-parameters than the RBF kernel.
Furthermore, the RBF kernel has fewer numerical difficulties; a key point is that 0 < K(x, y) ≤ 1, in
contrast to polynomial kernels, whose values may go to infinity or zero when the degree is large.
Moreover, it is noted that the sigmoid kernel is not valid (i.e. not the inner product of two vectors)
under some parameters [4].

3.3 Optimization of Model Parameters
Obviously, two key parameters in the SVM model still need to be chosen: C and σ. Unfortunately, it is
difficult to know beforehand which C and σ are best for a given problem. Our goal is to identify good
(C, σ) so that the model can accurately predict unknown (i.e. testing) data. A common way is therefore
to separate the training data into two parts, one of which is treated as unknown when training the
model. The prediction accuracy on this held-out part then reflects more precisely the performance on
predicting unknown data. This improved model-selection procedure is called cross-validation, and it
can also further prevent the over-fitting problem. In this study, the regression function was built with a
given set of parameters {C, σ}. The performance of the parameter set is measured by the
computational risk, here the mean squared error (MSE, see Equation (9)) on the held-out subset. The
above procedure is repeated p times, so that each subset is used once for testing. Averaging the MSE
over the p trials gives an estimate of the expected generalization error for training on sets of size
((p − 1)/p)·l, where l is the number of training data:

    MSE = (1/pq) Σ_{j=1}^{p} Σ_{i=1}^{q} (y_ti^(j) − y_pi^(j))²        (9)

where q is the number of samples in a tested subset of the training set, and y_ti^(j) and y_pi^(j) are
the i-th observed and predicted values in the j-th tested subset, respectively. In order to capture
better pairs of (C, σ), a "grid-search" [8] over C and σ is employed in this work. First, over the
possible ranges of the two parameters, C and σ were divided into r pairs; then each pair of
parameters was tried using cross-validation, and the one with the best cross-validation accuracy was
picked as the optimal parameters of the model.

4 THE EXPERIMENTAL STUDY
In this work, a small population (a total of twenty-six different data samples) from real worsted
spinning was acquired. To demonstrate the generalization performance of the SVM model, different
experiments were completed and compared with ANN models. To simplify the problem, as in most
ANN models [2, 9], some fibre properties and processing information were selected as the SVM
model's inputs: mean fibre diameter (MFD, μm), diameter distribution (CVD, %), hauteur (HT, mm),
fibre length distribution (CVH, %), short fibre content (SFC, %), yarn count (CT, tex), twist (TW, t.p.m.),
draft ratio (DR), spinning speed (SS, r.p.m.) and traveler number (TN). Four yarn properties, namely
unevenness (CV%), elongation at break (EB, %), breaking force (BF, cN) and ends-down per 1000
spindle hours (ED), served as the SVM model's outputs.

One of the primary aspects of developing an SVM regression model is the selection of the penalty
term C and the width σ of the RBF kernel. To optimize the two parameters, the "grid-search" method
above was applied in the present work. In fact, optimizing the model parameters needs an iterative
process that continuously shrinks the search area and, as a result, obtains a satisfying solution.
Table 1 lists the final optimal values for the four SVM models.

After the completion of model development (training), all the SVM-based and ANN-based models were
subjected to the unseen testing data set. Statistical parameters such as the correlation coefficient
between the actual and predicted values (R), the mean squared error and the mean error % were
used to compare the predictive power of the SVM-based and ANN-based models. Results are shown
in Table 2. It is observed that for the ANN models the mean error (%) of three models is more than
10%, the exception being CV%, which remains about 5%, and the correlation coefficient (R) of the
CV% and EB models is very low, at 0.76 and 0.67 respectively. For the SVM models, however, the
mean error (%) is less than 10% except for ED, which is still high, and the correlation coefficient (R)
of all models is improved to more than 0.80. On the other hand, the number of cases with over 10%
error also decreases, from 5 and 6 in the ANN models to 2 and 3 in the SVM models. In fact, among
the four yarn properties considered in our work, ends-down per 1000 spindle hours can be affected
by different operators and observers [10], and such data often undermine the prediction accuracy of
regression models. Even so, for ED almost all statistical parameters of the SVM model seem to be
much better than those of the ANN model.

5 CONCLUSIONS
Support vector machines are a new learning-by-example paradigm with many potential applications
in science and engineering. The salient features of SVM include the absence of local minima, the
sparseness of the solution and the improved generalization. As SVMs are a relatively new technique,
their applications in textile production have hitherto been quite limited. However, the elegance of the
formalism involved and their successful use in diverse science and engineering applications confirm
the expectations raised by this appealing learning-from-examples approach. In this study, we
presented an SVM model for predicting yarn properties and compared it with a BP neural network
model. We have found that, like the ANN model, the SVM model is able to predict with reasonably
good accuracy in most cases. A more interesting phenomenon is that with small data sets and
real-life production, the predictive power of the ANN models appears to decrease, while the SVM
models are still capable of maintaining stable predictive accuracy to some extent. The experimental
results indicate that the SVM models are more suitable for the noisy and dynamic spinning process.
Of course, as with other emerging industrial techniques, applied issues of SVM, such as how to
design the kernel function and how to set the SVM hyper-parameters, call for further development
and investigation, to make industrial model development easier. Our research thus far demonstrates
that SVMs are able to provide an alternative for spinners to predict yarn properties more correctly
and reliably.

6 ACKNOWLEDGMENT
This research was supported by the national science foundation and the technology support plan of
the People's Republic of China, under contract numbers 70371040 and 2006BAF01A44 respectively.
7 REFERENCES
[1] Renate Esswein, "Knowledge assures quality", International Textile Bulletin, 2004, Vol. 15, No. 2, 17-21.
[2] R. Chattopadhyay and A. Guha, "Artificial Neural Networks: Applications to Textiles", Textile Progress, 2004, Vol. 35, No. 1, 1-42.
[3] V. David Sanchez A., "Advanced Support Vector Machines and Kernel Methods", Neurocomputing, 2003, Vol. 55, No. 3, 5-20.
[4] V. N. Vapnik, 1999, The Nature of Statistical Learning Theory, 2nd ed., Berlin: Springer, 31-188.
[5] B. Schölkopf, C. Burges and A. Smola, 1999, Advances in Kernel Methods: Support Vector Learning, Cambridge, MA: MIT Press, 5-73.
[6] B. Schölkopf, A. Smola and R. C. Williamson et al., "New support vector algorithms", Neural Computation, 2000, Vol. 12, No. 4, 1207-1245.
[7] Athanassia Chalimourda, B. Schölkopf, A. Smola, "Experimentally Optimal ν in Support Vector Regression for Different Noise Models and Parameter Settings", IEEE Trans. on Neural Networks, 2004, Vol. 17, No. 2, 127-141.
[8] Chih-Wei Hsu, Chih-Chung Chang and Chih-Jen Lin, "A practical guide to support vector classification", available at http://www.csie.ntu.edu.tw/~cjlin/paper
[9] Refael B., Lijing W., Xungai W., "Predicting worsted spinning performance with an artificial neural network model", Textile Res. J., 2004, Vol. 74, No. 8, 757-763.
[10] Peter R. Lord, 2003, Handbook of Yarn Production (Technology, Science and Economics), Abington, England: Woodhead Publishing Limited, 95-212.
Table 1 The optimal values of σ and C

Output parameter        Optimal values
CV%                     σ = 0.973,  C = 1606
Elongation at break     σ = 0.016,  C = 14.55
Breaking force          σ = 0.012,  C = 101.19
Ends-down               σ = 0.287,  C = 2.975
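The grid search with cross-validation of Section 3.3, which produced the (σ, C) values above, can be sketched with scikit-learn's GridSearchCV; the parameter grids and data below are assumptions for illustration, not the ranges actually searched:

```python
import numpy as np
from sklearn.svm import NuSVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(26, 5))        # 26 samples, as in Section 4
y = X @ rng.normal(size=5) + 0.05 * rng.normal(size=26)

# Candidate (C, gamma) pairs; gamma = 1 / (2 sigma^2) encodes the RBF width.
param_grid = {"C": [0.1, 1.0, 10.0, 100.0, 1000.0],
              "gamma": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(NuSVR(nu=0.583, kernel="rbf"),
                      param_grid,
                      scoring="neg_mean_squared_error",  # MSE criterion of Equation (9)
                      cv=5)                              # p = 5 folds
search.fit(X, y)
print(search.best_params_)    # the pair with the best cross-validation MSE
```

In practice one would repeat this with progressively finer grids around the best pair, shrinking the search area as described in Section 4.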
Table 2 Comparison of the predictive power of the SVM-based and ANN-based models

                             Predicted value using ANN model    Predicted value using SVM model
Sample No.                   CV%    EB     BF      ED           CV%    EB     BF      ED
W21                          19.32  13.81  113.89  70.41        19.66  12.85  116.24  72.06
W22                          20.52  16.55  61.91   75.78        20.88  12.25  76.87   72.40
W23                          15.62  12.32  153.46  39.40        16.84  15.59  156.57  42.22
W24                          20.66  16.55  61.91   75.78        20.75  12.25  76.87   72.40
W25                          22.60  19.77  47.00   69.84        19.66  12.76  76.86   59.31
W26                          20.70  11.87  66.76   79.22        21.20  12.59  66.62   81.27
Correlation coefficient R    0.76   0.67   0.96    0.88         0.88   0.80   0.99    0.91
Mean squared error           0.01   0.12   0.07    0.03         0.003  0.05   0.01    0.03
Mean error %                 5.73   24.35  13.67   19.99        2.85   9.23   5.52    17.29
Cases with over 10% error    1      6      5       6            0      2      2       3
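The summary statistics in Table 2 (correlation coefficient R, mean squared error, mean error %, and the number of cases with over 10% error) can be computed as below; the actual/predicted numbers here are made up for illustration, not taken from the experiments:

```python
import numpy as np

actual    = np.array([19.5, 20.7, 16.1, 20.9, 21.8, 20.9])   # illustrative only
predicted = np.array([19.7, 20.9, 16.8, 20.8, 19.7, 21.2])

R = np.corrcoef(actual, predicted)[0, 1]                     # correlation coefficient
mse = np.mean((actual - predicted) ** 2)                     # mean squared error
rel_err = np.abs(actual - predicted) / actual
mean_err_pct = 100.0 * np.mean(rel_err)                      # mean error %
cases_over_10 = int(np.sum(rel_err > 0.10))                  # cases with over 10% error

print(R, mse, mean_err_pct, cases_over_10)
```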