School of Chemical and Minerals Engineering
CEMI479
Comparison between the Plitt model
and an artificial neural network in
predicting hydrocyclone separation
performance
Neil Zietsman
23379936
Supervisor: Mr. A.F. van der Merwe
North-West University
Potchefstroom Campus
Date of submission:
26 October 2015
Declaration
I, L.N. Zietsman, 23379936, hereby declare that:
 the text and references of this study reflect the sources I have consulted and
 sections with no source references are my own ideas, arguments and/or conclusions.
This declaration is for the report entitled CEMI479: Comparison between the Plitt model
and an artificial neural network in predicting hydrocyclone separation performance
submitted for the partial fulfilment of the requirements for the B.Eng. Chemical Engineering
degree at the North-West University, Potchefstroom Campus.
Signed at Potchefstroom on the ______ day of October 2015.
_______________________
L.N. Zietsman 23379936
Acknowledgements
I would like to thank the following people for their help during the year with my project:
 My God for giving me strength during the year to complete this project.
 Mr. A.F. van der Merwe, my study leader, for his help and guidance.
 The workshop personnel for their help with the technical problems that occurred during the course of the year.
 Mrs. Sanet Botes for her help with ordering the items needed for the project.
 Miss Sarita van Loggenberg, my colleague, who helped me perform the hydrocyclone experiments.
 Mr. Nico Lemmer for his help with the Malvern Mastersizer 2000.
Abstract
The hydrocyclone is an invaluable process unit that is popular in the mineral processing industry. Like all classifiers, the hydrocyclone is not capable of perfect separation. The ability of the hydrocyclone to separate particles into the correct streams can be represented by a curve known as a partition curve.
Two important variables can be obtained from the partition curve: the cut size, d50c, and the sharpness of separation. Together, these two variables fully describe the separation efficiency of the hydrocyclone. Optimal control of the hydrocyclone could be achieved if accurate values of the d50c and the sharpness of separation were available.
Unfortunately, this is easier said than done. On-line instrumentation for direct analysis of these variables is not available. Additionally, the complex flow inside the hydrocyclone makes it impossible to determine these variables indirectly through first-principle calculations. The solution is inference sensors, which make use of easily measured variables, such as the flow rate and the solids percentage, to estimate the d50c and the sharpness of separation.
Two methods of inference sensing were covered in this study, namely an empirical method (the Plitt model) and an artificial neural network.
The modified Plitt model was used, with its calibration ("fudge") factors adjusted to fit the experimental data. The Plitt model was only capable of predicting the d50c to a certain extent, and it failed to predict the sharpness of separation.
The artificial neural network was trained with the backpropagation algorithm. The more input variables the artificial neural network had, the better its predictive capability became. The addition of regularization and momentum terms further increased the prediction power of the neural network.
Keywords: hydrocyclone; d50c; sharpness of separation; artificial neural network; Plitt model; fine cut point; variable size spigot
Attached documents
Folder name | File name | Description
Experimental error | Experimental Error | Excel® spreadsheet containing the data and calculations that were done to determine the experimental error
Plitt model | Plitt model | Excel® spreadsheet containing the calculations performed on the experimental data with the Plitt model
Artificial neural networks | Neil'sANN.rev3-d50c - Du; Neil'sANN.rev3-d50c - Du+phi+Q; Neil'sANN.rev3-d50c - Du+phi+Q+P+S; Neil'sANN.rev3-m - Du; Neil'sANN.rev3-m - Du+phi+Q; Neil'sANN.rev3-m - Du+phi+Q+P+S | Macro-enabled Excel® spreadsheets containing the program with which the artificial neural networks were trained and validated
Meetings | Various files | Folder containing all the minutes and agendas of each meeting in Microsoft Word® format
Data processing | Data processing | Excel® spreadsheet with which the data processing was done
MSDS | MSDS – Silica flour | PDF document containing the MSDS of silica flour
Gantt chart | Gantt chart | Folder containing a Gantt chart in both PDF format and MS Project format
Table of contents
Declaration
Acknowledgements
Abstract
Attached documents
Table of contents
List of figures
List of tables
List of acronyms
List of symbols
Chapter 1 - Introduction
1.1 Background
1.2 Problem statement
1.3 Aim and objectives
1.3.1 Aim
1.3.2 Objective
1.3.3 Methodology
Chapter 2 - Literature study
2.1 The hydrocyclone
2.2 Hydrocyclone control
2.2.1 Sensors used in hydrocyclone performance determination
2.3 Soft sensors
2.3.1 Empirical models
2.3.2 Artificial neural networks
Chapter 3 - Experimental procedure
3.1 Overview
3.2 Raw materials
3.3 Equipment
3.4 Experimental setup
3.5 Experimental procedure
3.5.1 Preparation
3.5.2 Sampling
3.5.3 Analysing
3.5.4 Experimental error
Chapter 4 - Model development
4.1 Overview
4.2 The Plitt model
4.2.1 Split flow
4.2.2 Cut size – d50c
4.2.3 Sharpness of separation
4.3 The artificial neural network
4.3.1 Artificial neural network architecture
Chapter 5 - Results and discussion
5.1 Deviations in the feed PSD
5.2 Plitt model
5.2.1 Cut size – d50c
5.2.2 Sharpness of separation
5.3 Artificial neural networks
5.3.1 Cut size – d50c
5.3.2 Sharpness of separation
Chapter 6 - Conclusion and recommendations
6.1 Conclusion
6.2 Recommendations
6.3 Further study
Bibliography
Appendix A Data processing
Appendix B Data processing source code
Appendix C Processed data
Appendix D Experimental error data
Appendix E ANN source code
Appendix F ECSA exit level outcomes
Appendix G Hazard identification and risk assessment
List of figures
Figure 2.1: Hypothetical flow inside the hydrocyclone viewed from the top of the hydrocyclone. Adapted from Plitt (1976)
Figure 2.2: Corrected and non-corrected partition curve adapted from Schneider (2001)
Figure 2.3: Diagram of the computational nodes and weights of an artificial neural network adapted from Jain (1996)
Figure 2.4: Supervised learning with reference to Hagan et al. (2002)
Figure 2.5: Polynomial of first order produces a bad fit for the data. Reproduced from Bishop (2008:11)
Figure 2.6: Polynomial of high order producing something that looks like a good fit for all the data points, but the predictive power of the polynomial is sacrificed. Reproduced from Bishop (2008:12)
Figure 2.7: A lower order polynomial that has the capability to generalize well
Figure 3.1: Diagram of the hydrocyclone setup
Figure 3.2: The experimental hydrocyclone setup
Figure 3.3: The Marcy scale
Figure 3.4: Malvern Mastersizer 2000
Figure 3.5: Experimental error of the d50c with a 95% confidence interval
Figure 3.6: Experimental error of the sharpness of separation with a 95% confidence interval
Figure 4.1: Experimental split flow values plotted with the predicted Plitt model split flow values
Figure 4.2: Learning capability of one of the 6 developed artificial neural networks
Figure 5.1: Particle size distribution of 25 different feed samples
Figure 5.2: Example partition curve before justifications
Figure 5.3: Experimental vs. adjusted values of Rf
Figure 5.4: Experimental cut point plotted with the cut point predicted by the Plitt model
Figure 5.5: Plitt model predicted cut size vs. experimental cut size plotted over the y=x curve
Figure 5.6: Experimental sharpness of separation plotted with the sharpness of separation predicted by the Plitt model
Figure 5.7: Plitt model predicted m vs. experimental m plotted over the y=x curve
Figure 5.8: Results of neural network 2 trained with a training speed of 0.2 and a maximum amount of epochs of 8000
Figure 5.9: Results of neural network 2 trained with a training speed of 0.2 and a maximum amount of epochs of 8000
Figure 5.10: Results of neural network 3 trained with a training speed of 0.2 and a maximum amount of epochs of 8000
Figure 5.11: Calculated d50c plotted with the experimental d50c values of neural network 3 trained with a maximum of 60000 epochs and a training speed of 0.02
Figure 5.12: Predicted d50c vs. experimental d50c plotted over the y=x curve for the neural network trained with a maximum of 60000 epochs and a training speed of 0.02
Figure 5.13: Calculated vs. experimental values of neural network 3 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the momentum term
Figure 5.14: Predicted d50c vs. experimental d50c plotted over the y=x curve for a neural network trained with a maximum of 60000 epochs and a training speed of 0.02
Figure 5.15: Calculated vs. experimental values of neural network 3 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the regularization term
Figure 5.16: Predicted d50c vs. experimental d50c plotted on the y=x curve for the neural network trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the regularization term
Figure 5.17: Results of neural network 4 trained with a training speed of 0.5 and a maximum amount of epochs of 20000
Figure 5.18: Results of neural network 5 trained with a training speed of 0.5 and a maximum amount of epochs of 25000
Figure 5.19: Results of neural network 6 trained with a training speed of 0.2 and a maximum amount of epochs of 8000
Figure 5.20: Experimental and predicted values of neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02
Figure 5.21: Predicted vs. experimental m plotted over the y=x graph for neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02
Figure 5.22: Predicted and experimental values of neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the momentum term
Figure 5.23: Predicted m vs. experimental values m plotted over the y=x line for neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the momentum term
Figure 5.24: Predicted vs. experimental values of neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the regularization term
Figure 5.25: Predicted m vs. experimental m plotted over the y=x curve for neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the regularization term
List of tables
Table 2.1: Sensors used in the on-line monitoring of hydrocyclone performance
Table 3.1: Processed data used for the experimental error determination
Table 3.2: Values for substitution into the student's t equation
Table 4.1: Different artificial neural networks that were programmed
List of acronyms
Acronym Description
ANN Artificial neural network
HIRA Hazard identification and risk assessment
MS Microsoft
MSDS Material safety data sheet
PPE Personal protective equipment
PSD Particle size distribution
List of symbols
Symbol - Description
Al2O3 - Aluminium oxide
K2O - Potassium oxide
Fe2O3 - Iron(III) oxide
CaO - Calcium oxide
Na2O - Sodium oxide
d - Size of a particle in ÎŒm
d50c - Hydrocyclone corrected cut point: the particle size that has an equal chance of either leaving through the underflow or the overflow, in ÎŒm
Dc - Hydrocyclone diameter in cm
Di - Inlet diameter in cm
Do - Vortex finder diameter in cm
Du - Underflow/apex/spigot diameter in cm
ÎŽjO - Error of the output node j
ÎŽjH - Error of the hidden node j
E - Error of a neuron's output
Ẽ - Conditioned error of the neuron's output
ηv - Viscosity of the carrier fluid in cP
Fi - Fudging factor of the modified Plitt model, where i = 1, 2, 3, ...
h - Free vortex height in cm
k - Constant that takes into account the effect of the solids density on the corrected cut size
m - Sharpness of separation: the slope of the partition curve that indicates how well the classification is taking place inside the hydrocyclone; the higher the value of m, the closer the hydrocyclone is to an ideal classifier
M - Momentum factor defined by the user
ms - Mass of silica sand that has to be added to the storage tank in kg
n - Number of weights attached to node j
n_u - Number of data points available in the set
Ω - Penalty term
P - Pressure over the hydrocyclone in kPa
φ - Percentage solids in the feed
Q - Volumetric feed flow rate in litres per minute
R - Regularization factor defined by the user
Rf - Recovery of the carrier liquid to the underflow
ρp - Density of the hydrocyclone feed slurry in g/cm³
ρs - Density of the solid phase in g/cm³
S - Split flow: the volumetric flow of the underflow divided by the volumetric flow of the overflow
St - Standard deviation in the data
σjH - Output value of the transfer function of node j in the hidden layer
σjO - Output of the neuron j in the output layer
t_{n-1}(α/2) - Critical t value that can be obtained from the back cover of Devore and Farnum (2005)
T(95%) - Critical t value for a 95% confidence
υ - Parameter for controlling the importance of the bias term
Vw - Volume of water in the storage tank in m³
wi - Value of weight i, where i = 1, 2, 3, ...
wij - Value of the weight that goes from node i to node j
wijH - Value of the hidden layer weight that goes from node i to node j
ΔwijH - Value by which weight wijH has to be updated
wijH(t-1) - Value of the previous weight wijH
wijO - Value of the output weight that goes from node i to node j
ΔwijO - Value by which the output weight wijO has to be updated
wijO(t-1) - Value of the previous weight wijO
X̄ - Average of a data set
xi - Input value from weight i
xjH - Output of the hidden node j
xjO - Output of the output node j
Οj - Signal sent to node j, where j = 1, 2, 3, ...
y - Partition number: the value displayed on the partition curve's y-axis for a certain particle size d
y' - Corrected partition number
yj - Desired output of output node j
z - Number of weights connected to the cell
Chapter 1 - Introduction
1.1 Background
Hydrocyclones are widely used process units for classifying particles according to size or density in the mineral processing industry. Unfortunately, it is very difficult for the operator to monitor the performance of the hydrocyclone while it is on-line (Coelho & Medronho, 2000). Empirical models that act as inference sensors had to be developed in order to predict how the hydrocyclone will perform under certain conditions (Kraipech et al., 2005).
One popular empirical model used to predict the performance of a hydrocyclone is the Plitt model (Flinthoff et al., 1987). L.R. Plitt developed this model to be robust by gathering a large amount of experimental data, obtained by operating a wide range of hydrocyclone geometries at different operating conditions (Plitt, 1976). This model is in general not very accurate in predicting the performance, i.e. the separation efficiency, of hydrocyclones (Silva et al., 2009).
Artificial Neural Networks can be used to predict the performance of complex systems like the
hydrocyclone (Kutz, 2003). What makes this method so special is its ability to learn through
parallel processing (McMillan, 1999). Given a certain amount of experimental data, the ANN
can identify underlying patterns in the data which gives it the ability to predict the outcome,
given certain input parameters (Jain, 1996).
South Africa has a very large mining industry (Anglo American Platinum, 2013; Anglo Gold
Ashanti, 2013). Improving the performance of the hydrocyclone could possibly contribute to growth in the South African economy, as mineral processing becomes more efficient. The use of hydrocyclones is not limited to the mining industry. These process units are also globally
used in the petrochemical, environmental and food processing industries (Sripriya et al.,
2007). The improved use of the hydrocyclone could thus have a large impact globally in
various industries.
1.2 Problem statement
Inadequate control of the hydrocyclones on a mineral processing plant may lead to
inefficiencies in the downstream process units, ultimately leading to a loss in profit for the
company. Monitoring the on-line performance of a hydrocyclone is not a simple task. Inference
sensors (the terms inference sensor and soft sensor are used interchangeably in this study) that make use of empirical models or artificial neural networks are possible solutions to this problem. A study is needed to determine which of these methods is more appropriate for predicting hydrocyclone performance.
1.3 Aim and objectives
1.3.1 Aim
Improve hydrocyclone efficiency by producing a soft sensor that has the ability to accurately
predict hydrocyclone performance.
1.3.2 Objective
Compare the predictive power of an empirical model, namely the Plitt model, with the
predictive power of an artificial neural network trained with the backpropagation algorithm by
making use of experimental hydrocyclone data.
1.3.3 Methodology
 Do a literature study on the operation of the hydrocyclone, empirical models for
predicting hydrocyclone performance and artificial neural networks;
 Do a HIRA study before sampling on the hydrocyclone commences;
 Devise a procedure for obtaining representative samples from the hydrocyclone;
 Obtain more than 100 samples from the hydrocyclone;
 Gather data on the samples’ PSD by analysing the samples with the Malvern
Mastersizer 2000;
 Process the data for it to be in a suitable form for inserting into an empirical model and
an artificial neural network;
 Develop the artificial neural network from the literature study that was previously
conducted;
 Find optimal architectures and parameters for the artificial neural network through trial-
and-error;
 Substitute the processed data into the empirical model and compare its output (d50c
and sharpness of separation) with that of the experimental data;
 Substitute the processed data into the trained artificial neural networks and compare
the output to the experimental data;
 Compare the two soft sensors, the empirical model and the artificial neural network,
with each other and come to a conclusion on which is better for predicting hydrocyclone performance.
Chapter 2 - Literature study
2.1 The hydrocyclone
Hydrocyclones are commonly used in the mineral industry for the classification of particles
after grinding (Flinthoff et al., 1987). It is usually installed in a closed-circuit grinding unit, where it is used to separate the undersize particles from the coarse particles (Kelly & Spottiswood, 1982:201). The coarse particles are returned to the grinder for further comminution while the undersize particles leave the circuit (Wills, 2006:224-225). Advantages of hydrocyclones include simple design, low operational costs and the capability of handling large volumes of pulp (Sripriya et al., 2007). Complex mechanical devices like spirals and rake classifiers have been replaced by cyclones (the terms "hydrocyclone" and "cyclone" are used interchangeably in this study), due to their simple structure that contains no moving parts (Napier-Munn et al., 2005:309).
Its applications are however not limited to the mineral industry as it is also used in the chemical
industry, power generation industry, textile industry and more. By customising its structure,
the hydrocyclone can be used for specific applications like (Svarovsky, 1984:1):
 Liquid clarification
 Slurry thickening
 Cleansing solid particles
 Elimination of gases from liquids
Classification of particles takes place due to the difference in settling velocities of the particles
being classified. The settling velocities can be a function of either particle size and/or particle
density, depending on whether a homogeneous or heterogeneous ore is classified (Kelly &
Spottiswood, 1982:199). A homogeneous ore contains particles of similar densities. Particles
of homogeneous ores will be classified according to their size (Flinthoff et al., 1987).
The feed enters the cylindrical section of the hydrocyclone tangentially where it forms a vortex
inside the cyclone’s cone shaped body. The fluid follows a helical path until it reaches the
spigot, also known as the apex, where a portion of the downward flow leaves through the
spigot as the underflow. The remaining flow reverses direction and follows an upward spiral, located on the inside of the outer vortex, and leaves via the vortex finder (Svarovsky, 1984:30-31). The
reason for the formation of the upward spiral is not fully understood (Svarovsky, 1984:41).
Particles of similar density or size gather together due to the competition between the drag
forces and centrifugal forces acting on these particles (Napier-Munn et al., 2005:309-310). If
the density of the carrier liquid is lower than that of the solids being separated, the centripetal
force on the solid particles will be larger than the centripetal force of the liquid. On the other
hand, the centripetal force acting on the particle will increase as the particle size increases for
homogeneous ores (Hibbeler, 2010:131). The centripetal force acting on the particles
dominates the drag force also acting on the particles in a radial direction. Larger particles thus
reach the boundary layer, formed between the liquid and the wall of the cyclone, with more
ease than the smaller particles. The particles in the boundary layer leave the cyclone via the
apex under ideal conditions. The finer particles that could not reach the boundary layer by the time the apex is reached are transported to the inner spiral, where they leave through the vortex finder (Svarovsky, 1984:41).
Random turbulence, hindered settling and the interaction between the carrier liquid and the
solid particles makes describing the flow inside the hydrocyclone very difficult. Determining
the separation performance of the hydrocyclone is thus not an easy task (Sripriya et al., 2007).
The performance of the hydrocyclone is defined as the ability to separate particles into the desired size ranges (Kelly & Spottiswood, 1982:204). According to Svarovsky (1984), the separation performance of the hydrocyclone can be determined if the corrected cut size, d50c, and the sharpness of separation, m, can be calculated. This is done with the use of a corrected partition curve. The grade efficiency curve, also called the partition curve or the Tromp curve, is a plot of the particles in a certain size range, on the x-axis, vs. the fraction of these particles in the feed leaving the hydrocyclone through the underflow (Frachon & Cilliers, 1999), as can be seen in Figure 2.2. The grade efficiency curve cannot be approximated from first principles and has to be determined by using experimental data (Svarovsky, 1984:17).

Figure 2.1: Hypothetical flow inside the hydrocyclone viewed from the top of the hydrocyclone. Adapted from Plitt (1976)

Figure 2.2: Corrected and non-corrected partition curve adapted from Schneider (2001)
The d50c, also known as the cut size, is the particle size that has an equal chance to exit the hydrocyclone through the vortex finder or through the underflow. The corrected cut size is used instead of the real cut size, as this gives a better indication of the separation forces present in the hydrocyclone. More information on the corrected partition curve follows later. The sharpness of separation, m, indicates how well the classification is taking place in the cyclone. The higher the value of m, the closer the hydrocyclone is to an ideal classifier (Napier-Munn et al., 2005:311).
In practice, some of the particles in the hydrocyclone, irrespective of their size, bypass classification. By controlling the operating conditions of the cyclone, these deviations from ideal separation can be lowered, but never eliminated (Napier-Munn et al., 2005:310-311). Two paths by which classification can be bypassed are described below.
Small particles tend to stay suspended in the liquid which leaves the hydrocyclone through the underflow. According to Frachon and Cilliers (1999), Plitt (1976) and Svarovsky (1984:20), the fraction of small particles bypassing to the underflow is directly proportional to the liquid recovery to the underflow, Rf. A corrected partition curve is constructed to remove the effect of the bypass to the underflow, as can be seen in Figure 2.2. Another phenomenon that
causes undersize particles to leave through the underflow is when the undersize particles are trapped in the boundary layer by the larger particles. The corrected partition curve constructed with the use of equation 2.1 might thus not be capable of taking into account all of the undersize particles leaving via the underflow.
Another way in which classification can be bypassed is if particles near the vortex finder leave via the overflow (Svarovsky, 1984:40). No corrections are made to the partition curve to take this effect into account, but the effect will be borne in mind when the results are interpreted.
y' = \frac{y - R_f}{1 - R_f}    (2.1)
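To illustrate equation 2.1, the minimal Python sketch below converts a raw partition number into a corrected partition number. It only assumes that the raw partition number and the water recovery Rf are already known; the actual data processing in this study was done in the attached Excel® spreadsheets, so this is purely an illustrative example with made-up numbers.

```python
def corrected_partition(y_raw, r_f):
    """Apply equation 2.1: remove the bypass fraction r_f from a raw
    partition number y_raw to obtain the corrected partition number y'."""
    return (y_raw - r_f) / (1.0 - r_f)

# Hypothetical example: Rf = 0.35 (35% of the water reports to the underflow)
# and 60% of a given size class reports to the underflow.
print(corrected_partition(0.60, 0.35))  # approximately 0.385
```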
2.2 Hydrocyclone control
If a hydrocyclone is not operated to produce the desired overflow and underflow, it could lead
to poor performance in downstream processes (Eren & Gupta, 1988). Fines in the underflow
lead to overgrinding, while coarse material in the overflow can cause downstream separation
problems (Aldrich et al., 2014). Slight changes in the operating conditions of the hydrocyclone
could markedly affect the performance of the hydrocyclone (Neesse et al., 2004). The operator
of a hydrocyclone might not always be aware of the cyclone’s underperformance and is in
addition frequently incapable of returning the cyclone to its optimal operation. There is thus a
need for methods to efficiently determine the performance of the cyclone while in operation
(Napier-Munn et al., 2005:309). Optimising the hydrocyclone is not an easy task, as the variables are often interlinked. Models that are reasonably accurate are capable of finding the optimum operating conditions for the hydrocyclone even when variables such as the split flow and pressure are dependent on each other (Napier-Munn et al., 2005:320).
2.2.1 Sensors used in hydrocyclone performance determination
Variables like the pressure drop over the cyclone, the flow rates in and out of the cyclone and the feed are commonly monitored while the cyclone is on-line. Even with all this information, the operator might still not be able to control the performance of the cyclone effectively (Napier-Munn et al., 2005; Aldrich et al., 2014). Numerous studies have been conducted to find a suitable method to control the performance of the hydrocyclone, many of which have not been widely used in industry (Aldrich et al., 2014). Table 2.1 contains a list of current sensors that have been developed for determining hydrocyclone performance.
Table 2.1: Sensors used in the on-line monitoring of hydrocyclone performance

Acoustic sensors: An acoustic sensor was mounted externally on the hydrocyclone and, after a suitable model was found, could accurately predict various parameters like the solids concentration and the flow rate. Variables like the d50c and sharpness of separation are, however, not determined with the use of this method (Hou et al., 1998).

Videographic measurement: A video camera was used to monitor the discharge angle of the hydrocyclone. The discharge angle of the cyclone is said to be linked to the performance of the cyclone (Concha et al., 1996; Neesse et al., 2004). Although this method has some challenges, it is a cost-effective way to determine the discharge angle with good accuracy (Janse van Vuuren et al., 2011).

Photographic measurement: Aldrich et al. (2014) used images and other experimental data from the underflow of an experimental hydrocyclone setup to develop a model that had the ability to identify the mean particle size in the underflow. Instead of using the discharge angle of the underflow like Janse van Vuuren et al. (2011), the textural information that the images provided of the underflow was utilised.

Measurement using a laser beam: A laser beam is pointed at the underflow of the cyclone and the reflection of the laser beam is measured with a camera to determine whether the cyclone is in the spray or roping state (Neesse et al., 2004).

2.3 Soft sensors
Another way in which the performance can be monitored on-line is by developing a soft sensor, such as an artificial neural network (Napier-Munn et al., 2005). Soft sensors use operational data from the plant to predict variables that are usually difficult and/or costly to measure on-line (Kadlec et al., 2009). Two possible soft sensors for the control of the hydrocyclone, empirical models and artificial neural networks, will be discussed in this study.
2.3.1 Empirical models
Empirical models had to be developed in order to predict how the hydrocyclone will perform under certain conditions (Kraipech et al., 2005). Although Flinthoff et al. (1987) state that these models have been widely accepted, Chen et al. (2000) state in their study that these models are not reliable. Coelho and Medronho (2000) reason that these models will only work well if the cyclone is operated in the range that was used to obtain the data to which the models were fitted.
One popular empirical model that is used to predict the performance of a hydrocyclone is the Plitt model (Flinthoff et al., 1987). L.R. Plitt developed this model to be robust by gathering a large amount of experimental data. The Plitt model was designed to also take into account the
theories around the complex flow of the hydrocyclone (Plitt, 1976). These theories include the
residence time theory and the equilibrium orbit theory (Chen et al., 2000). The theories alone
are incapable of describing the hydrocyclone performance (Napier-Munn et al., 2005:312).
The data was gathered by operating a wide range of hydrocyclone geometries at different
operating conditions (Plitt, 1976). This model is in general not very accurate in predicting the
performance of hydrocyclones (Silva et al., 2009).
The Plitt model consists of four empirical equations. These equations are used to calculate
the corrected cut size, the flow split between the underflow and overflow, the sharpness of
separation and the pressure drop over the hydrocyclone (Plitt, 1976). Although the Plitt model
is designed to work without calibration, Flinthoff et al. (1987) recommend inserting empirical constants, F1 to F4, that take into account the unique conditions under which the cyclone
operates. Only one experimental data point is needed to tune these empirical constants. By
default, the values of these constants are all equal to 1.
𝑑50𝑐 = đč1
39.7đ·đ‘
0.46
đ·đ‘–
0.6
đ·0
1.21
𝜂 𝑣
0.5
exp(0.063𝜑)
đ· 𝑱
0.71
ℎ0.38 𝑄0.45 (
𝜌𝑠 − 1
1.6 )
𝑘
2.2
𝑚 = đč21.94 exp (−
1.58𝑆
1 + 𝑆
) (
đ·đ‘
2
ℎ
𝑄
)
0.15 2.3
School of Chemical and Minerals Engineering
Literature study| 9
𝑃 = đč3
1.88𝑄1.78
exp(0.0055𝜑)
đ·đ‘
0.37
đ·đ‘–
0.94
ℎ0.28(đ· 𝑱
2
+ đ· 𝑜
2)0.87
2.4
𝑆 =
đč4 (3.29𝜌 𝑝
0.24
(
đ· 𝑱
đ· 𝑜
)
3.31
ℎ0.54(đ· 𝑱
2
+ đ· 𝑜
2)0.36
𝑒0.0054𝜑
)
đ·đ‘
1.11
𝑃0.24
2.5
Where:
đ·đ‘= Cyclone diameter in 𝑐𝑚
đ·đ‘–= Inlet diameter in 𝑐𝑚
đ· 𝑜= Vortex finder diameter in 𝑐𝑚
đ· 𝑱= Underflow/apex diameter in 𝑐𝑚
ℎ= Free vortex height in 𝑐𝑚
𝜌 𝑝= Density of the cyclone feed slurry in
𝑔
𝑐𝑚3
𝜌𝑠= Density of the solid phase in
𝑔
𝑐𝑚3
𝜂 𝑣= Viscosity of the carrier fluid in 𝑐𝑝
𝜑 = Percentage solids in the feed
𝑄 = Feed flow rate in
𝑙𝑖𝑡𝑒𝑟𝑠
𝑚𝑖𝑛𝑱𝑡𝑒
𝑑50𝑐 = Corrected cut size in 𝑚𝑖𝑐𝑟𝑜𝑛𝑠
𝑚 = Sharpness of separation which is dimensionless
𝑃 = Gauge pressure in 𝑘𝑃𝑎
𝑆 = Split flow. This is the volume of the underflow divided by the volume of the overflow and it
is a dimensionless quantity
According to Plitt (1976), the PSD of the feed slurry has a negligible effect on the outcome of
the d50c of the underflow.
After determining the d50c and m with the Plitt model, these values can be inserted into the Rosin-Rammler equation, equation 2.6, to obtain the corrected partition curve.

y' = 1 - \exp\left(-0.693 \left(\frac{d}{d_{50c}}\right)^{m}\right)    (2.6)

Where d is the particle size in microns and y' is the corrected fraction (by volume) of particles of size d that is recovered in the underflow.
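The Python sketch below shows how equations 2.2 to 2.6 could be evaluated for a given set of cyclone dimensions and operating conditions. It is an illustrative implementation only, not the Excel® spreadsheet used in this study; the function and variable names are assumptions chosen for demonstration, and the exponent k is taken here as 0.5, a value commonly used with the Plitt model, whereas the report treats k as a constant to be determined.

```python
import math

def plitt_model(Dc, Di, Do, Du, h, Q, phi, rho_s, rho_p, eta_v=1.0,
                F1=1.0, F2=1.0, F3=1.0, F4=1.0, k=0.5):
    """Modified Plitt model, equations 2.2 to 2.5.
    Dimensions in cm, Q in litres/minute, phi in % solids, densities in g/cm3,
    eta_v in cP. F1-F4 are the calibration ("fudge") factors (default 1).
    k = 0.5 is assumed here for illustration."""
    # Equation 2.2: corrected cut size in microns
    d50c = F1 * (39.7 * Dc**0.46 * Di**0.6 * Do**1.21 * eta_v**0.5
                 * math.exp(0.063 * phi)) / (
           Du**0.71 * h**0.38 * Q**0.45 * ((rho_s - 1.0) / 1.6)**k)
    # Equation 2.4: pressure drop in kPa
    P = F3 * (1.88 * Q**1.78 * math.exp(0.0055 * phi)) / (
        Dc**0.37 * Di**0.94 * h**0.28 * (Du**2 + Do**2)**0.87)
    # Equation 2.5: flow split (underflow volume / overflow volume)
    S = F4 * (3.29 * rho_p**0.24 * (Du / Do)**3.31 * h**0.54
              * (Du**2 + Do**2)**0.36 * math.exp(0.0054 * phi)) / (
        Dc**1.11 * P**0.24)
    # Equation 2.3: sharpness of separation
    m = F2 * 1.94 * math.exp(-1.58 * S / (1.0 + S)) * (Dc**2 * h / Q)**0.15
    return d50c, m, P, S

def rosin_rammler(d, d50c, m):
    """Equation 2.6: corrected partition number for a particle size d (microns)."""
    return 1.0 - math.exp(-0.693 * (d / d50c)**m)
```

As a hypothetical usage example, plitt_model(Dc=10, Di=3, Do=3.5, Du=1.5, h=40, Q=60, phi=10, rho_s=2.65, rho_p=1.1) returns the predicted d50c, m, P and S for those assumed conditions, after which rosin_rammler can be evaluated over a range of particle sizes to draw the corrected partition curve.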
2.3.2 Artificial neural networks
Various studies have been done on the use of artificial neural networks for the prediction of hydrocyclone performance, and they have proven to be successful (Eren et al., 1997a; Eren et al., 1997b; Karimi et al., 2010).
The human brain has powerful learning, generalization and parallel computing abilities. It is desired to give computers the same abilities by copying the principle of operation of brain cells and developing artificial neural networks (ANNs) (Jain, 1996). ANNs are not limited to soft sensors. Awodele and Jegede (2009) reason that ANNs promise a wide range of new applications in areas such as education and medicine. This is why research in this field has been booming in the past few decades (Gallant, 1994:1).
Figure 2.3: Diagram of the computational nodes and weights of an artificial neural network
adapted from Jain (1996)
An artificial neural network consists of computational units called nodes (the words "nodes" and "neurons" are used interchangeably in this study). These nodes are located in sets called layers. Connections, called weights, connect the nodes of one layer to the following layer. Information transported through the weights can only travel in one direction. Figure 2.3 illustrates the computational nodes in a three-layer neural network (the terms "neural network" and "artificial neural network" are used interchangeably in this study). The arrows represent the weights and their direction.
Values, either positive or negative, are assigned to each of the weights. The magnitude of the value assigned to a weight determines how large the effect of the data transported through that weight will be on the neural network: the larger the magnitude of the weight, the larger the effect. The input data travel through the weights to which they are connected and are multiplied by the value of each weight. When the data reach hidden layer 1 through the weights, an input value to each node is calculated (more detail on these calculations follows in section 2.3.2.6). The input value is substituted into a function called an activation function, which calculates an output called an activation. The activation travels through the weights to the next nodes and the same operation is performed. This is repeated until the ANN produces its final output (Gallant, 1994:1).
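To make the node computation concrete, the minimal Python sketch below shows one node's output: a weighted sum of its inputs passed through a sigmoid activation function. The numbers are arbitrary illustrative values, not weights from the networks developed in this study.

```python
import math

def node_output(inputs, weights):
    """Weighted sum of the incoming signals followed by a sigmoid activation."""
    signal = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-signal))

print(node_output([0.3, 0.8], [0.05, -0.02]))  # one hidden-node activation
```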
Parameters that influence the output of the network include:
 The number of layers
 The number of nodes in the hidden layers
 The activation function used in the nodes
 The values of the weights
 Number of input variables
2.3.2.1 Network topology
The network topology involves the arrangement of nodes and connections in the network. These arrangements can be classified into two main categories: feed-forward networks and feedback networks.
In feed-forward networks, information can only be carried in one direction, from the input to the output. This type of network is mainly used for pattern recognition purposes. Figure 2.3 illustrates a feed-forward neural network.
In feedback or recurrent networks, the information can either travel in the forward direction to the output or return in the input direction, i.e. make a loop (Awodele & Jegede, 2009).
For the purposes of this study, a feed-forward structure is used.
2.3.2.2 Other artificial neural network parameters
2.3.2.2.1 Initial weights
Initial weight values between -0.1 and 0.1 are randomly chosen. Assigning non-random weights could lead to weights that perform the same action, which does not lead to sufficient convergence. The weights need to be unique when training is initialised to increase the chances of identifying the pattern in the data (Gallant, 1994:213). Another, more complex approach proposed by Gallant (1994:220) is to initialise the weights connected to a certain cell to random values between -2/z and 2/z, where z is the number of weights connected to the cell.
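A minimal Python sketch of the two initialisation schemes mentioned above, assuming only the standard random module; the sizes used are arbitrary and the function names are hypothetical.

```python
import random

def init_uniform(n_weights, limit=0.1):
    """Simple scheme: random values between -0.1 and 0.1."""
    return [random.uniform(-limit, limit) for _ in range(n_weights)]

def init_gallant(z):
    """Gallant (1994:220): weights of a cell drawn between -2/z and 2/z,
    where z is the number of weights connected to the cell."""
    return [random.uniform(-2.0 / z, 2.0 / z) for _ in range(z)]
```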
2.3.2.2.2 Training speed
A large value for the training speed, ÎŒ, gives faster convergence. This convergence can, however, only be maintained up to a certain point, after which the network becomes unstable and diverges. This is called overtraining. It is advised to choose a training speed with a positive value no larger than 0.1. Although this results in slow training, the neural network has a better chance of finding the local minimum (Gallant, 1994:220).
2.3.2.2.3 Momentum
Momentum is used to increase the training speed. The momentum term consists of the
change in weight at the previous iteration, multiplied by the momentum parameter. An
additional benefit of adding momentum is the removal of noise that might occur during weight
updating. The weight thus converges smoothly (Gallant, 1994:221).
2.3.2.2.4 Number of hidden neurons
It is very common for backpropagation networks used in industry to contain only one hidden layer, the main reason being that networks with more hidden layers learn very slowly. Neural networks with one hidden layer are known to be universal approximators. The only way to determine whether a network with multiple hidden layers or a single hidden layer should be used is by trial-and-error (Gallant, 1994:221).
2.3.2.3 Machine learning
In order for an ANN to produce better results, an algorithm has to be written that gives the
ANN the ability to adjust itself. This is called machine learning (Nag, 2010). The two main
types of machine learning are supervised and unsupervised learning.
In supervised learning, the ANN is given input data to produce an output. The output the ANN produces for the given input data is compared with the desired output. If the output from the ANN does not match the desired output, the necessary adjustments are made with the use of the learning algorithm (Gallant, 1994:6). The diagram in Figure 2.4 attempts to illustrate what is meant by supervised learning.

Figure 2.4: Supervised learning adapted from Hagan et al. (2002)

In contrast, unsupervised learning is not provided with the desired output. Instead, it is used to adjust the ANN so that it can group data that show similar patterns (Gallant, 1994:7). Applications of unsupervised learning include finding the probability distribution of data and identifying groups of data that show similar properties and occur close together, i.e. cluster identification (Bishop, 2008:10). In this study, supervised learning is used, as the experimental data from the hydrocyclone provide the ANN with both the input and the desired output.
2.3.2.4 Learning algorithms
There are a number of learning algorithms in existence that are used to adjust the neural
network in order to achieve the desired output. Some algorithms include the perceptron
learning algorithm, radial basis function algorithm and the Boltzmann learning algorithm (Jain,
1996). The question arises: “Which algorithms would be fit for a certain application?”
According to Jain (1996), the backpropagation algorithm, among others, is fit for use in control systems. Gallant (1994:225), on the other hand, reasons that trial-and-error has to be used to
find the appropriate algorithm. From his experience, he found that one should first try to use a
single-cell model before using a complex algorithm like the backpropagation algorithm.
A large problem that occurs in real-world applications is the presence of noise, i.e. the introduction of erroneous data into the data set: the data could either be false or absent (Gallant, 1994:9). Artificial neural networks, on the other hand, are capable of handling noise (Gallant, 1994:10).
2.3.2.5 Problems with artificial neural networks
2.3.2.5.1 Failure to generalise
The purpose of training an ANN is not so much to reproduce the exact values of the training data, but rather to develop a network that is capable of producing a general answer that would be expected in the training data range (Zhang et al., 2003; Bishop, 2008:332).
To explain the difference between good and bad generalization of ANNs, Bishop (2008:9-12) uses an analogy between the complexity of an ANN and the order of a polynomial. Consider a data set generated by adding random values to the output of a known function, say y = sin(x). Two polynomials are used to fit the data: one of high order and one of low order. The fit of the first-order polynomial to the data can be seen in Figure 2.5. This corresponds to a neural network with only one hidden node, which produces a bad fit to the data. A possible solution is to increase the number of free parameters; in the case of a neural network, the number of hidden nodes is increased. As can be seen in Figure 2.6, the higher-order polynomial produces a good fit for all the data points. It is, however, a bad representation of the sine wave, as there are plenty of oscillations (Bishop, 2008:9-12).

Figure 2.5: Polynomial of first order produces a bad fit for the data. Reproduced from Bishop (2008:11)

Figure 2.6: Polynomial of high order producing something that looks like a good fit for all the data points, but the predictive power of the polynomial is sacrificed. Reproduced from Bishop (2008:12).
To address this problem of finding a suitable complexity for the ANN, two concepts, the variance and the bias (not to be confused with bias weights), are used. The bias is a measure of the amount by which the overall average of the ANN output differs from that of the given data. Figure 2.5 has a high bias value while Figure 2.6 has a low bias value. The variance is a measure of how well the ANN output will fit another data set that does not include the ANN training data. A low variance value can be expected in Figure 2.5, while a high variance value can be expected in Figure 2.6. The variance and bias go hand in hand: an increase in the variance leads to a decrease in the bias and vice versa. The goal is to decrease the value of both the variance and the bias (Bishop, 2008:334-335).
2.3.2.5.2 Regularization
Over-fitting is the result of weights with high values. In order to suppress the weights from obtaining large values, regularization is applied. In regularization, the error of the output is conditioned in order to produce a smoother output. This is done by adding a penalty term, Ω, to the error, E. The conditioned error, Ẽ, can be calculated with equation 2.7.

\tilde{E} = E + \upsilon\,\Omega    (2.7)

Bishop (2008:338) provides two ways in which the penalty term can be calculated. One of the two methods is the Tikhonov regularizer, which will not be discussed in this study. The other is the weight decay method, in which the penalty term is equal to the sum of squares of all the weights and biases, as given in equation 2.8.

\Omega = \frac{1}{2} \sum_i w_i^2    (2.8)

The weight decay regularizer suppresses the weights from obtaining large values, which would cause over-fitting (Bishop, 2008:338-339).
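As a small illustration of equations 2.7 and 2.8, the Python sketch below conditions an error value with the weight decay penalty. The error, weight values and the importance parameter υ are made-up numbers used only for demonstration.

```python
def weight_decay_penalty(weights):
    """Equation 2.8: half the sum of the squared weights (and biases)."""
    return 0.5 * sum(w * w for w in weights)

def conditioned_error(error, weights, upsilon):
    """Equation 2.7: add the scaled penalty term to the unconditioned error."""
    return error + upsilon * weight_decay_penalty(weights)

print(conditioned_error(0.42, [0.8, -1.3, 0.1], upsilon=0.01))
```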
2.3.2.5.3 Structural stabilization
Trial-and-error could be used to find a more suitable structure that has little complexity, but
produces good results. One way of doing this, is by varying the number of hidden nodes or by
adding bias weights to the network (Gallant, 1994:221; Bishop, 2008:332).
2.3.2.5.4 More data points
The number of training data points and the possible curves that can fit through these training
data points are inversely proportional to each other (Zhang et al., 2003). If one desires to train
a complex network for reasons such as more accurate results, one simply has to add more
training data to the network.
A neural network that has the ability to generalize should give output as displayed by the lower
order polynomial in Figure 2.7.
Figure 2.7: A lower order polynomial that has the capability to generalize well
2.3.2.6 The backpropagation algorithm
As mentioned above, the backpropagation algorithm is one of the popular neural network
training algorithms that is suitable for use in process control environments. It was decided that
this training algorithm would be used for the neural network in this study. It should be noted that the algorithm presented here is developed for a neural network with a single hidden layer. The sources used in the development of this artificial neural network include Jain (1996) and Basheer and Hajmeer (2000). The steps are as follows:
1. Choose the number of input, hidden and output nodes. This also determines how many weights there will be in the neural network architecture.
2. Assign random values to the weights.
3. Propagate the signal forward by multiplying the inputs to the neural network with the weights that connect the inputs to the hidden neurons, then sum the results of the weights that go to each of the hidden nodes to produce the signal that is sent to the specified node, as can be seen in equation 2.9.

\xi_j = \sum_{i=0}^{n} x_i w_{ij}    (2.9)

Where:
Οj = Signal sent to node j.
n = The number of weights attached to node j.
xi = Input value from weight i.
wij = Weight attached to the input node i and the hidden node j.

4. Substitute the input signal into the activation function. The sigmoid activation function was chosen and can be seen in equation 2.10.

\sigma_j^H = \frac{1}{1 + e^{-\xi_j}}    (2.10)

Where σjH is the output value of the transfer function of node j in the hidden layer.
5. The output of node $j$, $\sigma_j^H$, is then fed forward to the next layer of nodes, the output
nodes, where equation 2.11 is applied.

$\xi_j = \sum_{i=0}^{n} \sigma_i^H w_{ij}$    (2.11)
6. The signal to the output nodes, $\xi_j$, is again substituted into the sigmoid function in
equation 2.12 to produce the output of the output layer nodes.

$\sigma_j^O = \dfrac{1}{1 + e^{-\xi_j}}$    (2.12)

Where $\sigma_j^O$ is the output of the output layer nodes. Note that $\sigma_j^O$ is equal to the $x_j^O$ that will be
mentioned soon.
7. The error of the output neurons could then be calculated by comparing the output of
the output neurons with the desired output of the training data with the use of equation
2.13 (Gupta & Lam, 1998).

$\delta_j^O = (x_j^O - y_j)\,x_j^O(1 - x_j^O)$    (2.13)

Where:
$\delta_j^O$ = error of the output node $j$
$x_j^O$ = output of the output node $j$ (again note that $x_j^O$ is equal to $\sigma_j^O$)
$y_j$ = desired output of the output node $j$
8. The values with which the weights between the output layer nodes and the hidden
layer nodes are changed could now be calculated with equation 2.14 (Gupta & Lam,
1998).

$\Delta w_{ij}^O = \eta\,\delta_j^O x_j^O - M\,w_{ij}^O(t-1) - \eta R\left(\dfrac{w_{ij}^O(t-1)^2}{\left(\left(1 + w_{ij}^O(t-1)\right)^2\right)^2}\right)$    (2.14)

Where:
$\Delta w_{ij}^O$ = the value with which weight $w_{ij}^O$ has to be updated
$\eta$ = training speed defined by the user
$\delta_j^O$ = error of the output node $j$
$x_j^O$ = output of the output node $j$
$M$ = momentum factor defined by the user
$R$ = regularization factor defined by the user
$w_{ij}^O(t-1)$ = the value of the previous weight $w_{ij}^O$
9. The new weight values can then be calculated with equation 2.15.

$w_{ij}^O = w_{ij}^O(t-1) - \Delta w_{ij}^O$    (2.15)

Where $w_{ij}^O$ is the new value of the output weight that extends from node $i$ in the hidden
layer to node $j$ in the output layer.
10. The next step is to calculate the error of the hidden nodes with the help of equation
2.16 (Gupta & Lam, 1998).

$\delta_j^H = x_j^H\,(1 - x_j^H)\,w_{ij}^O(t-1)\,\delta_j^O$    (2.16)

Where:
$\delta_j^H$ = error of the hidden node $j$
$x_j^H$ = output of the hidden node $j$
11. Now that the error of the hidden layer nodes is known, the increment with which
the weights that extend from the input layer to the hidden layer have to change could
now be calculated with equation 2.17 (Gupta & Lam, 1998).

$\Delta w_{ij}^H = \eta\,\delta_j^H x_j^H - M\,w_{ij}^H(t-1) - \eta R\left(\dfrac{w_{ij}^H(t-1)^2}{\left(\left(1 + w_{ij}^H(t-1)\right)^2\right)^2}\right)$    (2.17)

Where:
$\Delta w_{ij}^H$ = the value with which weight $w_{ij}^H$ has to be updated
$\delta_j^H$ = error of the hidden node $j$
$x_j^H$ = output of the hidden node $j$
$w_{ij}^H(t-1)$ = the value of the previous weight $w_{ij}^H$
12. The new weights can then be calculated with the help of equation 2.18.

$w_{ij}^H = w_{ij}^H(t-1) - \Delta w_{ij}^H$    (2.18)
The above steps could be repeated with the data from a new sample. An epoch is completed
if the artificial neural network has gone through the entire set of training data. A new epoch is
started by again going through the training data set.
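To make the twelve steps above concrete, the sketch below shows one possible Python implementation of a single training pass for a network with one hidden layer and a single output node. It is an illustrative sketch only: the gradient terms are written in the standard backpropagation form (multiplying each error by the activation feeding the weight), while the momentum factor M and regularization factor R enter in the form of equations 2.14 and 2.17. The variable names, initialisation and example data are assumptions and do not reproduce the implementation attached in Appendix E.

```python
import numpy as np

def sigmoid(xi):
    # Equations 2.10 and 2.12: sigma = 1 / (1 + exp(-xi))
    return 1.0 / (1.0 + np.exp(-xi))

def train_one_sample(x, y, w_h, w_o, eta, M, R):
    """One forward and backward pass for a single training sample (steps 3 to 12).

    x   : input vector, length n_in
    y   : desired scalar output
    w_h : input-to-hidden weights, shape (n_in, n_hidden)
    w_o : hidden-to-output weights, shape (n_hidden,)
    """
    # Steps 3-4: signal to each hidden node (eq. 2.9) and hidden outputs (eq. 2.10)
    x_h = sigmoid(x @ w_h)

    # Steps 5-6: signal to the single output node (eq. 2.11) and its output (eq. 2.12)
    x_o = sigmoid(x_h @ w_o)

    # Step 7: error of the output node (eq. 2.13)
    delta_o = (x_o - y) * x_o * (1.0 - x_o)

    # Step 8: increments for the hidden-to-output weights; the momentum (M) and
    # regularization (R) terms follow the form of eq. 2.14
    dw_o = eta * delta_o * x_h - M * w_o - eta * R * (w_o**2 / (1.0 + w_o)**4)

    # Step 10: errors of the hidden nodes (eq. 2.16, single output node)
    delta_h = x_h * (1.0 - x_h) * w_o * delta_o

    # Step 11: increments for the input-to-hidden weights (eq. 2.17 form)
    dw_h = eta * np.outer(x, delta_h) - M * w_h - eta * R * (w_h**2 / (1.0 + w_h)**4)

    # Steps 9 and 12: update the weights (eqs. 2.15 and 2.18)
    return w_h - dw_h, w_o - dw_o

# Steps 1-2 and the epoch loop, with made-up data: 3 inputs, 5 hidden nodes, one output
rng = np.random.default_rng(0)
w_h = rng.uniform(-0.5, 0.5, size=(3, 5))   # random initial input-to-hidden weights
w_o = rng.uniform(-0.5, 0.5, size=5)        # random initial hidden-to-output weights
X = rng.uniform(0.0, 1.0, size=(10, 3))     # placeholder (scaled) training inputs
Y = rng.uniform(0.0, 1.0, size=10)          # placeholder (scaled) targets

for epoch in range(100):                    # one epoch = one pass through the training set
    for x, y in zip(X, Y):
        w_h, w_o = train_one_sample(x, y, w_h, w_o, eta=0.2, M=1e-6, R=1e-4)
```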
Chapter 3 - Experimental procedure
3.1 Overview
Various slurries were prepared to be fed to the hydrocyclone. Different operating conditions
were imposed on the hydrocyclone. All necessary operating conditions, samples and other
data were recorded for each run. A PSD analysis was carried out on the
samples. The gathered information could then be used to determine the d50c and the
sharpness of separation.
3.2 Raw materials
The solid particles that had to be separated were micron-sized silica quartz particles, MQ15,
supplied by Micronized SA Limited. According to Tew (2012) the particles contain 98.50%
silica, with small amounts of Al2O3, K2O, Fe2O3, CaO and Na2O. The particles have a density
of 2650 kg/mÂł and d50c and m values of 20 microns and 1.9 respectively.
The carrier fluid used in this case was municipal water from the Tlokwe municipality.
3.3 Equipment
 3X 5 litre buckets
 2X 20 litre buckets
 1X water gun
 1X Marcy scale
 1X Doppler flow meter
 50X poly tops
 1X large syringe
 1X spoon
3.4 Experimental setup
A diagram of the experimental setup is shown in Figure 3.1. The geometry of
the hydrocyclone that was used in this study is displayed in Table 3.1. The alphabetical
labels in Figure 3.1 are explained below:
 A: Slurry storage tank
 B: Circulation pump
 C: Main feed bypass valve
 D: Feed fine tune bypass valve
 E: Feed shutdown valve
 F: Pressure gauge
 G: Doppler flow meter
 H: Hydrocyclone
 I: Hydrocyclone overflow
 J: Hydrocyclone underflow
 K: Sample taken from the hydrocyclone overflow
 L: Sample taken from the hydrocyclone underflow
 M: Mixer
Figure 3.1: Diagram of the hydrocyclone setup
The mixer, mentioned above, consists of square tubing that transports the fluid from the
fine tuning bypass valve to the bottom of the storage tank. Holes were made at the end of
the square tubing in order for the slurry to be sprayed towards the sides of the storage
tank so as to promote better mixing.
Two sampling containers are located above the storage tank and below the overflow and
underflow outlets. As soon as the container of the underflow is pushed in under the
underflow outlet, a mechanism pushes the overflow pipe into the overflow container, meaning that
the overflow and the underflow are sampled simultaneously. The experimental
hydrocyclone setup can be seen in Figure 3.2. The two containers that store the underflow
and the overflow are also indicated on this figure.
Figure 3.2: The experimental hydrocyclone setup
Where:
 A: Hydrocyclone
 B: Hydrocyclone overflow
 C: Hydrocyclone underflow
 D: Sampling container for the underflow
 E: Sampling container for the overflow
 F: Slurry storage tank
Table 3.1: Hydrocyclone geometry
Part | Size
Dc   | 10 cm
Di   | 3.03 cm
Do   | 3.4 cm
h    | 53 cm
3.5 Experimental procedure
3.5.1 Preparation
3.5.1.1 Doppler flow meter calibration
The Doppler flow meter was installed at a suitable place where minimum noise would occur due to
turbulence in the piping. The storage tank was initially loaded with water only. After the pump
was turned on, one person read the value from the Doppler flow meter display, while the other
person filled the underflow and overflow containers with the water coming from the
hydrocyclone. The combined underflow and overflow of the hydrocyclone is equal to the feed to the
hydrocyclone. The person filling the underflow and overflow containers also had to keep track of
the time in which the containers were filled. From the volume of the water collected and the time
in which the water was collected, the real feed flowrate to the hydrocyclone could be calculated.
The Doppler flow meter was calibrated accordingly. This procedure was repeated until the error
between the Doppler flow meter reading and the flowrate measured with the container-and-stopwatch
method was small enough.
3.5.1.2 Marcy scale calibration
The Marcy scale is a handy tool that can be used to determine the density of a slurry mixture.
A picture of the Marcy scale can be seen in Figure 3.3. Before its use, it has to be
calibrated with the above-mentioned municipal water, with the density reading set to 1000 kg/mÂł
during calibration.
3.5.1.3 Slurry preparation
One of the variables that also has to be monitored is the volumetric percentage of solids in the
feed. The slurry tank (storage tank) was first filled with 200 litres of municipal water. The
mass of silica sand that has to be added to the tank to obtain a certain volumetric solids
percentage is calculated with equation 3.1.

$m_s = \dfrac{\varphi\,V_w}{\dfrac{1}{\rho_s} - \varphi\,\dfrac{1}{\rho_s}}$    (3.1)
Where:
$m_s$ = mass of silica sand that has to be added to the storage tank
$\varphi$ = desired volume percentage of solids in the slurry (expressed as a fraction)
$\rho_s$ = density of silica sand = 2650 kg/mÂł
$V_w$ = volume of water in the tank = 200 litres = 0.2 mÂł

Figure 3.3: The Marcy scale
3.5.2 Sampling
Step-by-step instructions for obtaining samples from the rig5
are given in this section. These
steps could only be followed after the preparation mentioned in chapter 3.5.1 has been
completed:
1. Make sure that the valve between the pump opening and the storage tank exit is fully
opened;
2. Make sure that there are no objects in the storage tank that could cause pump failure.
3. Close the feed shutdown valve;
4. Close both of the valves from the overflow and underflow containers;
5. Fully open both the feed bypass valves;
6. Turn on the pump;
7. The slurry from the bypass valves will lead to plenty of turbulence in the storage tank.
It is however recommended that the storage tank also be mixed manually so as to
ensure that most of the silica particles are suspended in the slurry;
8. Fully open the feed shutdown valve;
9. Slowly close the feed bypass valves while keeping an eye on the pressure gauge. Stop
closing the bypass valves as soon as the required pressure is reached;
10. One person has to take note of the flow rate, while the other person has to push in the
underflow sampling container. This has to be done at the same time. The person
recording the flow rate from the Doppler flow meter also has to start and stop a
stopwatch when the containers are first inserted and when they are pulled out again;
11. As soon as the underflow and overflow containers have been pulled out, the pump
may be stopped;
12. Separate buckets have to be inserted under the hoses that are connected to the outlet
valves of the underflow container and the overflow container;
13. Slowly open the outlet valves of the overflow and underflow containers and collect the
underflow and overflow samples in the separate buckets. The content of the underflow
and overflow containers have to be stirred well while the outlet valves are opened so
as to avoid silica sand from settling and remaining in the underflow or overflow
containers;
5 The terms “rig” and “hydrocyclone experimental setup” are used interchangeably in this study.
14. The buckets have to be weighed separately on a scale. The scale should previously
have been tared with the mass of the buckets that are used; similar buckets thus have
to be used. The mass of the content inside the buckets is recorded;
15. Smaller samples of the overflow and the underflow are taken by mixing the slurry in the
buckets and filling a poly top with the content. The poly tops should be labelled
clearly.
16. The remaining content in the buckets are again stirred before the Marcy scale bucket
is filled with the slurry. The Marcy scale bucket is put on the Marcy scale to determine
the density of the slurry. The densities of both the slurries have to be determined this
way.
The buckets containing the remaining slurry are emptied into the slurry storage tank of the rig.
The above steps are then repeated for the next sample.
3.5.3 Analysing
The particle size distributions of the underflow samples were all determined with the
Malvern Mastersizer 2000. The particles are circulated through the Mastersizer where they
eventually pass through a laser beam. The particles passing through the laser beam scatter
some of the radiation from the beam, and the intensity of the backscattered laser light
is measured with special backscatter detectors. The angle at which the
light is scattered is inversely proportional to the size of the silica particles (Malvern
instruments, 2005). Figure 3.4 is a picture of the Malvern Mastersizer 2000.
Figure 3.4: Malvern Mastersizer 2000
3.5.4 Experimental error
For the experimental error determination, 4 random operating conditions were chosen from all
the experiments that were conducted. Six runs were completed on each of these operating
conditions. A total of 26 experiments were thus completed in order to determine the
experimental error. The conditions at which each of the sets were done, as well as the results
could be observed in Appendix D. All the calculations that were done in the determination of
the experimental error could be found in the electronically attached spreadsheet named
“Experimental Error”.
It is assumed that the data follows a normal distribution. Due to the small amount of available
data for each of the sets, the experimental error had to be determined using the student’s t
test (Devore & Farnum, 2005:313-318).
The experimental error could be determined with equation 3.2.

$t_{n-1}\!\left(\dfrac{\alpha}{2}\right) \times \dfrac{S_t}{\sqrt{n_u}}$    (3.2)
Where:
$t_{n-1}(\alpha/2)$ = critical t value that could be obtained from the back cover of Devore and Farnum (2005)
$S_t$ = standard deviation of the data
$n_u$ = number of data points available in the set
A 95% confidence interval was used to obtain the experimental error. The processed
experimental data that were used for the determination of the experimental error could be
observed in Table 3.2.
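The half-width of the confidence interval in equation 3.2 can be reproduced with any statistics package. The sketch below uses SciPy's Student's t distribution, with the set 1 d50c values from Table 3.3 (n = 3, S = 2.58 ÎŒm) as an example.

```python
from math import sqrt
from scipy import stats

def t_error(S_t, n_u, alpha=0.05):
    """Equation 3.2: half-width of the interval, t_{n-1}(alpha/2) * S_t / sqrt(n_u)."""
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n_u - 1)
    return t_crit * S_t / sqrt(n_u)

# Set 1 d50c values from Table 3.3: n = 3, S = 2.58 microns, 95% confidence
print(t_error(S_t=2.58, n_u=3))   # t_2(0.025) = 4.303, giving an error of about 6.4 microns
```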
Table 3.2: Processed data used for the experimental error determination
Data point   Set 1           Set 2           Set 3           Set 4
             d50c    m       d50c    m       d50c    m       d50c    m
1            21.71   1.35    28.41   2.03    17.72   1.13    29.58   1.44
2            16.59   1.37    24.51   1.92    21.91   1.21    30.83   1.66
3            18.63   1.49    29.57   1.89    22.02   1.21    30.99   1.69
4            -       -       -       -       22.26   1.30    31.44   1.71
5            -       -       -       -       23.41   1.34    31.66   2.00
Validation   18.44   1.29    26.03   1.97    21.82   1.16    29.74   1.60
Two of the data points in each of sets 1 and 2 were discarded due to their large deviation
from the rest of the data in the set. One data point in each set was used as a validation
data point. The values that were substituted into equation 3.2 to calculate the experimental
error of each set can be observed in Table 3.3. The d50c experimental error
for each data set and the sharpness of separation error for each data set are
shown in Figure 3.5 and Figure 3.6 respectively.
Table 3.3: Values for substitution into the student's t equation
Data     Set 1           Set 2           Set 3           Set 4
         d50c    m       d50c    m       d50c    m       d50c    m
n        3       3       3       3       5       5       5       5
S        2.58    0.076   2.65    0.075   2.17    0.073   0.811   0.20
X6       18.98   1.40    27.50   1.95    21.46   1.24    30.90   1.70
T(95%)   4.303   4.303   4.303   4.303   2.776   2.776   2.776   2.776

6 X is the average of the data in the specific set
Figure 3.5: Experimental error of the d50c with a 95% confidence interval
Figure 3.6: Experimental error of the sharpness of separation with a 95% confidence interval
Large errors are observed for the d50c values in each of the sets in Figure 3.5. This could be
ascribed to the varying feed PSDs that will be dealt with later in this paper. The experimental
errors of the sharpness of separation as seen on Figure 3.6 are however acceptable.
Chapter 4 - Model development
4.1 Overview
121 samples were processed to be put through the artificial neural network and the Plitt
model. Unfortunately, the raw data needed to be processed before it was fit for use in the
artificial neural network and the Plitt model. For more information on how the data was
processed, please refer to Appendix A.
4.2 The Plitt model
As mentioned before, the modified Plitt model with the fudging factors will be used in an
attempt to predict the d50c and the sharpness of separation of the hydrocyclone operated under
certain conditions. Of the 121 data points, 69 samples were used to fit the fudging factors with
the help of the ExcelÂź add-in, Solver. The input parameters from the experimental data that
were not used in the tuning of the fudging factors were then substituted into the Plitt model.
The d50c and sharpness of separation results from the Plitt model were then compared to the
corresponding experimental results. The Plitt model calculations can be found in the
electronically attached spreadsheet named “Plitt model”.
4.2.1 Split flow
As mentioned before, this paper will only focus on predicting the d50c and the sharpness of
separation, m. The processed input data from Appendix C was inserted into the d50c and
sharpness of separation equations of the Plitt model.
For the split flow variable, 𝑆, of the d50c equation either the experimentally calculated 𝑆 or the
split flow calculated with one of the Plitt model equations given in equation 4.1 could be used.
4.1
The value of đč4 was determined by minimizing the error between 69 of the experimental and
calculated split flow values with the help of the ExcelÂź add-in, Solver. The resulting value of
đč4 was found to be 0.13. The remaining experimental values were then compared with
corresponding Plitt model values under the same operating conditions. The results of this
investigation are presented in Figure 4.1. Very small deviations from the experimental split
flow values are observed, meaning the split flow values from the Plitt model is suitable for
further use.
4.2.2 Cut size – d50c
The d50c value was calculated with equation 2.2. Just as with the split flow, 69 experimental
data points were used to adjust the value of F1. There is however another variable, k, that
could be adjusted in this equation. It was observed that Solver could either vary F1 or k to
obtain a minimum error. A value of 0.5 was arbitrarily chosen for k, while F1 was varied. The
resulting value for F1 is 64.9.
4.2.3 Sharpness of separation
The fudging factor of the sharpness of separation was determined the same way as the above
mentioned fudging factors.
4.3 The artificial neural network
The backpropagation algorithm was used to train the neural network. A few modifications
were made to the ANN, namely the addition of a regularization term and the addition of
a momentum term. Both these terms can be observed in equations 2.14 and 2.17. All artificial
neural networks that were constructed had various input variables and only one output
variable. The output variable was either the d50c or the sharpness of separation.
4.3.1 Artificial neural network architecture
Six different artificial neural networks were written. The number of hidden neurons in each of
these networks could be varied between 1 and 20, while the numbers of input and output neurons
cannot be changed. Table 4.1 displays a list of all the ANNs that have been programmed. All these
programs could be found under the attached folder named “Artificial neural networks”.
Figure 4.1: Experimental split flow values plotted with the predicted Plitt model split flow values
Table 4.1: Different artificial neural networks that were programmed
Neural network number | Input variables | Output variables
1 | Du | d50c
2 | Du, φ and Q | d50c
3 | Du, φ, Q, P, and S | d50c
4 | Du | m
5 | Du, φ and Q | m
6 | Du, φ, Q, P, and S | m
Separate neural networks for the d50c and sharpness of separation were constructed, as neural
networks that had both these variables as outputs lacked the ability to learn.
It was decided that the first ANN of both the d50c output and sharpness of separation output
should only have the spigot diameter as input variable as it is known that this variable has the
largest effect on the hydrocyclone performance. In this study, the spigot diameter was
changed by switching off the pump and manually inserting a new spigot with a different
diameter. In industry, this would however be impractical. In a study conducted by Eren and
Gupta (1988), the spigot size could be adjusted pneumatically while the cyclone was on-line.
This study will thus be applicable to hydrocyclones whose spigot size can be changed while
the cyclone is on-line.
The second set of neural networks contained the same inputs that are needed in the Plitt
model – the volumetric percentage solids in the feed 𝜑 and the feed volumetric flowrate 𝑄.
This neural network and the Plitt model are thus on equal grounds and could be compared
with one another.
For the third and last set of neural networks, the split flow and pressure drop over the cyclone
were added as inputs to test whether the predictive power of the neural network would improve.
Each neural network that was constructed had the ability to test 20 different architectures with
one click of a button. The networks could thus be run on multiple computers at the same time.
More neural networks could thus be tested in a shorter amount of time in comparison with
MATLAB¼’s Neural Network Toolboxℱ.
Each of the neural networks was trained with roughly 75% of the experimental data. The
remaining 25% of the data was used as validation data. The validation data was used for all
the results that are displayed in chapter 5. None of the training data were thus used for
validation purposes.
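A minimal sketch of such a split is shown below; the shuffling, the seed and the way samples are stored are assumptions made for illustration.

```python
import numpy as np

def split_data(samples, train_fraction=0.75, seed=0):
    """Shuffle the samples once and split them into training and validation sets."""
    idx = np.random.default_rng(seed).permutation(len(samples))
    n_train = int(train_fraction * len(samples))
    return [samples[i] for i in idx[:n_train]], [samples[i] for i in idx[n_train:]]

# e.g. split the processed samples into disjoint training and validation subsets
train, validation = split_data(list(range(121)))
```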
To display the learning capability of the developed neural networks, a neural network that had
the spigot diameter as input parameter and the sharpness of separation as output parameter
was trained with 80 epochs and a training speed of 0.02. The results are displayed on Figure
4.2. The reader is referred to Appendix E for the source code of one of the artificial neural
networks.
Figure 4.2: Learning capability of one of the 6 developed artificial neural networks
Chapter 5 - Results and discussion
After processing the data, it was found that the PSD of the feed varied considerably. An
alternative for calculating the feed PSD is dealt with in this section.
As mentioned before, the neural network was written in order to make it convenient for the
user to test multiple neural network architectures at once. This functionality was used to filter
out the more suitable neural network architectures for predicting the cut size and the
sharpness of separation. These filtered out neural networks were then further optimised. The
results as well as a discussion of these results are given in this section.
5.1 Deviations in the feed PSD
From the start of the sampling and analyses, it was assumed that the feed PSD remained
constant for all the slurry batches, as the same silica sand product from the same
manufacturer was used each time. It thus only seemed necessary to sample and determine
the PSD of the feed once and sample the underflow of each run, instead of sampling both the
underflow and the overflow of each run. This meant the total amount of PSD analyses could
be cut in half. The resulting partition curves that were produced had partition values that
exceeded 1 or was lower than 0. This means that the material balance did not solve. After
Figure 5.1: Particle size distribution of 25 different feed samples
taking samples of 25 different slurry mixtures7
, it was found that the PSDs differed significantly
from each other as can be seen on Figure 5.1.
This meant that the partition curve could no longer be calculated from one feed PSD sample.
A solution to this problem was to calculate, for each of the underflow samples that were analysed,
25 different partition curves, one from each of the feed PSDs shown in Figure 5.1.
One out of the 25 partition curves had to be chosen. The chosen partition curve had to fulfil
two criteria. Firstly, there may not be a value on the partition curve that exceeds 1. This would
mean that more of a certain size of particles exits the cyclone than have entered the cyclone.
According to the literature study, the correction made to the partition curve in order to obtain
the corrected partition curve is equal to the recovery of water to the underflow. The partition
curve thus also has to intersect the y-axis at a value that is close to the value of Rf. This is the
second constraint the partition curve has to meet.
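These two criteria can be applied programmatically when choosing among the 25 candidate partition curves. The sketch below is an assumed illustration: each candidate holds its partition values and its y-axis intercept, and the curve that never exceeds 1 and whose intercept lies closest to the measured Rf is selected; the function name and data layout are hypothetical.

```python
def choose_partition_curve(candidates, measured_rf):
    """Pick the candidate partition curve that (1) never exceeds 1 and
    (2) has a y-axis intercept closest to the measured water recovery Rf."""
    feasible = [c for c in candidates if max(c["values"]) <= 1.0]
    return min(feasible, key=lambda c: abs(c["intercept"] - measured_rf))

# Each candidate: partition values over the size classes and its y-axis intercept (illustrative)
curves = [{"values": [0.42, 0.55, 0.81, 0.97], "intercept": 0.44},
          {"values": [0.47, 0.61, 0.88, 1.04], "intercept": 0.40}]
print(choose_partition_curve(curves, measured_rf=0.45))  # the first curve satisfies both criteria
```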
Unfortunately, the Mastersizer was incapable of accurately measuring particle sizes
smaller than 8.4 ÎŒm. The curve in Figure 5.2 shows the large fluctuations that occur at
particle sizes smaller than 8.4 ÎŒm. This phenomenon occurred in all the partition curves.
According to the results from the Mastersizer, the particles under 8.4 ÎŒm amounted to 0.1%
of the total particles. These particles will thus be neglected. The partition curve
value at 8.4 ÎŒm will thus be taken as the recovery of liquid to the underflow.
7 The word batch and slurry mixture are used interchangeably
Figure 5.2: Example partition curve before justifications
A similar phenomenon was observed for particles larger than 95 𝜇𝑚. These particles
amounted to less than 0.05% of the total particles. It would thus also be a safe assumption to
ignore these particle sizes in further calculations.
After the partition curve that suited the description above was chosen, small changes were
made to the value of Rf so that it would be equal to the experimental Rf value. These small
changes could be observed on Figure 5.3.
5.2 Plitt model
5.2.1 Cut size – d50c
The d50c results of the modified Plitt model are displayed on Figure 5.4. The blue line connects
the experimental data points, while the orange line connects points that were predicted by the
Plitt model. The results are displayed in another form in Figure 5.5, where the predicted vs.
actual values are plotted over the y = x curve. To determine how well the data fits the y = x
curve, a value called the coefficient of determination was calculated. This resulted in an RÂČ value
of 0.664.
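The coefficient of determination against the y = x line can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is a generic implementation of that calculation; the example arrays are placeholders, not the study's validation data.

```python
import numpy as np

def r_squared_about_identity(experimental, predicted):
    """R^2 of predicted vs. experimental values measured against the y = x line."""
    experimental = np.asarray(experimental, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((experimental - predicted) ** 2)
    ss_tot = np.sum((experimental - np.mean(experimental)) ** 2)
    return 1.0 - ss_res / ss_tot

# Placeholder values for illustration only
print(r_squared_about_identity([18.2, 21.5, 24.9], [17.6, 22.4, 24.1]))
```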
Figure 5.3: Experimental vs. adjusted values of Rf
Figure 5.5: Plitt model predicted cut size vs. experimental cut size plotted over the y=x
curve
Figure 5.4: Experimental cut point plotted with the cut point predicted by the Plitt model
From Figure 5.4, it is clear that the Plitt model was capable of predicting the cut point to a
certain extent. At the larger d50c values, the Plitt model tends to overpredict the d50c, while the
opposite is true for the smaller d50c values. The combined absolute error for the 44 validation data
points is 71.5 ÎŒm. The average error per predicted d50c value is thus 1.6 ÎŒm, which is
acceptable.
5.2.2 Sharpness of separation
The sharpness of separation results from the Plitt model could be observed on Figure 5.6.
Again, the experimental values for the sharpness of separation are connected by the blue line,
while the predicted values are connected by the orange line.
Figure 5.6: Experimental sharpness of separation plotted with the sharpness of separation
predicted by the Plitt model
Figure 5.7: Plitt model predicted m vs. experimental m plotted over the y=x curve
From Figure 5.6 and Figure 5.7, it is clear that the Plitt model is incapable of predicting the
sharpness of separation. Deviations with an absolute value of 2 could easily be observed on
these figures.
5.3 Artificial neural networks
Various experiments were conducted in order to determine which neural network architecture
and parameters will be more suited for predicting the d50c and the sharpness of separation.
The same architectures and parameters were tested on both the d50c and the sharpness of
separation.
In the first series of tests, the number of epochs and the training speed were held constant
while the number of neurons in the hidden layer was varied between 3 and 20. Below 3 hidden
neurons, the neural network lacked the complexity to adequately predict the d50c and the
sharpness of separation. All six neural networks mentioned in Table 4.1 were tested with these
architectures and parameters.
The architecture and parameters that were revealed to be the best out of those tested
underwent further testing by increasing the number of epochs by orders of magnitude and
decreasing the training speed so as to increase the chances of finding the global minimum.
The momentum and regularization terms were tested with the same architecture and
parameters as those used by the neural network mentioned in the previous paragraph.
5.3.1 Cut size – d50c
5.3.1.1 Neural network screening
The results of training the neural network with only the spigot diameter, Du, as input are given
in Figure 5.8. Overtraining8 occurred in all of the tests
except for the neural network that had 3 hidden neurons. It should be noted that the neural
networks stopped training as soon as overtraining started.
The results in Figure 5.8 show that a simple neural network with no more than 6 hidden
neurons had the best prediction capabilities. Adding more hidden neurons tends to
overcomplicate the network, leading to poorer results.
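Stopping a run as soon as overtraining begins, as noted above, can be implemented as a simple validation-error check after each epoch. The sketch below is an assumed illustration of such an early-stopping loop; train_epoch and validation_error are hypothetical placeholders standing in for the training and error routines described in chapter 2.

```python
def train_with_early_stopping(network, max_epochs, train_epoch, validation_error):
    """Train epoch by epoch and stop as soon as the validation error starts to rise.

    train_epoch(network)      -> runs one epoch of backpropagation (placeholder callable)
    validation_error(network) -> combined absolute error on the validation set (placeholder callable)
    """
    best_error = float("inf")
    for epoch in range(max_epochs):
        train_epoch(network)
        error = validation_error(network)
        if error > best_error:          # overtraining: the network starts to diverge
            break
        best_error = error
    return network, best_error
```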
Figure 5.8: Results of neural network 1 trained with a training speed of 0.2 and a
maximum amount of epochs of 8000
8 Overtraining takes place when the artificial neural network stops converging to an answer and starts
to diverge
By adding more input parameters to the neural network, even better results are achieved. The
results could be seen on Figure 5.9. All networks in this test were trained until overtraining
commenced. A maximum error just above 1.35 micron per validation data point was achieved
in this neural network.
Figure 5.9: Results of neural network 2 trained with a training speed of 0.2 and a
maximum amount of epochs of 8000
Two more input parameters, the split flow and the pressure drop over the cyclone were
inserted. The addition of these two parameters produced better results than the previous tests.
No trend could be observed in the absolute error as the number of neurons was increased.
The results are given in Figure 5.10.
Figure 5.10: Results of neural network 3 trained with a training speed of 0.2 and a
maximum amount of epochs of 8000
5.3.1.2 Enhancing the neural network
From the results in Figure 5.8, Figure 5.9 and Figure 5.10, it is clear that the predictive power
of the neural network increases with an increase in the number of inputs. It was thus decided
to further develop neural network 3 for predicting the d50c of the hydrocyclone.
The neural network was given 12 hidden neurons and first trained with a maximum of 60000
epochs and a training speed of 0.02. It was expected that the absolute error observed in Figure
5.10 would decrease; instead, the error increased by almost 1.6 microns to 47.19 ÎŒm. An
explanation for this phenomenon could be that this neural network just happened to step over
the local minimum that was found by the neural network in Figure 5.10. Another representation
of the results is given in Figure 5.12.
Although there are some neural network output values that differ by 2 ÎŒm from the
experimental d50c values, Figure 5.11 shows that the neural network has adequate prediction
power.
Figure 5.11: Calculated d50c plotted with the experimental d50c values of neural network 3
trained with a maximum of 60000 epochs and a training speed of 0.02
Figure 5.12: Predicted d50c vs. experimental d50c plotted over the y=x curve for the neural
network trained with a maximum of 60000 epochs and a training speed of 0.02
For the next enhancement, the momentum term was used. The momentum constant was
given a value of 1 × 10⁻⁶. The other parameters and architecture of the neural network
remained unchanged. A significant reduction of more than 3 ÎŒm was observed in the combined
error when compared to the previous test. The value of the combined error in this case is
44.15 ÎŒm. It can also be seen in Figure 5.14 that the RÂČ value decreased by 0.05 to 0.795.
When looking at the calculated and experimental graph in Figure 5.13, certain improvements
could be spotted. As an example, the last validation data point lies on the predicted d50c value.
This was not the case in Figure 5.11.
Figure 5.13: Calculated vs. experimental values of neural network 3 trained with a maximum of
60000 epochs and a training speed of 0.02 with the addition of the momentum term
Figure 5.14: Predicted d50c vs. experimental d50c plotted over the y=x curve for a neural
network trained with a maximum of 60000 epochs and a training speed of 0.02
For the final neural network enhancement, the momentum term was deactivated while the
regularization term was activated. The regularization constant was set to a value of
1 × 10⁻⁎. The regularization term only produced slight improvements when compared to the
initial neural network enhancement. The resulting combined absolute error was 46.45 ÎŒm.
The validation results are displayed in Figure 5.15. The predicted vs. experimental d50c
can be observed in Figure 5.16. Only slight differences are observed in the graphs of
Figure 5.13 and Figure 5.15.
Figure 5.15: Calculated vs. experimental values of neural network 3 trained with a maximum
of 60000 epochs and a training speed of 0.02 with the addition of the regularization term
Figure 5.16: Predicted d50c vs. experimental d50c plotted on the y=x curve for the neural
network trained with a maximum of 60000 epochs and a training speed of 0.02 with the
addition of the regularization term
5.3.2 Sharpness of separation
5.3.2.1 Screening of neural networks
Screening of the neural networks with the sharpness of separation as output was done the
same way as the screening of the d50c neural networks. The results of neural networks 4, 5
and 6 are given in Figure 5.17, Figure 5.18 and Figure 5.19 respectively. Most of the networks
were trained until overtraining commenced.
Figure 5.17: Results of neural network 4 trained with a training speed of 0.5 and a
maximum amount of epochs of 20000
Figure 5.18: Results of neural network 5 trained with a training speed of 0.5 and a
maximum amount of epochs of 25000
Figure 5.19: Results of neural network 6 trained with a training speed of 0.2 and a
maximum amount of epochs of 8000
The same phenomena that happened in neural networks 1, 2 and 3 were observed in neural
networks 4, 5 and 6. When the number of inputs to the neural network was less than or equal
to 3, the predictive capability of the neural networks reached its peak when the number of
hidden neurons was capped at 8. There again was no trend in the prediction power of the
neural network as the number of hidden neurons was increased for the neural network that
had 5 inputs. An increase in the number of inputs to the neural network also led to an
improved predicting capability. It was thus decided that neural network 6 should be further
developed.
5.3.2.2 Enhancing the neural network
It was decided that neural network 6 should be given 13 hidden nodes, as good results were
obtained with this number of hidden nodes, as can be seen in Figure 5.19. The network was
trained with a maximum of 60000 epochs and a training speed of 0.02. After the first test, the
momentum term was added with a momentum constant of 1 × 10⁔, and in the second test, the
momentum term was deactivated and the regularization term was inserted with a
regularization constant of 0.001. The results can be observed in Figure 5.20, Figure 5.22
and Figure 5.24. Predicted vs. calculated plots can be seen in Figure 5.21, Figure 5.23 and
Figure 5.25.
Figure 5.20: Experimental and predicted values of neural network 6 trained with a
maximum of 60000 epochs and a training speed of 0.02
Figure 5.21: Predicted vs. experimental m plotted over the y=x graph for neural network 6
trained with a maximum of 60000 epochs and a training speed of 0.02
Figure 5.22: Predicted and experimental values of neural network 6 trained with a
maximum of 60000 epochs and a training speed of 0.02 with the addition of the
momentum term
Figure 5.23: Predicted m vs. experimental values m plotted over the y=x line for neural
network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the
addition of the momentum term
Figure 5.24: Predicted vs. experimental values of neural network 6 trained with a
maximum of 60000 epochs and a training speed of 0.02 with the addition of the
regularization term
Figure 5.25: Predicted m vs. experimental m plotted over the y=x curve for neural network
6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition
of the regularization term
Except for the large outliers observed near validation sample number 20 and at validation
sample 13, the sharpness of separation was predicted with reasonable accuracy. When
comparing the graphs in Figure 5.20, Figure 5.22 and Figure 5.24, one observes that there are
no significant differences. The neural network that had neither a regularization nor a
momentum term had a combined validation error of 9.11 for the sharpness
of separation, meaning that the sharpness of separation was off by an average of
0.21 per validation sample. The momentum term further decreased the combined error to
9.04, while the addition of the regularization term significantly decreased the combined error
to 8.87, meaning that the average error per validation sample prediction was decreased to
0.2.
 

Comparison between the Plitt model and an artificial neural network in predicting hydrocyclone performance

neural network.

Keywords: hydrocyclone; d50c; sharpness of separation; artificial neural network; Plitt model; fine cut point; variable size spigot
  • 5. School of Chemical and Minerals Engineering Attached documents| iv Attached documents
Folder name | File name | Description
Experimental error | Experimental Error | Excel® spreadsheet containing the data and calculations that were done to determine the experimental error
Plitt model | Plitt model | Excel® spreadsheet containing the calculations performed on the experimental data with the Plitt model
Artificial neural networks | Neil'sANN.rev3-d50c - Du; Neil'sANN.rev3-d50c - Du+phi+Q; Neil'sANN.rev3-d50c - Du+phi+Q+P+S; Neil'sANN.rev3-m - Du; Neil'sANN.rev3-m - Du+phi+Q; Neil'sANN.rev3-m - Du+phi+Q+P+S | The macro-enabled Excel® spreadsheets contain the program with which the artificial neural networks were trained and validated
Meetings | Various files | This folder contains all the minutes and agendas of each meeting in Microsoft Word® format
Data processing | Data processing | Contains the Excel® spreadsheet with which the data processing was done
MSDS | MSDS – Silica flour | This is a PDF document containing the MSDS of silica flour
Gantt chart | Gantt chart | This folder contains a Gantt chart that is in both PDF format and MS Project format
  • 6. School of Chemical and Minerals Engineering Table of contents| v Table of contents Declaration............................................................................................................................. i Acknowledgements................................................................................................................ii Abstract.................................................................................................................................iii Attached documents .............................................................................................................iv Table of contents .................................................................................................................. v List of figures .......................................................................................................................vii List of tables..........................................................................................................................ix List of acronyms.................................................................................................................... x List of symbols ...................................................................................................................... x Chapter 1 - Introduction ........................................................................................................ 1 1.1 Background............................................................................................................. 1 1.2 Problem statement.................................................................................................. 1 1.3 Aim and objectives.................................................................................................. 2 1.3.1 Aim .................................................................................................................. 2 1.3.2 Objective.......................................................................................................... 2 1.3.3 Methodology .................................................................................................... 2 Chapter 2 - Literature study................................................................................................... 3 2.1 The hydrocyclone ................................................................................................... 3 2.2 Hydrocyclone control .............................................................................................. 6 2.2.1 Sensors used in hydrocyclone performance determination .............................. 6 2.3 Soft sensors............................................................................................................ 7 2.3.1 Empirical models ............................................................................................. 8 2.3.2 Artificial neural networks................................................................................ 10 Chapter 3 - Experimental procedure ................................................................................... 20 3.1 Overview............................................................................................................... 20
  • 7. School of Chemical and Minerals Engineering Table of contents| vi 3.2 Raw materials....................................................................................................... 20 3.3 Equipment ............................................................................................................ 20 3.4 Experimental setup ............................................................................................... 20 3.5 Experimental procedure........................................................................................ 23 3.5.1 Preparation.................................................................................................... 23 3.5.2 Sampling........................................................................................................ 25 3.5.3 Analysing....................................................................................................... 26 3.5.4 Experimental error ......................................................................................... 27 Chapter 4 - Model development.......................................................................................... 30 4.1 Overview............................................................................................................... 30 4.2 The Plitt model...................................................................................................... 30 4.2.1 Split flow ........................................................................................................ 30 4.2.2 Cut size – d50c ................................................................................................ 31 4.2.3 Sharpness of separation ................................................................................ 31 4.3 The artificial neural network .................................................................................. 31 4.3.1 Artificial neural network architecture .............................................................. 31 Chapter 5 - Results and discussion..................................................................................... 34 5.1 Deviations in the feed PSD ................................................................................... 34 5.2 Plitt model............................................................................................................. 36 5.2.1 Cut size – d50c ................................................................................................ 36 5.2.2 Sharpness of separation ................................................................................ 38 5.3 Artificial neural networks ....................................................................................... 39 5.3.1 Cut size – d50c ................................................................................................ 40 5.3.2 Sharpness of separation ................................................................................ 46 Chapter 6 - Conclusion and recommendations.................................................................... 53 6.1 Conclusion............................................................................................................ 53 6.2 Recommendations................................................................................................ 53 6.3 Further study......................................................................................................... 54
  • 8. School of Chemical and Minerals Engineering List of figures| vii Bibliography........................................................................................................................ 55 Appendix A Data processing ............................................................................................ I Appendix B Data processing source code ......................................................................IV Appendix C Processed data ...................................................................................... XVIII Appendix D Experimental error data............................................................................ XXI Appendix E ANN source code .................................................................................... XXII Appendix F ECSA exit level outcomes ..................................................................... XXXII Appendix G Hazard identification and risk assessment.............................................XXXV List of figures Figure 2.1: Hypothetical flow inside the hydrocyclone viewed from the top of the hydrocyclone. Adapted from Plitt (1976) ...................................................................................................... 4 Figure 2.2: Corrected and non-corrected partition curve adapted from Schneider (2001)...... 5 Figure 2.3: Diagram of the computational nodes and weights of an artificial neural network adapted from Jain (1996).................................................................................................... 10 Figure 2.4: Supervised learning with reference to Hagan et al. (2002) ................................ 13 Figure 2.5: Polynomial of first order produces a bad fit for the data. Reproduced from Bishop (2008:11) ............................................................................................................................ 14 Figure 2.6: Polynomial of high order producing something that looks like a good fit for all the data points, but the predictive power of the polynomial is sacrificed. Reproduced from Bishop (2008:12). ........................................................................................................................... 15 Figure 2.7: A lower order polynomial that has the capability to generalize well ................... 16 Figure 3.1: Diagram of the hydrocyclone setup ................................................................... 21 Figure 3.2: The experimental hydrocyclone setup............................................................... 22 Figure 3.3: The Marcy scale................................................................................................ 24 Figure 3.4: Malvern Mastersizer 2000................................................................................. 26 Figure 3.5: Experimental error of the d50c with a 95% confidence interval.......................... 29 Figure 3.6: Experimental error of the sharpness of separation with a 95% confidence interval ........................................................................................................................................... 29
  • 9. School of Chemical and Minerals Engineering List of figures| viii Figure 4.1: Experimental split flow values plotted with the predicted Plitt model split flow values ........................................................................................................................................... 31 Figure 4.2: Learning capability of one of the 6 developed artificial neural networks............. 33 Figure 5.1: Particle size distribution of 25 different feed samples ........................................ 34 Figure 5.2: Example partition curve before justifications...................................................... 35 Figure 5.3: Experimental vs. adjusted values of Rf .............................................................. 36 Figure 5.4: Experimental cut point plotted with the cut point predicted by the Plitt model .... 37 Figure 5.5: Plitt model predicted cut size vs. experimental cut size plotted over the y=x curve ........................................................................................................................................... 37 Figure 5.6: Experimental sharpness of separation plotted with the sharpness of separation predicted by the Plitt model................................................................................................. 38 Figure 5.7: Plitt model predicted m vs. experimental m plotted over the y=x curve.............. 39 Figure 5.8: Results of neural network 2 trained with a training speed of 0.2 and a maximum amount of epochs of 8000................................................................................................... 40 Figure 5.9: Results of neural network 2 trained with a training speed of 0.2 and a maximum amount of epochs of 8000................................................................................................... 41 Figure 5.10: Results of neural network 3 trained with a training speed of 0.2 and a maximum amount of epochs of 8000................................................................................................... 41 Figure 5.11: Calculated d50c plotted with the experimental d50c values of neural network 3 trained with a maximum of 60000 epochs and a training speed of 0.02 .............................. 42 Figure 5.12: Predicted d50c vs. experimental d50c plotted over the y=x curve for the neural network trained with a maximum of 60000 epochs and a training speed of 0.02 ................. 43 Figure 5.13: Calculated vs. experimental values of neural network 3 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the momentum term...... 44 Figure 5.14: Predicted d50c vs. experimental d50c plotted over the y=x curve for a neural network trained with a maximum of 60000 epochs and a training speed of 0.02 .............................. 44 Figure 5.15: Calculated vs. experimental values of neural network 3 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the regularization term .. 45 Figure 5.16: Predicted d50c vs. experimental d50c plotted on the y=x curve for the neural network trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the regularization term ....................................................................................... 46
  • 10. School of Chemical and Minerals Engineering List of tables| ix Figure 5.17: Results of neural network 4 trained with a training speed of 0.5 and a maximum amount of epochs of 20000................................................................................................. 47 Figure 5.18: Results of neural network 5 trained with a training speed of 0.5 and a maximum amount of epochs of 25000................................................................................................. 47 Figure 5.19: Results of neural network 6 trained with a training speed of 0.2 and a maximum amount of epochs of 8000................................................................................................... 48 Figure 5.20: Experimental and predicted values of neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02..................................................................... 49 Figure 5.21: Predicted vs. experimental m plotted over the y=x graph for neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 .............................. 49 Figure 5.22: Predicted and experimental values of neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the momentum term...... 50 Figure 5.23: Predicted m vs. experimental values m plotted over the y=x line for neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the momentum term............................................................................................................ 50 Figure 5.24: Predicted vs. experimental values of neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the regularization term ...... 51 Figure 5.25: Predicted m vs. experimental m plotted over the y=x curve for neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the regularization term .............................................................................................................. 51 List of tables Table 2.1: Sensors used in the on-line monitoring of hydrocyclone performance .................. 7 Table 3.1: Processed data used for the experimental error determination........................... 27 Table 3.2: Values for substitution into the student's t equation ............................................ 28 Table 4.1: Different artificial neural networks that were programmed .................................. 32
  • 11. School of Chemical and Minerals Engineering List of acronyms| x List of acronyms
Acronym | Description
ANN | Artificial neural network
HIRA | Hazard identification and risk assessment
MS | Microsoft
MSDS | Material safety data sheet
PPE | Personal protective equipment
PSD | Particle size distribution
List of symbols
Symbol | Description
Al2O3 | Aluminium oxide
K2O | Potassium oxide
Fe2O3 | Iron(III) oxide
CaO | Calcium oxide
Na2O | Sodium oxide
d | Size of a particle in ÎŒm
d50c | Hydrocyclone corrected cut point. This is the particle size that has an equal chance of either leaving through the underflow or the overflow. Its unit is ÎŒm.
D_c | Hydrocyclone diameter in cm
D_i | Inlet diameter in cm
D_o | Vortex finder diameter in cm
  • 12. School of Chemical and Minerals Engineering List of symbols| xi
D_u | Underflow/apex/spigot diameter in cm
ÎŽ_j^O | Error of the output node j
ÎŽ_j^H | Error of the hidden node j
E | Error of a neuron's output
E~ | Conditioned error of the neuron's output
η_v | Viscosity of the carrier fluid in cP
F_i | Fudging factor of the modified Plitt model, where i = 1, 2, 3, 

h | Free vortex height in cm
k | Constant that takes into account the effect of the solids density on the corrected cut size
m | Sharpness of separation. This is the slope of the partition curve that indicates how well the classification is taking place inside the hydrocyclone. The higher the value of m, the closer the hydrocyclone will be to an ideal classifier.
M | Momentum factor defined by the user
m_s | Mass of silica sand that has to be added to the storage tank in kg
n | The number of weights attached to node j
n_u | Number of data points available in the set
Ω | Penalty term
P | Pressure over the hydrocyclone in kPa
φ | Percentage solids in the feed
Q | Volumetric feed flow rate in liters per minute
R | Regularization factor defined by the user
  • 13. School of Chemical and Minerals Engineering List of symbols| xii
R_f | Recovery of the carrier liquid to the underflow
ρ_p | Density of the hydrocyclone feed slurry in g/cm3
ρ_s | Density of the solid phase in g/cm3
S | Split flow – the volumetric flow of the underflow divided by the volumetric flow of the overflow
S_t | Standard deviation in the data
σ_j^H | Output value of the transfer function of node j in the hidden layer
σ_j^O | Output of the neuron in the output layer j
t_{n-1}(α/2) | Critical t value that could be obtained from the back cover of Devore and Farnum (2005)
T(95%) | Critical t value for a 95% confidence
υ | Parameter for controlling the importance of the penalty term
V_w | Volume of water in the storage tank in m3
w_i | Value of weight i, where i = 1, 2, 3, 

w_ij | Value of the weight that goes from node i to node j
w_ij^H | Value of the hidden layer weight that goes from node i to node j
Δw_ij^H | Value with which weight w_ij^H has to be updated
w_ij^H(t−1) | The value of the previous weight w_ij^H
w_ij^O | Value of the output weight that goes from node i to node j
Δw_ij^O | Value with which the output weight w_ij^O has to be updated
w_ij^O(t−1) | Value of the previous weight w_ij^O
X | Average of a data set
  • 14. School of Chemical and Minerals Engineering List of symbols| xiii
x_i | Input value from weight i
x_j^H | Output of the hidden node j
x_j^O | Output of the output node j
Ο_j | Signal sent to node j, where j = 1, 2, 3, 

y | Partition number. This is the value displayed on the partition curve's y-axis for a certain particle size, d.
y' | Corrected partition number
y_j | Desired output of output node j
z | Number of weights connected to the cell
  • 15. School of Chemical and Minerals Engineering Introduction| 1 Chapter 1 - Introduction
1.1 Background
Hydrocyclones are very useful process units for classifying particles according to size or density in the mineral processing industry. Unfortunately, it is very difficult for the operator to monitor the performance of the hydrocyclone while it is on-line (Coelho & Medronho, 2000). Empirical models had to be developed to predict how the hydrocyclone will perform under certain conditions by acting as inference sensors (Kraipech et al., 2005). One popular empirical model used to predict the performance of a hydrocyclone is the Plitt model (Flinthoff et al., 1987). L.R. Plitt developed this model to be robust by gathering a large amount of experimental data. The data was gathered by operating a wide range of hydrocyclone geometries at different operating conditions (Plitt, 1976). This model is in general not very accurate in predicting the performance, i.e. the separation efficiency, of hydrocyclones (Silva et al., 2009).
Artificial neural networks can be used to predict the performance of complex systems like the hydrocyclone (Kutz, 2003). What makes this method attractive is its ability to learn through parallel processing (McMillan, 1999). Given sufficient experimental data, the ANN can identify underlying patterns in the data, which gives it the ability to predict the outcome for a given set of input parameters (Jain, 1996).
South Africa has a very large mining industry (Anglo American Platinum, 2013; Anglo Gold Ashanti, 2013). Improving the performance of the hydrocyclone could contribute to growth in the South African economy as mineral processing becomes more efficient. The use of hydrocyclones is not limited to the mining industry. These process units are also used globally in the petrochemical, environmental and food processing industries (Sripriya et al., 2007). Improved use of the hydrocyclone could thus have a large impact in various industries worldwide.
1.2 Problem statement
Inadequate control of the hydrocyclones on a mineral processing plant may lead to inefficiencies in the downstream process units, ultimately leading to a loss in profit for the company. Monitoring the on-line performance of a hydrocyclone is not a simple task. Inference sensors1 that make use of empirical models or artificial neural networks are possible solutions
1 The terms inference sensors and soft sensors are used interchangeably in this study
  • 16. School of Chemical and Minerals Engineering Introduction| 2 to this problem. A study is needed to determine which of these methods will be more appropriate for predicting hydrocyclone performance. 1.3 Aim and objectives 1.3.1 Aim Improve hydrocyclone efficiency by producing a soft sensor that has the ability to accurately predict hydrocyclone performance. 1.3.2 Objective Compare the predictive power of an empirical model, namely the Plitt model, with the predictive power of an artificial neural network trained with the backpropagation algorithm by making use of experimental hydrocyclone data. 1.3.3 Methodology  Do a literature study on the operation of the hydrocyclone, empirical models for predicting hydrocyclone performance and artificial neural networks;  Do a HIRA study before sampling on the hydrocyclone commences;  Devise a procedure for obtaining representative samples from the hydrocyclone;  Obtain more than 100 samples from the hydrocyclone;  Gather data on the samples’ PSD by analysing the samples with the Malvern Mastersizer 2000;  Process the data for it to be in a suitable form for inserting into an empirical model and an artificial neural network;  Develop the artificial neural network from the literature study that was previously conducted;  Find optimal architectures and parameters for the artificial neural network through trial- and-error;  Substitute the processed data into the empirical model and compare its output (d50c and sharpness of separation) with that of the experimental data;  Substitute the processed data into the trained artificial neural networks and compare the output to the experimental data;  Compare the two soft sensors, the empirical model and the artificial neural network, with each other and come to a conclusion over which is better for predicting hydrocyclone performance
  • 17. School of Chemical and Minerals Engineering Literature study| 3 Chapter 2 - Literature study
2.1 The hydrocyclone
Hydrocyclones are commonly used in the mineral industry for the classification of particles after grinding (Flinthoff et al., 1987). They are usually installed in a closed-circuit grinding unit where they are used to separate the undersize particles from the coarse particles (Kelly & Spottiswood, 1982:201). The coarse particles are returned to the grinder for further comminution while the undersize particles leave the circuit (Wills, 2006:224-225). Advantages of hydrocyclones include simple design, low operational costs and the capability of handling large volumes of pulp (Sripriya et al., 2007). Complex mechanical devices like spirals and rake classifiers have been replaced by cyclones2, owing to their simple structure that contains no moving parts (Napier-Munn et al., 2005:309). The applications of hydrocyclones are, however, not limited to the mineral industry, as they are also used in the chemical, power generation and textile industries, among others. By customising its structure, the hydrocyclone can be used for specific applications like (Svarovsky, 1984:1):
 Liquid clarification
 Slurry thickening
 Cleansing solid particles
 Elimination of gases from liquids
Classification of particles takes place due to the difference in settling velocities of the particles being classified. The settling velocities can be a function of either particle size and/or particle density, depending on whether a homogeneous or heterogeneous ore is classified (Kelly & Spottiswood, 1982:199). A homogeneous ore contains particles of similar densities. Particles of homogeneous ores will be classified according to their size (Flinthoff et al., 1987).
The feed enters the cylindrical section of the hydrocyclone tangentially, where it forms a vortex inside the cyclone's cone-shaped body. The fluid follows a helical path until it reaches the spigot, also known as the apex, where a portion of the downward flow leaves through the spigot as the underflow. The remaining downward flow follows an upward spiral, located on the inside of the outer vortex, and leaves via the vortex finder (Svarovsky, 1984:30-31). The reason for the formation of the upward spiral is not fully understood (Svarovsky, 1984:41).
Particles of similar density or size gather together due to the competition between the drag forces and centrifugal forces acting on these particles (Napier-Munn et al., 2005:309-310). If the density of the carrier liquid is lower than that of the solids being separated, the centripetal
2 The terms "hydrocyclone" and "cyclone" are used interchangeably in this study
  • 18. School of Chemical and Minerals Engineering Literature study| 4 force on the solid particles will be larger than the centripetal force of the liquid. On the other hand, the centripetal force acting on the particle will increase as the particle size increases for homogeneous ores (Hibbeler, 2010:131). The centripetal force acting on the particles dominates the drag force also acting on the particles in a radial direction. Larger particles thus reach the boundary layer, formed between the liquid and the wall of the cyclone, with more ease than the smaller particles. The particles in the boundary layer leave the cyclone via the apex under ideal conditions. The finer particles that could not reach the boundary layer by the time the apex is reached are transported to the inner spiral, where they leave through the vortex finder (Svarovsky, 1984:41).
Figure 2.1: Hypothetical flow inside the hydrocyclone viewed from the top of the hydrocyclone. Adapted from Plitt (1976)
Random turbulence, hindered settling and the interaction between the carrier liquid and the solid particles make describing the flow inside the hydrocyclone very difficult. Determining the separation performance of the hydrocyclone is thus not an easy task (Sripriya et al., 2007). The performance of the hydrocyclone is defined as the ability to separate particles into the desired size ranges (Kelly & Spottiswood, 1982:204). According to Svarovsky (1984) the separation performance of the hydrocyclone can be determined if the corrected cut size, d50c, and the sharpness of separation, m, can be calculated. This is done with the use of a corrected partition curve. The grade efficiency curve, also called the partition curve or the Tromp curve, is a plot of the particles in a certain size range, on the x-axis, vs. the fraction of
  • 19. School of Chemical and Minerals Engineering Literature study| 5 these particles in the feed leaving the hydrocyclone through the underflow (Frachon & Cilliers, 1999), as can be seen in Figure 2.2. The grade efficiency curve cannot be approximated from first principles and has to be determined by using experimental data (Svarovsky, 1984:17).
Figure 2.2: Corrected and non-corrected partition curve adapted from Schneider (2001)
The d50c, also known as the cut size, is the particle size that has an equal chance to exit the hydrocyclone through the vortex finder or through the underflow. The corrected cut size is used instead of the real cut size, as this gives a better indication of the separation forces that are present in the hydrocyclone. More information on the corrected partition curve follows later. The sharpness of separation, m, indicates how well the classification is taking place in the cyclone. The higher the value of m, the closer the hydrocyclone is to an ideal classifier (Napier-Munn et al., 2005:311).
In practice, some of the particles in the hydrocyclone, irrespective of their size, bypass classification. By controlling the operating conditions of the cyclone, these deviations from ideal separation can be lowered, but never eliminated (Napier-Munn et al., 2005:310-311). Two paths that could be followed for bypassing classification are mentioned below.
Small particles tend to stay suspended in the liquid which leaves the hydrocyclone through the underflow. According to Frachon and Cilliers (1999), Plitt (1976) and Svarovsky (1984:20) the fraction of small particles bypassing to the underflow is directly proportional to the liquid recovery to the underflow, Rf. A corrected partition curve is constructed to remove the effect of the bypass to the underflow, as can be seen in Figure 2.2. Another phenomenon that
  • 20. School of Chemical and Minerals Engineering Literature study| 6 causes the undersize particles to leave through the underflow is when the undersize particles are trapped in the boundary layer by the larger particles. The corrected partition curve constructed with the use of equation 2.1 might thus not be capable of taking into account all of the undersize particles leaving via the underflow. Another way in which classification can be bypassed is if particles near the vortex finder leave via the overflow (Svarovsky, 1984:40). No corrections are made on the partition curve to take this effect into account, but this effect will be borne in mind when the results are interpreted.

y' = \frac{y - R_f}{1 - R_f}    (2.1)

2.2 Hydrocyclone control
If a hydrocyclone is not operated to produce the desired overflow and underflow, it could lead to poor performance in downstream processes (Eren & Gupta, 1988). Fines in the underflow lead to overgrinding, while coarse material in the overflow can cause downstream separation problems (Aldrich et al., 2014). Slight changes in the operating conditions of the hydrocyclone could markedly affect its performance (Neesse et al., 2004). The operator of a hydrocyclone might not always be aware of the cyclone's underperformance and is, in addition, frequently incapable of returning the cyclone to its optimal operation. There is thus a need for methods to efficiently determine the performance of the cyclone while in operation (Napier-Munn et al., 2005:309).
Optimising the hydrocyclone is not an easy task, as the variables are often interlinked with each other. Models that are reasonably accurate are capable of finding the optimum operating conditions for the hydrocyclone even if variables like the split flow and pressure are, for example, dependent on each other (Napier-Munn et al., 2005:320).
2.2.1 Sensors used in hydrocyclone performance determination
Variables like the pressure drop over the cyclone, the flow rates in and out of the cyclone and the feed are commonly monitored while the cyclone is on-line. With all this information, the operator might still not be able to control the performance of the cyclone effectively (Napier-Munn et al., 2005; Aldrich et al., 2014). Numerous studies have been conducted to find a suitable method to control the performance of the hydrocyclone, many of which have not been widely used in the industry (Aldrich et al., 2014). Table 2.1 contains a list of current sensors that have been developed for determining hydrocyclone performance.
  • 21. School of Chemical and Minerals Engineering Literature study| 7 2.3 Soft sensors
One other way in which the performance can be monitored on-line is by developing a soft sensor, like an artificial neural network (Napier-Munn et al., 2005).
Table 2.1: Sensors used in the on-line monitoring of hydrocyclone performance
Sensor | Description
Acoustic sensors | An acoustic sensor was mounted externally on the hydrocyclone and, after a suitable model was found, could accurately predict various parameters like the solids concentration and the flow rate. Variables like the d50c and sharpness of separation are, however, not determined with this method (Hou et al., 1998).
Videographic measurement | A video camera was used to monitor the discharge angle of the hydrocyclone. The discharge angle of the cyclone is said to be linked to the performance of the cyclone (Concha et al., 1996; Neesse et al., 2004). Although this method has some challenges, it is a cost-effective way to determine the discharge angle with good accuracy (Janse van Vuuren et al., 2011).
Photographic measurement | Aldrich et al. (2014) used images and other experimental data from the underflow of an experimental hydrocyclone setup to develop a model that could identify the mean particle size in the underflow. Instead of using the discharge angle of the underflow like Janse van Vuuren et al. (2011), the textural information that the images provided of the underflow was utilised.
Measurement using a laser beam | A laser beam is pointed at the underflow of the cyclone and the reflection of the laser beam is measured with a camera to determine whether the cyclone is in the spray or roping state (Neesse et al., 2004).
Soft sensors use operational data from the plant to predict variables that are usually difficult and/or costly to measure on-line (Kadlec
  • 22. School of Chemical and Minerals Engineering Literature study| 8 et al., 2009). Two possible soft sensors for the control of the hydrocyclone, empirical models and artificial neural networks, will be discussed in this study.
2.3.1 Empirical models
Empirical models had to be developed in order to predict how the hydrocyclone will perform under certain conditions (Kraipech et al., 2005). Although Flinthoff et al. (1987) state that these models have been widely accepted, Chen et al. (2000) state that these models are not reliable. Coelho and Medronho (2000) reason that these models will only work well if the cyclone is operated in the range that was used to obtain the data to fit the models.
One popular empirical model that is used to predict the performance of a hydrocyclone is the Plitt model (Flinthoff et al., 1987). L.R. Plitt developed this model to be robust by gathering a large amount of experimental data. The Plitt model was also designed to take into account the theories around the complex flow of the hydrocyclone (Plitt, 1976). These theories include the residence time theory and the equilibrium orbit theory (Chen et al., 2000). The theories alone are incapable of describing the hydrocyclone performance (Napier-Munn et al., 2005:312). The data was gathered by operating a wide range of hydrocyclone geometries at different operating conditions (Plitt, 1976). This model is in general not very accurate in predicting the performance of hydrocyclones (Silva et al., 2009).
The Plitt model consists of four empirical equations. These equations are used to calculate the corrected cut size, the flow split between the underflow and overflow, the sharpness of separation and the pressure drop over the hydrocyclone (Plitt, 1976). Although the Plitt model is designed to work without calibration, Flinthoff et al. (1987) recommend inserting empirical constants, F1 – F4, that take into account the unique conditions under which the cyclone operates. Only one experimental data point is needed to tune these empirical constants. By default, the values of these constants are all equal to 1.

d_{50c} = F_1 \frac{39.7\, D_c^{0.46}\, D_i^{0.6}\, D_o^{1.21}\, \eta_v^{0.5}\, \exp(0.063\varphi)}{D_u^{0.71}\, h^{0.38}\, Q^{0.45} \left( \frac{\rho_s - 1}{1.6} \right)^{k}}    (2.2)

m = F_2\, 1.94 \exp\!\left( -\frac{1.58\,S}{1 + S} \right) \left( \frac{D_c^{2}\, h}{Q} \right)^{0.15}    (2.3)
  • 23. School of Chemical and Minerals Engineering Literature study| 9

P = F_3 \frac{1.88\, Q^{1.78} \exp(0.0055\varphi)}{D_c^{0.37}\, D_i^{0.94}\, h^{0.28} \left( D_u^{2} + D_o^{2} \right)^{0.87}}    (2.4)

S = F_4 \frac{3.29\, \rho_p^{0.24} \left( \frac{D_u}{D_o} \right)^{3.31} h^{0.54} \left( D_u^{2} + D_o^{2} \right)^{0.36} e^{0.0054\varphi}}{D_c^{1.11}\, P^{0.24}}    (2.5)

Where:
D_c = Cyclone diameter in cm
D_i = Inlet diameter in cm
D_o = Vortex finder diameter in cm
D_u = Underflow/apex diameter in cm
h = Free vortex height in cm
ρ_p = Density of the cyclone feed slurry in g/cm3
ρ_s = Density of the solid phase in g/cm3
η_v = Viscosity of the carrier fluid in cP
φ = Percentage solids in the feed
Q = Feed flow rate in liters per minute
d50c = Corrected cut size in microns
m = Sharpness of separation, which is dimensionless
P = Gauge pressure in kPa
S = Split flow. This is the volume of the underflow divided by the volume of the overflow and it is a dimensionless quantity
According to Plitt (1976), the PSD of the feed slurry has a negligible effect on the outcome of the d50c of the underflow. After determining the d50c and m with the Plitt model, these values can then be inserted into the Rosin-Rammler equation, equation 2.6, to obtain the corrected partition curve.
  • 24. School of Chemical and Minerals Engineering Literature study| 10

y' = 1 - \exp\!\left( -0.693 \left( \frac{d}{d_{50c}} \right)^{m} \right)    (2.6)

Where d is the particle size in microns and y' is the corrected fraction of a certain particle size that was recovered in the underflow.
2.3.2 Artificial neural networks
Various studies have been done on the use of artificial neural networks for the prediction of hydrocyclone performance and these have proven to be successful (Eren et al., 1997a; Eren et al., 1997b; Karimi et al., 2010).
The human brain has powerful learning, generalization and parallel computing abilities. It is desired to give computers the same abilities by copying the operating principles of brain cells and developing artificial neural networks (ANN) (Jain, 1996). ANNs are not limited to soft sensors. Awodele and Jegede (2009) reason that ANNs promise a wide range of new applications in areas such as education and medicine in the future. This is the reason why research in this field has been booming in the past few decades (Gallant, 1994:1).
Figure 2.3: Diagram of the computational nodes and weights of an artificial neural network adapted from Jain (1996)
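The Plitt and Rosin-Rammler equations above lend themselves to a compact implementation. The calculations in this study were performed in the attached Excel® spreadsheet; purely as an illustrative sketch (not that implementation), the same equations could be written in Python as below. Only the equations, the default fudging factors of 1 and the units follow the text above; the function and argument names, the structure of the code and the user-supplied exponent k are assumptions of the sketch.

```python
import numpy as np

def plitt_model(Dc, Di, Do, Du, h, Q, phi, rho_s, rho_p, eta_v, k,
                F1=1.0, F2=1.0, F3=1.0, F4=1.0):
    """Corrected cut size, sharpness, pressure drop and flow split from the
    (modified) Plitt equations 2.2-2.5. Units as in the text: diameters and h
    in cm, Q in liters per minute, densities in g/cm3, eta_v in cP."""
    # Equation 2.4: pressure drop over the cyclone (needed for equation 2.5)
    P = F3 * 1.88 * Q**1.78 * np.exp(0.0055 * phi) / (
        Dc**0.37 * Di**0.94 * h**0.28 * (Du**2 + Do**2)**0.87)
    # Equation 2.5: volumetric split between underflow and overflow
    S = F4 * 3.29 * rho_p**0.24 * (Du / Do)**3.31 * h**0.54 \
        * (Du**2 + Do**2)**0.36 * np.exp(0.0054 * phi) / (Dc**1.11 * P**0.24)
    # Equation 2.2: corrected cut size in microns
    d50c = F1 * 39.7 * Dc**0.46 * Di**0.6 * Do**1.21 * eta_v**0.5 \
        * np.exp(0.063 * phi) / (Du**0.71 * h**0.38 * Q**0.45
                                 * ((rho_s - 1.0) / 1.6)**k)
    # Equation 2.3: sharpness of separation
    m = F2 * 1.94 * np.exp(-1.58 * S / (1.0 + S)) * (Dc**2 * h / Q)**0.15
    return d50c, m, P, S

def corrected_partition(d, d50c, m):
    """Rosin-Rammler form of the corrected partition curve (equation 2.6)."""
    return 1.0 - np.exp(-0.693 * (d / d50c)**m)
```

Calling plitt_model with a measured operating point and comparing the returned d50c and m with the experimental values mirrors, in outline, what the attached Plitt-model spreadsheet does with the experimental data.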
  • 25. School of Chemical and Minerals Engineering Literature study| 11 An artificial neural network consists of computational units called nodes3. These nodes are located in sets called layers. Connections, called weights, connect the nodes of one layer to the following layer. Information transported through the weights can only travel in one direction. Figure 2.3 illustrates the computational nodes in a three-layer neural network4. The arrows represent the weights and their direction. Values, either positive or negative, are assigned to each of the weights. The magnitude of the value assigned to a weight determines how large the effect of the data transported through that weight will be on the neural network. The larger the magnitude of the weight, the larger the effect.
The input data travel through the weights to which they are connected. The data traveling through a weight are multiplied by the value of that weight. When the data reach hidden layer 1 through the weights, an input value to the node is calculated. More information on these calculations follows later. The input value is substituted into a function called an activation function, which calculates an output called an activation. The activation travels through the weights to the next nodes and the same operation is performed. This is done until the ANN produces its final output (Gallant, 1994:1). Parameters that influence the output of the network include:
 The number of layers
 The number of nodes in the hidden layers
 The activation function used in the nodes
 The values of the weights
 The number of input variables
2.3.2.1 Network topography
The network topology involves the arrangement of nodes and connections in the network. These arrangements can be classified into two main categories: feed-forward networks or feedback networks. In feed-forward networks, information can only be carried in one direction, from the input to the output. This type of network is mainly used for pattern recognition purposes. Figure 2.3 illustrates a feed-forward neural network. In feedback or recurrent networks, the information can either travel in the forward direction to the output or return in the input direction, i.e. make a loop (Awodele & Jegede, 2009). For the purposes of this study, a feed-forward structure will be used.
3 The words "nodes" and "neurons" are used interchangeably in this study.
4 The terms "neural networks" and "artificial neural networks" are used interchangeably in this study.
  • 26. School of Chemical and Minerals Engineering Literature study| 12 2.3.2.2 Other artificial neural network parameters
2.3.2.2.1 Initial weights
Initial weight values between -0.1 and 0.1 are randomly chosen. Assigning non-random weights could lead to weights that perform the same action, which does not lead to sufficient convergence. The weights need to be unique when initialising training to increase the chances of identifying the pattern in the data (Gallant, 1994:213). Another, more complex approach proposed by Gallant (1994:220) is to initialise the weights connected to a certain cell to a random value between -2/z and 2/z, where z is the number of weights connected to the cell.
2.3.2.2.2 Training speed
A large value for the training speed, ÎŒ, gives faster convergence. This convergence can, however, only be maintained up to a certain point, after which the network will become unstable and diverge. This is called overtraining. It is advised to choose a training speed that has a positive value no larger than 0.1. Although this results in slow training, the neural network has a better chance to find the local minimum (Gallant, 1994:220).
2.3.2.2.3 Momentum
Momentum is used to increase the training speed. The momentum term consists of the change in weight at the previous iteration, multiplied by the momentum parameter. An additional benefit of adding momentum is the removal of noise that might occur during weight updating. The weights thus converge smoothly (Gallant, 1994:221).
2.3.2.2.4 Number of hidden neurons
It is very common for the backpropagation algorithm used in industry to contain only one hidden layer, the main reason being that networks with more hidden layers learn very slowly. Neural networks with one hidden layer are known to be universal approximators. The only way to determine whether a network with multiple or a single hidden layer should be used is by trial-and-error (Gallant, 1994:221).
2.3.2.3 Machine learning
In order for an ANN to produce better results, an algorithm has to be written that gives the ANN the ability to adjust itself. This is called machine learning (Nag, 2010). The two main types of machine learning are supervised and unsupervised learning.
In supervised learning, the ANN is given input data to produce an output. The output the ANN produces for the given input data is evaluated against the desired output. If the output from the ANN does not match the desired output, the necessary adjustments are made with the use of
  • 27. School of Chemical and Minerals Engineering Literature study| 13 the learning algorithm (Gallant, 1994:6). The diagram in Figure 2.4 attempts to better describe what is meant by supervised learning.
Figure 2.4: Supervised learning adapted from Hagan et al. (2002)
In contrast, unsupervised learning is not provided with the desired output. Instead, unsupervised learning is used to adjust the ANN so that it can group data that show similar patterns (Gallant, 1994:7). Applications of unsupervised learning include finding the probability distribution of data and identifying groups of data that show similar properties and occur close together, i.e. cluster identification (Bishop, 2008:10). In this study, supervised learning will be used, as the experimental data from the hydrocyclone provide the ANN with input and the desired output.
2.3.2.4 Learning algorithms
There are a number of learning algorithms in existence that are used to adjust the neural network in order to achieve the desired output. Some algorithms include the perceptron learning algorithm, the radial basis function algorithm and the Boltzmann learning algorithm (Jain, 1996). The question arises: "Which algorithm would be fit for a certain application?" According to Jain (1996) the backpropagation algorithm, among others, is fit for use in control systems. Gallant (1994:225), on the other hand, reasons that trial-and-error has to be used to find the appropriate algorithm. From his experience, he found that one should first try to use a single-cell model before using a complex algorithm like the backpropagation algorithm.
A large problem that occurs in all systems is the presence of noise, which commonly occurs in real-world applications. Noise is the introduction of erroneous data into the data set. It could
  • 28. School of Chemical and Minerals Engineering Literature study| 14 either be that the data is false or absent (Gallant, 1994:9). Artificial neural networks, on the other hand, are capable of handling noise (Gallant, 1994:10).
2.3.2.5 Problems with artificial neural networks
2.3.2.5.1 Failure to generalise
The purpose of training an ANN is not so much to reproduce the exact values of the training data, but rather to develop a network that is capable of producing a general answer that would be expected in the training data range (Zhang et al., 2003; Bishop, 2008:332).
To explain the difference between good and bad generalization of ANNs, Bishop (2008:9-12) uses an analogy between the complexity of an ANN and the order of a polynomial (a polynomial of high or low order). Given a certain data set generated by adding random values to the output of a known function, say y = sin(x), two polynomials are used to fit the data. The one polynomial is of a high order and the other of a low order. The result of the first-order polynomial that was fit to the data can be seen in Figure 2.5. This corresponds to a neural network with only one hidden node that produces a bad fit to the data. A possible solution is to increase the number of free parameters. In the case of a neural network, the number of hidden nodes will be increased. As can be seen in Figure 2.6, the higher-order polynomial produces a good fit for all the data points. It is however a bad representation of the sine wave, as there are plenty of oscillations (Bishop, 2008:9-12).
Figure 2.5: Polynomial of first order produces a bad fit for the data. Reproduced from Bishop (2008:11)
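To make the analogy concrete, the behaviour described above can be reproduced numerically. The short sketch below is an illustration only; the noise level, random seed and polynomial orders are arbitrary choices, not values taken from Bishop or from this study. It fits polynomials of increasing order to noisy samples of y = sin(x) and reports how badly each one predicts unseen points.

```python
import numpy as np

rng = np.random.default_rng(0)
# Training data: y = sin(x) with random noise added, as in Bishop's analogy
x_train = np.linspace(0, 2 * np.pi, 15)
y_train = np.sin(x_train) + rng.normal(scale=0.2, size=x_train.size)

x_test = np.linspace(0, 2 * np.pi, 200)          # unseen points
for degree in (1, 3, 9):                         # too simple, reasonable, too flexible
    coeffs = np.polyfit(x_train, y_train, degree)
    test_error = np.mean((np.polyval(coeffs, x_test) - np.sin(x_test)) ** 2)
    print(f"degree {degree}: mean squared error on unseen points = {test_error:.3f}")
```

The first-order fit and the very high-order fit both generalize poorly, for the opposite reasons described in the text.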
  • 29. School of Chemical and Minerals Engineering Literature study| 15 To address this problem of finding a suitable complexity for the ANN, two concepts, the variance and the bias (not to be confused with bias weights), are used. The bias is a measure of the amount by which the overall average of the ANN output differs from that of the given data. Figure 2.5 has a high bias value while Figure 2.6 has a low bias value. The variance is used as a measure of how well the ANN output will fit another data set that does not include the ANN training data. A low variance value can be expected in Figure 2.5, while a high variance value can be expected in Figure 2.6. The variance and bias go hand in hand – an increase in the variance leads to a decrease in the bias and vice versa. The goal is to decrease the value of both the variance and the bias (Bishop, 2008:334-335).
Figure 2.6: Polynomial of high order producing something that looks like a good fit for all the data points, but the predictive power of the polynomial is sacrificed. Reproduced from Bishop (2008:12).
2.3.2.5.2 Regularization
Over-fitting is the result of weights with high values. In order to suppress the weights from obtaining large values, regularization is applied. In regularization, the error of the output is conditioned in order to produce a smoother output. This is done by adding a penalty term, Ω, to the error, E. The conditioned error, E~, can be calculated with the help of equation 2.7.

\tilde{E} = E + \upsilon\,\Omega    (2.7)

Bishop (2008:338) provides two ways in which the penalty term can be calculated. One of the two methods is the Tikhonov regularizer, which will not be discussed in this study. The other is
the weight decay method. In this method, the penalty term is equal to half the sum of the squares of all the weights and biases, as shown in equation 2.8.

\Omega = \frac{1}{2} \sum_i w_i^2 \quad (2.8)

The weight decay regularizer suppresses the weights from obtaining the large values which would cause over-fitting (Bishop, 2008:338-339).

2.3.2.5.3 Structural stabilization
Trial-and-error can be used to find a more suitable structure that has little complexity but still produces good results. One way of doing this is by varying the number of hidden nodes or by adding bias weights to the network (Gallant, 1994:221; Bishop, 2008:332).

2.3.2.5.4 More data points
The number of training data points and the number of possible curves that can fit through these training data points are inversely proportional to each other (Zhang et al., 2003). If one desires to train a complex network, for reasons such as more accurate results, one simply has to add more training data to the network. A neural network that has the ability to generalize should give output as displayed by the lower-order polynomial in Figure 2.7.

Figure 2.7: A lower order polynomial that has the capability to generalize well
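Returning to the regularization of section 2.3.2.5.2, the following is a minimal sketch of the conditioned error of equations 2.7 and 2.8; the weight values and the regularization coefficient Îœ below are purely illustrative assumptions.

```python
import numpy as np

def conditioned_error(error, weights, nu):
    """Equations 2.7 and 2.8: E~ = E + nu * Omega, with Omega = 0.5 * sum(w_i^2)."""
    omega = 0.5 * np.sum(np.square(weights))  # weight decay penalty (eq. 2.8)
    return error + nu * omega

# Illustrative values only
weights = np.array([0.8, -1.2, 2.5, 0.1])
print(conditioned_error(error=0.35, weights=weights, nu=1e-4))
```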
2.3.2.6 The backpropagation algorithm
As mentioned above, the backpropagation algorithm is one of the popular neural network training algorithms that is suitable for use in process control environments, and it was decided that this training algorithm will be used for the neural network in this study. It should be noted that the algorithm presented here is developed for a neural network with a single hidden layer. The sources used in the development of this artificial neural network include Jain (1996) and Basheer and Hajmeer (2000). The steps are as follows:
1. Choose the number of input, hidden and output nodes. This also determines how many weights there will be in the neural network architecture.
2. Assign random values to the weights.
3. Propagate the signal forward by multiplying the inputs to the neural network with the weights that connect the inputs to the hidden neurons, then sum the results of the weights that go to each of the hidden nodes to produce the signal that is sent to the specified node, as can be seen in equation 2.9.

\xi_j = \sum_{i=0}^{n} x_i w_{ij} \quad (2.9)

Where:
Ο_j = signal sent to node j
n = the number of weights attached to node j
x_i = input value from weight i
w_ij = weight connecting input node i to hidden node j
4. Substitute the input signal into the activation function. The sigmoid activation function was chosen and can be seen in equation 2.10.

\sigma_j^H = \frac{1}{1 + e^{-\xi_j}} \quad (2.10)

Where σ_j^H is the output value of the transfer function of node j in the hidden layer.
5. The output of node j, σ_j, is then fed forward to the next layer of nodes, the output nodes, where equation 2.11 is applied.
\xi_j = \sum_{i=0}^{n} \sigma_i^H w_{ij} \quad (2.11)

6. The signal to the output nodes, Ο_j, is again substituted into the sigmoid function, equation 2.12, to produce the output of the output layer nodes.

\sigma_j^O = \frac{1}{1 + e^{-\xi_j}} \quad (2.12)

Where σ_j^O is the output of the output layer nodes. Note that σ_j^O is equal to the x_j^O that will be mentioned shortly.
7. The error of the output neurons can then be calculated by comparing the output of the output neurons with the desired output of the training data, using equation 2.13 (Gupta & Lam, 1998).

\delta_j^O = (x_j^O - y_j)\, x_j^O (1 - x_j^O) \quad (2.13)

Where:
ÎŽ_j^O = error of the output node j
x_j^O = output of the output node j (again, note that x_j^O is equal to σ_j^O)
y_j = desired output of the output node j
8. The values by which the weights between the output layer nodes and the hidden layer nodes are changed can now be calculated with equation 2.14 (Gupta & Lam, 1998).

\Delta w_{ij}^O = \eta\,\delta_j^O\, x_j^O - M\, w_{ij}^O(t-1) - \eta R \left( \frac{w_{ij}^O(t-1)^2}{\left(1 + w_{ij}^O(t-1)^2\right)^2} \right) \quad (2.14)

Where:
Δw_ij^O = the value by which weight w_ij^O has to be updated
η = training speed defined by the user
ÎŽ_j^O = error of the output node j
x_j^O = output of the output node j
M = momentum factor defined by the user
R = regularization factor defined by the user
w_ij^O(t − 1) = the previous value of the weight w_ij^O
9. The new weight values can then be calculated with equation 2.15.

w_{ij}^O = w_{ij}^O(t-1) - \Delta w_{ij}^O \quad (2.15)

Where w_ij^O is the new value of the output weight that extends from node i in the hidden layer to node j in the output layer.
10. The next step is to calculate the error of the hidden nodes with the help of equation 2.16 (Gupta & Lam, 1998).

\delta_j^H = x_j^H (1 - x_j^H)\, w_{ij}^O(t-1)\, \delta_j^O \quad (2.16)

Where:
ÎŽ_j^H = error of the hidden node j
x_j^H = output of the hidden node j
11. Now that the error of the hidden layer nodes is known, the increment by which the weights that extend from the input layer to the hidden layer have to change can be calculated with equation 2.17 (Gupta & Lam, 1998).

\Delta w_{ij}^H = \eta\,\delta_j^H\, x_j^H - M\, w_{ij}^H(t-1) - \eta R \left( \frac{w_{ij}^H(t-1)^2}{\left(1 + w_{ij}^H(t-1)^2\right)^2} \right) \quad (2.17)

Where:
Δw_ij^H = the value by which weight w_ij^H has to be updated
ÎŽ_j^H = error of the hidden node j
x_j^H = output of the hidden node j
w_ij^H(t − 1) = the previous value of the weight w_ij^H
12. The new weights can then be calculated with equation 2.18.

w_{ij}^H = w_{ij}^H(t-1) - \Delta w_{ij}^H \quad (2.18)

The above steps are repeated with the data from a new sample. An epoch is completed once the artificial neural network has gone through the entire set of training data; a new epoch is started by going through the training data set again.
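The sketch below is a compact, illustrative Python rendering of steps 1 to 12 for a single hidden layer; it is not the study's own implementation (that is the code in Appendix E). Where the report's notation is ambiguous, conventional choices are assumed: explicit bias inputs are omitted, the gradient term uses the output of the upstream node, the momentum term acts on the previous weight increment rather than the previous weight value, and the regularization term uses the derivative form w/(1 + wÂČ)ÂČ of the weight penalty in equations 2.14 and 2.17.

```python
import numpy as np

def sigmoid(xi):
    """Equations 2.10 / 2.12: logistic activation."""
    return 1.0 / (1.0 + np.exp(-xi))

class OneHiddenLayerNet:
    """Minimal single-hidden-layer network trained with backpropagation,
    with optional momentum (M) and regularization (R) terms."""

    def __init__(self, n_in, n_hidden, n_out, eta=0.02, M=1e-6, R=0.0, seed=0):
        rng = np.random.default_rng(seed)
        # Steps 1-2: choose layer sizes and assign small random weights
        self.w_h = rng.normal(scale=0.5, size=(n_in, n_hidden))   # input -> hidden
        self.w_o = rng.normal(scale=0.5, size=(n_hidden, n_out))  # hidden -> output
        self.dw_h = np.zeros_like(self.w_h)
        self.dw_o = np.zeros_like(self.w_o)
        self.eta, self.M, self.R = eta, M, R

    def forward(self, x):
        # Steps 3-6: propagate the signal through hidden and output layers
        h = sigmoid(x @ self.w_h)          # eqs. 2.9 and 2.10
        o = sigmoid(h @ self.w_o)          # eqs. 2.11 and 2.12
        return h, o

    def train_sample(self, x, y):
        h, o = self.forward(x)
        # Step 7: output-layer error (eq. 2.13)
        delta_o = (o - y) * o * (1.0 - o)
        # Step 10: hidden-layer error (eq. 2.16), back-propagated through w_o
        delta_h = h * (1.0 - h) * (delta_o @ self.w_o.T)
        # Steps 8 and 11: weight increments; gradient term, momentum term
        # and a weight-elimination style regularization term
        grad_o = np.outer(h, delta_o)
        grad_h = np.outer(x, delta_h)
        self.dw_o = (self.eta * grad_o + self.M * self.dw_o
                     + self.eta * self.R * self.w_o / (1.0 + self.w_o ** 2) ** 2)
        self.dw_h = (self.eta * grad_h + self.M * self.dw_h
                     + self.eta * self.R * self.w_h / (1.0 + self.w_h ** 2) ** 2)
        # Steps 9 and 12: update the weights (eqs. 2.15 and 2.18)
        self.w_o -= self.dw_o
        self.w_h -= self.dw_h
        return 0.5 * np.sum((o - y) ** 2)

def train_epoch(net, X, Y):
    """One epoch = one pass through the full training set."""
    return sum(net.train_sample(x, y) for x, y in zip(X, Y))
```

Calling train_epoch repeatedly corresponds to running successive epochs through the training data set, as described above.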
Chapter 3 - Experimental procedure

3.1 Overview
Various slurries were prepared to be fed to the hydrocyclone, and different operating conditions were imposed on the hydrocyclone. All necessary operating conditions, samples and other data were recorded on each run. A particle size distribution (PSD) analysis was then carried out on the samples. The gathered information could then be used to determine the d50c and the sharpness of separation.

3.2 Raw materials
The solid particles that had to be separated were micron-sized silica quartz particles, MQ15, supplied by Micronized SA Limited. According to Tew (2012) the particles contain 98.50% silica, with small amounts of Al2O3, K2O, Fe2O3, CaO and Na2O. The particles have a density of 2650 kg/m3, a d50 of 20 microns and an m value of 1.9. The carrier fluid used in this case was municipal water from the Tlokwe municipality.

3.3 Equipment
‱ 3X 5 litre buckets
‱ 2X 20 litre buckets
‱ 1X water gun
‱ 1X Marcy scale
‱ 1X Doppler flow meter
‱ 50X poly tops
‱ 1X large syringe
‱ 1X spoon

3.4 Experimental setup
A diagram of the experimental setup is shown in Figure 3.1. The geometry of the hydrocyclone that was used in this study is given in Table 3.1. The alphabetical labels in Figure 3.1 are explained below:
‱ A: Slurry storage tank
‱ B: Circulation pump
‱ C: Main feed bypass valve
‱ D: Feed fine tune bypass valve
‱ E: Feed shutdown valve
‱ F: Pressure gauge
‱ G: Doppler flow meter
‱ H: Hydrocyclone
‱ I: Hydrocyclone overflow
‱ J: Hydrocyclone underflow
‱ K: Sample taken from the hydrocyclone overflow
‱ L: Sample taken from the hydrocyclone underflow
‱ M: Mixer

Figure 3.1: Diagram of the hydrocyclone setup
The mixer mentioned above consists of square tubing that transports the fluid from the fine tuning bypass valve to the bottom of the storage tank. Holes were made at the end of the square tubing so that the slurry is sprayed towards the sides of the storage tank to promote better mixing. Two sampling containers are located above the storage tank and below the overflow and underflow outlets. As soon as the underflow container is pushed in under the underflow, a mechanism pushes the overflow pipe into the overflow container, meaning that the overflow and the underflow are sampled simultaneously. The experimental hydrocyclone setup can be seen in Figure 3.2, on which the two containers that store the underflow and the overflow are also indicated.

Figure 3.2: The experimental hydrocyclone setup

Where:
‱ A: Hydrocyclone
‱ B: Hydrocyclone overflow
‱ C: Hydrocyclone underflow
‱ D: Sampling container for the underflow
‱ E: Sampling container for the overflow
‱ F: Slurry storage tank

Table 3.1: Hydrocyclone geometry
Part    Size
Dc      10 cm
Di      3.03 cm
Do      3.4 cm
h       53 cm

3.5 Experimental procedure
3.5.1 Preparation
3.5.1.1 Doppler flow meter calibration
The Doppler flow meter was installed at a suitable place where minimal noise due to turbulence in the piping would occur. The storage tank was initially loaded with water only. After the pump was turned on, one person read the value from the Doppler flow meter display, while the other person filled the underflow and overflow containers with the water coming from the hydrocyclone. The sum of the underflow and the overflow of the hydrocyclone is equal to the feed to the hydrocyclone. The person filling the underflow and overflow buckets also kept track of the time in which the containers were filled. From the volume of water collected and the time over which it was collected, the real feed flowrate to the hydrocyclone can be calculated, and the Doppler flow meter was calibrated accordingly. This procedure was repeated until the error between the Doppler flow meter reading and the flowrate measured with the container-and-stopwatch method was small enough.
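As a small illustration of the container-and-stopwatch check (the numbers below are made up and are not measurements from this study), the true feed flowrate and the meter's relative error can be computed as follows:

```python
# Illustrative numbers only - not measured values from the study
underflow_l = 14.2           # litres collected from the underflow
overflow_l = 23.6            # litres collected from the overflow
fill_time_s = 30.0           # stopwatch time, seconds
meter_reading_l_min = 74.0   # Doppler flow meter display, litres per minute

true_feed_l_min = (underflow_l + overflow_l) / fill_time_s * 60.0
relative_error = (meter_reading_l_min - true_feed_l_min) / true_feed_l_min
print(f"true feed = {true_feed_l_min:.1f} l/min, meter error = {relative_error:.1%}")
```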
3.5.1.2 Marcy scale calibration
The Marcy scale is a handy tool that can be used to determine the density of a slurry mixture. A picture of the Marcy scale is shown in Figure 3.3. Before use, it has to be calibrated with the above-mentioned municipal water; the density value is set to 1000 kg/m3 when calibrated.

Figure 3.3: The Marcy scale

3.5.1.3 Slurry preparation
One of the variables that has to be monitored is the volumetric percentage of solids in the feed. The slurry tank (storage tank) was first filled with 200 litres of municipal water. The mass of silica sand that has to be added to the tank to obtain a certain volumetric solids percentage is calculated with equation 3.1.

m_s = \frac{\varphi V_w}{\frac{1}{\rho_s} - \varphi \frac{1}{\rho_s}} \quad (3.1)

Where:
m_s = mass of silica sand that has to be added to the storage tank
φ = desired volumetric solids content of the slurry (as a fraction)
ρ_s = density of silica sand = 2650 kg/m3
V_w = volume of water in the tank = 200 litres = 0.2 m3

3.5.2 Sampling
Step-by-step instructions for obtaining samples from the rig5 are given in this section. These steps can only be followed after the preparation described in section 3.5.1 has been completed:
1. Make sure that the valve between the pump opening and the storage tank exit is fully opened;
2. Make sure that there are no objects in the storage tank that could cause pump failure;
3. Close the feed shutdown valve;
4. Close both of the valves of the overflow and underflow containers;
5. Fully open both the feed bypass valves;
6. Turn on the pump;
7. The slurry from the bypass valves will cause plenty of turbulence in the storage tank. It is, however, recommended that the storage tank also be mixed manually to ensure that most of the silica particles are suspended in the slurry;
8. Fully open the feed shutdown valve;
9. Slowly close the feed bypass valves while keeping an eye on the pressure gauge. Stop closing the bypass valves as soon as the required pressure is reached;
10. One person has to take note of the flow rate, while the other person has to push in the underflow sampling container; this has to be done at the same time. The person recording the flow rate from the Doppler flow meter also has to start and stop a stopwatch when the containers are inserted and when they are pulled out again;
11. As soon as the underflow and overflow containers have been pulled out, the pump may be stopped;
12. Separate buckets have to be placed under the hoses that are connected to the outlet valves of the underflow container and the overflow container;
13. Slowly open the outlet valves of the overflow and underflow containers and collect the underflow and overflow samples in the separate buckets. The content of the underflow and overflow containers has to be stirred well while the outlet valves are open, to prevent silica sand from settling and remaining in the underflow or overflow containers;

5 The terms "rig" and "hydrocyclone experimental setup" are used interchangeably in this study.
14. The buckets have to be weighed separately on a scale. The scale should previously have been tared with the mass of the buckets that are used; identical buckets thus have to be used. The mass of the content inside the buckets is recorded;
15. A smaller sample of the overflow and the underflow is taken by mixing the slurry in the buckets and filling a poly top with the content. The poly tops should be labelled thoroughly;
16. The remaining content in the buckets is again stirred before the Marcy scale bucket is filled with the slurry. The Marcy scale bucket is put on the Marcy scale to determine the density of the slurry. The densities of both slurries have to be determined this way. The buckets containing the remaining slurry are emptied into the slurry storage tank of the rig.
The above steps are then repeated for the next sample.

3.5.3 Analysing
The particle size distribution of the underflow samples was determined with the Malvern Mastersizer 2000. The particles are circulated through the Mastersizer, where they eventually pass through a laser beam and scatter some of its radiation. The intensity of the light scattered back from the particles is measured with special backscatter detectors. The angle at which the light is scattered is inversely proportional to the size of the silica particles (Malvern instruments, 2005). Figure 3.4 is a picture of the Malvern Mastersizer 2000.

Figure 3.4: Malvern Mastersizer 2000
3.5.4 Experimental error
For the experimental error determination, 4 random operating conditions were chosen from all the experiments that were conducted, and six runs were completed at each of these operating conditions. A total of 24 experiments were thus completed in order to determine the experimental error. The conditions at which each of the sets were run, as well as the results, can be found in Appendix D. All the calculations that were done in the determination of the experimental error can be found in the electronically attached spreadsheet named "Experimental Error".
It is assumed that the data follow a normal distribution. Due to the small number of available data points in each set, the experimental error had to be determined using the Student's t distribution (Devore & Farnum, 2005:313-318). The experimental error is given by equation 3.2.

t_{n-1}\!\left(\tfrac{\alpha}{2}\right) \times \frac{S_t}{\sqrt{n_u}} \quad (3.2)

Where:
t_{n−1}(α/2) = critical t value that can be obtained from the back cover of Devore and Farnum (2005)
S_t = standard deviation of the data
n_u = number of data points available in the set

A 95% confidence interval was used to obtain the experimental error. The processed experimental data that were used for the determination of the experimental error are given in Table 3.2.

Table 3.2: Processed data used for the experimental error determination
Data    Set 1           Set 2           Set 3           Set 4
        d50c    m       d50c    m       d50c    m       d50c    m
1       21.71   1.35    28.41   2.03    17.72   1.13    29.58   1.44
2       16.59   1.37    24.51   1.92    21.91   1.21    30.83   1.66
3       18.63   1.49    29.57   1.89    22.02   1.21    30.99   1.69
4       -       -       -       -       22.26   1.30    31.44   1.71
5       -       -       -       -       23.41   1.34    31.66   2.00
Valid   18.44   1.29    26.03   1.97    21.82   1.16    29.74   1.60

Two of the data points in each of sets 1 and 2 were discarded due to their large deviation from the rest of the data in the set. One data point in each set was used as a validation data point. The values that were substituted into equation 3.2 to calculate the experimental error of each set are given in Table 3.3. The d50c experimental error for each data set and the sharpness of separation error for each data set are shown in Figure 3.5 and Figure 3.6 respectively.

Table 3.3: Values for substitution into the Student's t equation
        Set 1           Set 2           Set 3           Set 4
        d50c    m       d50c    m       d50c    m       d50c    m
n       3       3       3       3       5       5       5       5
S       2.58    0.076   2.65    0.075   2.17    0.073   0.811   0.20
X6      18.98   1.40    27.50   1.95    21.46   1.24    30.90   1.70
t(95%)  4.303   4.303   4.303   4.303   2.776   2.776   2.776   2.776

6 X is the average of the data in the specific set
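As an illustration of equation 3.2 (using the Set 1 d50c values in Table 3.3), the error at the 95% confidence level is

t_{2}\!\left(\tfrac{\alpha}{2}\right) \times \frac{S_t}{\sqrt{n_u}} = 4.303 \times \frac{2.58}{\sqrt{3}} \approx 6.4\ \mu m,

which is consistent with the large d50c errors noted in Figure 3.5.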
Figure 3.5: Experimental error of the d50c with a 95% confidence interval

Figure 3.6: Experimental error of the sharpness of separation with a 95% confidence interval

Large errors are observed for the d50c values in each of the sets in Figure 3.5. This could be ascribed to the varying feed PSDs that will be dealt with later in this paper. The experimental errors of the sharpness of separation, as seen in Figure 3.6, are however acceptable.
Chapter 4 - Model development

4.1 Overview
121 samples were processed to be put through the artificial neural network and the Plitt model. The raw data needed to be processed before it was fit for use in the artificial neural network and the Plitt model; for more information on how the data was processed, please refer to Appendix A.

4.2 The Plitt model
As mentioned before, the modified Plitt model with the fudging factors will be used in an attempt to predict the d50c and the sharpness of separation of the hydrocyclone operated under certain conditions. Of the 121 data points, 69 samples were used to fit the fudging factors with the help of the ExcelÂź add-in, Solver. The input parameters from the experimental data that were not used in the tuning of the fudging factors were then substituted into the Plitt model, and the d50c and sharpness of separation results from the Plitt model were compared to the corresponding experimental results. The Plitt model calculations can be found in the electronically attached spreadsheet named "Plitt model".

4.2.1 Split flow
As mentioned before, this paper will only focus on predicting the d50c and the sharpness of separation, m. The processed input data from Appendix C were inserted into the d50c and sharpness of separation equations of the Plitt model. For the split flow variable, S, of the d50c equation, either the experimentally calculated S or the split flow calculated with the Plitt model equation given in equation 4.1 could be used.
The value of F4 was determined by minimizing the error between 69 of the experimental and calculated split flow values with the help of the ExcelÂź add-in, Solver. The resulting value of F4 was found to be 0.13. The remaining experimental values were then compared with the corresponding Plitt model values under the same operating conditions. The results of this investigation are presented in Figure 4.1. Very small deviations from the experimental split flow values are observed, meaning the split flow values from the Plitt model are suitable for further use.
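The study fitted the fudging factors with Excel's Solver; the sketch below shows an equivalent least-squares fit of a single fudging factor in Python with SciPy. The Plitt split-flow correlation itself is left as a placeholder function, because its full form is not reproduced in this section, and the data arrays are hypothetical stand-ins for the 69 training samples.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def plitt_split_flow(F4, operating_conditions):
    """Placeholder for the Plitt split-flow correlation (equation 4.1).
    The real expression would be evaluated here; only the multiplicative
    fudging factor F4 is shown explicitly."""
    return F4 * operating_conditions  # stand-in for the full correlation

# Hypothetical arrays: one entry per training sample
conditions = np.linspace(5.0, 15.0, 69)
S_experimental = 0.13 * conditions + np.random.default_rng(1).normal(0, 0.05, 69)

def sse(F4):
    residuals = S_experimental - plitt_split_flow(F4, conditions)
    return np.sum(residuals ** 2)

result = minimize_scalar(sse)   # same idea as Excel Solver's minimisation
print(f"fitted F4 = {result.x:.3f}")
```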
4.2.2 Cut size – d50c
The d50c value was calculated with equation 2.2. Just as with the split flow, 69 experimental data points were used to adjust the value of F1. There is, however, another variable, k, that could be adjusted in this equation. It was observed that Solver could vary either F1 or k to obtain a minimum error. A value of 0.5 was arbitrarily chosen for k, while F1 was varied. The resulting value for F1 is 64.9.

4.2.3 Sharpness of separation
The fudging factor of the sharpness of separation was determined in the same way as the above-mentioned fudging factors.

4.3 The artificial neural network
The backpropagation algorithm will be used to train the neural network. A few modifications were made to the ANN, namely the addition of a regularization term and a momentum term; both terms can be seen in equations 2.14 and 2.17. All artificial neural networks that were constructed had various input variables and only one output variable. The output variable was either the d50c or the sharpness of separation.

4.3.1 Artificial neural network architecture
Six different artificial neural networks have been written. The number of hidden neurons in each of these networks can be varied between 1 and 20, while the number of input and output neurons cannot be changed. Table 4.1 displays a list of all the ANNs that have been programmed. All these programs can be found in the attached folder named "Artificial neural networks".

Figure 4.1: Experimental split flow values plotted with the predicted Plitt model split flow values
Table 4.1: Different artificial neural networks that were programmed
Neural network number   Input variables         Output variable
1                       Du                      d50c
2                       Du, φ and Q             d50c
3                       Du, φ, Q, P and S       d50c
4                       Du                      m
5                       Du, φ and Q             m
6                       Du, φ, Q, P and S       m

Separate neural networks for the d50c and the sharpness of separation were constructed, as neural networks that had both these variables as output lacked the ability to learn. It was decided that the first ANN for both the d50c output and the sharpness of separation output should only have the spigot diameter as input variable, as it is known that this variable has the largest effect on the hydrocyclone performance. In this study, the spigot diameter was changed by switching off the pump and manually inserting a new spigot with a different diameter. In industry, this would however be impractical. In a study conducted by Eren and Gupta (1988), the spigot size could be adjusted pneumatically while the cyclone was on-line. This study will thus be applicable to hydrocyclones whose spigot size can be changed while the cyclone is on-line.
The second set of neural networks contained the same inputs that are needed in the Plitt model – the volumetric percentage solids in the feed, φ, and the feed volumetric flowrate, Q. These neural networks and the Plitt model are thus on equal grounds and can be compared with one another. For the third and last set of neural networks, the split flow and the pressure drop over the cyclone were added as inputs to test whether the predictive power of the neural network would improve.
Each neural network that was constructed had the ability to test 20 different architectures with one click of a button. The networks could thus be run on multiple computers at the same time, so that more neural networks could be tested in a shorter amount of time in comparison with MATLAB¼'s Neural Network Toolboxℱ.
Each of the neural networks was trained with roughly 75% of the experimental data. The remaining 25% of the data was used as validation data, and the validation data were used for all the results that are displayed in chapter 5. None of the training data were thus used for validation purposes.
To display the learning capability of the developed neural networks, a neural network that had the spigot diameter as input parameter and the sharpness of separation as output parameter was trained with 80 epochs and a training speed of 0.02. The results are displayed in Figure 4.2. The reader is referred to Appendix E for the source code of one of the artificial neural networks.

Figure 4.2: Learning capability of one of the 6 developed artificial neural networks
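A minimal sketch of the roughly 75/25 split described above is given below; the actual split in the study was done in the spreadsheet implementation, and the data matrix and column layout here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data matrix: one row per sample, columns = [Du, phi, Q, P, S, d50c]
data = rng.random((121, 6))

indices = rng.permutation(len(data))
n_train = int(round(0.75 * len(data)))            # roughly 75 % for training
train, validation = data[indices[:n_train]], data[indices[n_train:]]

X_train, y_train = train[:, :5], train[:, 5]      # inputs and d50c target
X_val, y_val = validation[:, :5], validation[:, 5]
print(len(train), "training samples,", len(validation), "validation samples")
```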
Chapter 5 - Results and discussion

After processing the data, it was found that the PSD of the feed varied considerably; an alternative way of handling the feed PSD is dealt with in this section. As mentioned before, the neural network program was written in order to make it convenient for the user to test multiple neural network architectures at once. This functionality was used to filter out the more suitable neural network architectures for predicting the cut size and the sharpness of separation, and these filtered-out neural networks were then further optimised. The results, as well as a discussion of these results, are given in this section.

5.1 Deviations in the feed PSD
From the start of the sampling and analyses, it was assumed that the feed PSD remained constant for all the slurry batches, as the same silica sand product from the same manufacturer was used each time. It thus only seemed necessary to sample and determine the PSD of the feed once and to sample the underflow of each run, instead of sampling both the underflow and the overflow of each run. This meant the total number of PSD analyses could be cut in half. The resulting partition curves, however, had partition values that exceeded 1 or were lower than 0, which means that the material balance did not close.

Figure 5.1: Particle size distribution of 25 different feed samples

After
taking samples of 25 different slurry mixtures7, it was found that the PSDs differed significantly from each other, as can be seen in Figure 5.1. This meant that the partition curve could no longer be calculated from one feed PSD sample. A solution to this problem was to calculate, for each of the underflow samples that were analysed, 25 different partition curves from the feed PSDs shown in Figure 5.1. One of the 25 partition curves then had to be chosen. The chosen partition curve had to fulfil two criteria. Firstly, there may not be a value on the partition curve that exceeds 1, as this would mean that more of a certain size of particles exits the cyclone than entered it. Secondly, according to the literature study, the correction made to the partition curve in order to obtain the corrected partition curve is equal to the recovery of water to the underflow; the partition curve thus also has to intersect the y-axis at a value that is close to the value of Rf.
Unfortunately, the Mastersizer was incapable of accurately measuring particle sizes smaller than 8.4 ÎŒm. The curve in Figure 5.2 shows the large fluctuations that occur at particle sizes smaller than 8.4 ÎŒm; this phenomenon occurred in all the partition curves. According to the results from the Mastersizer, the particles under 8.4 ÎŒm amounted to 0.1% of the total particles, so the values of these particles will be neglected. The partition curve value at 8.4 ÎŒm will thus be taken as the recovery of liquid to the underflow.

Figure 5.2: Example partition curve before justifications

7 The words "batch" and "slurry mixture" are used interchangeably.
A similar phenomenon was observed for particles larger than 95 ÎŒm. These particles amounted to less than 0.05% of the total particles, so it is also a safe assumption to ignore these particle sizes in further calculations.
After the partition curve that suited the description above was chosen, small changes were made to the value of Rf so that it would be equal to the experimental Rf value. These small changes can be seen in Figure 5.3.

Figure 5.3: Experimental vs. adjusted values of Rf

5.2 Plitt model
5.2.1 Cut size – d50c
The d50c results of the modified Plitt model are displayed in Figure 5.4. The blue line connects the experimental data points, while the orange line connects the points that were predicted by the Plitt model. The results are displayed in another form in Figure 5.5, where the predicted vs. actual values are plotted over the y = x line. To determine how well the data fit the y = x line, the coefficient of determination was calculated; this resulted in an RÂČ value of 0.664.
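For reference, the coefficient of determination used here can be computed as in the generic sketch below; the arrays are placeholders and not the study's actual validation values.

```python
import numpy as np

def r_squared(actual, predicted):
    """Coefficient of determination of the predictions against the y = x line."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return 1.0 - ss_res / ss_tot

# Placeholder arrays - in the study these would be the 44 validation d50c values
actual = np.array([18.2, 20.5, 24.1, 16.8])
predicted = np.array([17.5, 21.3, 25.0, 15.9])
print(f"R^2 = {r_squared(actual, predicted):.3f}")
```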
Figure 5.4: Experimental cut point plotted with the cut point predicted by the Plitt model

Figure 5.5: Plitt model predicted cut size vs. experimental cut size plotted over the y=x curve
From Figure 5.4, it is clear that the Plitt model was capable of predicting the cut point to a certain extent. At the larger d50c values the Plitt model tends to overpredict the d50c, while the opposite is true for the smaller d50c values. The combined absolute error for the 44 validation data points is 71.5 ÎŒm; the average error per predicted d50c value is thus 1.6 ÎŒm, which is acceptable.

5.2.2 Sharpness of separation
The sharpness of separation results from the Plitt model can be seen in Figure 5.6. Again, the experimental values for the sharpness of separation are connected by the blue line, while the predicted values are connected by the orange line.

Figure 5.6: Experimental sharpness of separation plotted with the sharpness of separation predicted by the Plitt model
Figure 5.7: Plitt model predicted m vs. experimental m plotted over the y=x curve

From Figure 5.6 and Figure 5.7, it is clear that the Plitt model is incapable of predicting the sharpness of separation; deviations with an absolute value of 2 can easily be seen in these figures.

5.3 Artificial neural networks
Various experiments were conducted in order to determine which neural network architecture and parameters would be best suited for predicting the d50c and the sharpness of separation. The same architectures and parameters were tested on both the d50c and the sharpness of separation.
In the first series of tests, the number of epochs and the training speed were held constant while the number of neurons in the hidden layer was varied between 3 and 20. Below 3 hidden neurons, the neural network lacked the complexity to adequately predict the d50c and the
sharpness of separation. All six neural networks mentioned in Table 4.1 were tested with these architectures and parameters. The architecture and parameters that were revealed to be the best of those tested underwent further testing, in which the number of epochs was increased by orders of magnitude and the training speed was decreased so as to increase the chances of finding the global minimum. The momentum and regularization terms were tested with the same architecture and parameters as those used by the neural network mentioned in the previous paragraph.

5.3.1 Cut size – d50c
5.3.1.1 Neural network screening
The results of training the neural network with only the spigot diameter, Du, as input are given in Figure 5.8. Overtraining8 occurred in all of the tests except for the neural network that had 3 hidden neurons. It should be noted that the neural networks stopped training as soon as overtraining started. The results in Figure 5.8 show that a simple neural network with no more than 6 hidden neurons had the best prediction capabilities. Adding more hidden neurons tends to overcomplicate the network, leading to poorer results.

Figure 5.8: Results of neural network 1 trained with a training speed of 0.2 and a maximum amount of epochs of 8000

8 Overtraining takes place when the artificial neural network stops converging to an answer and starts to diverge.
By adding more input parameters to the neural network, even better results are achieved; these results can be seen in Figure 5.9. All networks in this test were trained until overtraining commenced. A maximum error of just above 1.35 ÎŒm per validation data point was achieved with this neural network.

Figure 5.9: Results of neural network 2 trained with a training speed of 0.2 and a maximum amount of epochs of 8000

Two more input parameters, the split flow and the pressure drop over the cyclone, were then inserted. The addition of these two parameters produced better results than the previous tests, although no trend could be observed in the absolute error as the number of hidden neurons was increased. The results are given in Figure 5.10.

Figure 5.10: Results of neural network 3 trained with a training speed of 0.2 and a maximum amount of epochs of 8000
5.3.1.2 Enhancing the neural network
From the results in Figure 5.8, Figure 5.9 and Figure 5.10, it is clear that the predictive power of the neural network increases with an increase in the number of inputs. It was thus decided to further develop neural network 3 for predicting the d50c of the hydrocyclone.
The neural network was given 12 hidden neurons and first trained with a maximum of 60000 epochs and a training speed of 0.02. It was expected that the absolute error observed in Figure 5.10 would decrease; instead, the error increased by almost 1.6 ÎŒm to 47.19 ÎŒm. An explanation for this phenomenon could be that this neural network happened to step over the local minimum that was found by the neural network in Figure 5.10. Another representation of the results is given in Figure 5.12. Although there are some neural network output values that differ by 2 ÎŒm from the experimental d50c values, Figure 5.11 shows that the neural network has adequate prediction power.

Figure 5.11: Calculated d50c plotted with the experimental d50c values of neural network 3 trained with a maximum of 60000 epochs and a training speed of 0.02
Figure 5.12: Predicted d50c vs. experimental d50c plotted over the y=x curve for the neural network trained with a maximum of 60000 epochs and a training speed of 0.02

For the next enhancement, the momentum term was used. The momentum constant was given a value of 1 × 10⁻⁶; the other parameters and the architecture of the neural network remain unchanged. A significant reduction of more than 3 ÎŒm in the combined error was observed when compared to the previous test; the value of the combined error in this case is 44.15 ÎŒm. It can also be seen in Figure 5.14 that the RÂČ value decreased by 0.05 to 0.795. When looking at the calculated and experimental graph in Figure 5.13, certain improvements can be spotted; as an example, the last validation data point now lies on the predicted d50c value, which was not the case in Figure 5.11.
Figure 5.13: Calculated vs. experimental values of neural network 3 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the momentum term

Figure 5.14: Predicted d50c vs. experimental d50c plotted over the y=x curve for a neural network trained with a maximum of 60000 epochs and a training speed of 0.02
For the final neural network enhancement, the momentum term was deactivated while the regularization term was activated. The regularization constant was set to a value of 1 × 10⁻⁎. The regularization term only produced slight improvements when compared to the initial neural network enhancement; the resulting combined absolute error was 46.45 ÎŒm. The validation results are displayed in Figure 5.15, and the predicted vs. experimental d50c can be seen in Figure 5.16. Only slight differences are observed between the graphs of Figure 5.13 and Figure 5.15.

Figure 5.15: Calculated vs. experimental values of neural network 3 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the regularization term
Figure 5.16: Predicted d50c vs. experimental d50c plotted on the y=x curve for the neural network trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the regularization term

5.3.2 Sharpness of separation
5.3.2.1 Screening of neural networks
Screening of the neural networks with the sharpness of separation as output was done in the same way as the screening of the d50c neural networks. The results of neural networks 4, 5 and 6 are given in Figure 5.17, Figure 5.18 and Figure 5.19 respectively. Most of the networks were trained until overtraining commenced.
Figure 5.17: Results of neural network 4 trained with a training speed of 0.5 and a maximum amount of epochs of 20000

Figure 5.18: Results of neural network 5 trained with a training speed of 0.5 and a maximum amount of epochs of 25000
Figure 5.19: Results of neural network 6 trained with a training speed of 0.2 and a maximum amount of epochs of 8000

The same phenomena that occurred in neural networks 1, 2 and 3 were observed in neural networks 4, 5 and 6. When the number of inputs to the neural network was less than or equal to 3, the predictive capability of the neural networks reached its peak when the number of hidden neurons was capped at 8. There was again no trend in the prediction power of the neural network as the number of hidden neurons was increased for the neural network that had 5 inputs. An increase in the number of inputs to the neural network also led to an improved predicting capability. It was thus decided that neural network 6 should be further developed.

5.3.2.2 Enhancing the neural network
It was decided that neural network 6 should be given 13 hidden nodes, as good results were obtained with this number of hidden nodes, as can be seen in Figure 5.19. The network was trained with a maximum of 60000 epochs and a training speed of 0.02. After the first test, the momentum term was added with a momentum constant of 1 × 10⁻⁔, and in the second test the momentum term was deactivated and the regularization term was inserted with a regularization constant of 0.001. The results can be seen in Figure 5.20, Figure 5.22 and Figure 5.24, and the predicted vs. experimental plots can be seen in Figure 5.21, Figure 5.23 and Figure 5.25.
Figure 5.20: Experimental and predicted values of neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02

Figure 5.21: Predicted vs. experimental m plotted over the y=x graph for neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02
Figure 5.22: Predicted and experimental values of neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the momentum term

Figure 5.23: Predicted m vs. experimental m plotted over the y=x line for neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the momentum term
Figure 5.24: Predicted vs. experimental values of neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the regularization term

Figure 5.25: Predicted m vs. experimental m plotted over the y=x curve for neural network 6 trained with a maximum of 60000 epochs and a training speed of 0.02 with the addition of the regularization term
Except for the large outliers observed near validation sample number 20 and at validation sample 13, the sharpness of separation was predicted with reasonable accuracy. When comparing the graphs in Figure 5.20, Figure 5.22 and Figure 5.24, one observes that there are no significant differences between them. The neural network that had neither a regularization nor a momentum term had a combined validation error of 9.11 for the sharpness of separation, meaning that the predicted sharpness of separation was out by an average of 0.21 per validation sample. The momentum term decreased the combined error to 9.04, while the addition of the regularization term decreased the combined error to 8.87, meaning that the average error per validation sample prediction was reduced to 0.2.