International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), pp. 82-91
© IAEME, http://www.iaeme.com/ijcet.html

      REGRESSION, THEIL’S AND MLP FORECASTING
                        MODELS OF STOCK INDEX
                                      K. V. Sujatha
                                     Research Scholar
                             Sathyabama University, Chennai
                            E-mail: sujathacenthil@gmail.com

                               S. Meenakshi Sundaram
                              Department of Mathematics
                            Sathyabama University, Chennai
                          E-mail: sundarambhu@rediffmail.com


ABSTRACT
       Financial forecasting, and stock market prediction in particular, is an active
field of research because of its commercial applications, the high stakes involved and
the attractive benefits it offers. Financial time series are among the ‘noisiest’ and
most ‘non-stationary’ signals and are hence very difficult to forecast. In this paper we
attempt to forecast the daily prices of a stock index using Regression, Theil’s and MLP
models, and the predictive abilities of these models are compared using standard error
measures.
Keywords: Forecasting, Regression, Principal Component, Perceptron, MAPE.

1. INTRODUCTION
       Trading in stock market indices has gained unprecedented popularity in major
financial markets around the world. However, predicting a stock price index is a very
difficult problem because stock market data are complex and are affected by many
factors, including political events, general economic conditions, and investors’
expectations. Modeling the behavior of a market index is therefore a challenging task.
There are two major approaches (fundamental and technical) to analyzing and
predicting stock prices [1].




        Because the internal rules governing nonlinear systems such as the stock market
are not well understood, it is not known in advance which variables are influential and
which are not. Input variables are therefore selected only from the open and objective
historical data of a stock market. To avoid losing important information contained in
the historical data, Principal Component Analysis (PCA) is usually used. A functional
principal component technique for the statistical analysis of a set of financial time
series highlights some relevant statistical features of such related datasets [3]. This
method replaces the original variables with new ones that are fewer in number, mutually
uncorrelated, and contain most of the information of the original variables [6].
Xiaoping Yang [4] used PCA to find the principal components that are taken as inputs
for predicting stock prices using a neural network. The variables high, low, open,
volume and adjusted closing were considered for prediction of closing prices using a
Hybrid Kohonen Self Organizing Map [5]. Liu et al. [7] used back propagation neural
networks with moving average, deviation from moving average, turnover moving average,
and relative index as inputs for prediction. In Versace et al.’s work [8], the values
used are open, high, low, close and volume of a specific stock, while Baba [9] used
change of index, PBR, changes of the turnover by foreign traders, changes of current
rates, and turnover in the local stock market. MLP outperformed RBF in predicting weekly
closing prices using the variables open, high, low and volume [10]. In recent years,
Artificial Neural Networks (ANNs) have been applied to many areas of statistics. One of
these areas is time series forecasting [11-19]. The variables considered in this article
for predicting the daily closing prices are the historic prices and the daily opening,
low and high prices of the BSE Sensex from 1st January 2009 to 31st March 2010.
Principal component analysis resulted in a single principal component variable.
        The closing prices are predicted by fitting a parametric model, Simple Linear
Regression, and also by a classical nonparametric model, Theil’s Incomplete Method.
Multilayer Perceptron is another nonparametric model that is used to forecast the daily
closing prices, taking the principal component as the predictor variable. The forecast
error, which is the difference between the actual value and the forecast value for the
corresponding period, is measured for all three models. The error values MAPE, SMAPE and
MAE indicate how close the forecasted values are to the target ones. The lower the
error values, the better the forecaster.
 2 MODEL DESCRIPTION
 2.1 PRINCIPAL COMPONENT ANALYSIS
        Principal component analysis is appropriate when there are a number of observed
variables and one wishes to develop a smaller number of artificial variables (called
principal components) that will account for most of the variance in the observed
variables. The principal components may then be used as predictor or criterion variables
in subsequent analyses. Principal component analysis is a variable reduction procedure.
It is useful when there is redundancy in the data obtained on a number of variables. Here
redundancy means that some of the variables are correlated with one another, possibly
because they are measuring the same construct. Because of this redundancy it is possible
to reduce the observed variables into a smaller number of principal components that will
account for most of the variance in the observed variables.
        Technically, a principal component can be defined as a linear combination of
optimally weighted observed variables. The general form of the formula used to
compute the first component extracted in a principal component analysis is:

\[ C_1 = b_{11} X_1 + b_{12} X_2 + \dots + b_{1p} X_p \]

where C_1 = the first component extracted,

         b_{1p} = the regression coefficient (weight) for the observed variable p,

         X_p = the value of the observed variable p.
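
The linear combination above can be illustrated with a small sketch. It is not the procedure used in the paper: it assumes the variables are standardized first (PCA on the correlation matrix) and finds the weights by power iteration. The function name and the sample data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=500):
    """Return (weights, scores) of the first principal component.

    rows: list of observations, each a list of p variable values.
    Assumption: each variable is standardized before the analysis.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(p)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]

    # Correlation matrix of the standardized variables.
    corr = [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]

    # Power iteration converges to the eigenvector with the largest eigenvalue.
    b = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [sum(corr[a][k] * b[k] for k in range(p)) for a in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        b = [v / norm for v in nxt]

    # C1 = b11*X1 + b12*X2 + ... + b1p*Xp, evaluated on the standardized data.
    scores = [sum(b[j] * zr[j] for j in range(p)) for zr in z]
    return b, scores

# Hypothetical open/high/low values, not the BSE Sensex data.
data = [[100.0, 104.0, 98.0], [102.0, 107.0, 101.0], [105.0, 108.0, 103.0],
        [103.0, 106.0, 100.0], [108.0, 112.0, 106.0]]
weights, scores = first_principal_component(data)
print(weights)
print(scores)
```

Because the three hypothetical series move together, all weights come out positive and of similar size, which is the situation in which a single component can stand in for the original variables.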
2.2 SIMPLE LINEAR REGRESSION
        Simple linear regression fits a straight line through a set of n points in such a
way that the sum of squared residuals of the model is as small as possible.
Regression makes the following assumptions:

    The dependent variable is linearly related to the independent variable.
    Residuals follow a normal distribution.
    Residuals have uniform variance.


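        Regression parameters for a straight line model y = a + bx are calculated by the
least squares method (minimization of the sum of squares of deviations from a straight
line). Setting the derivatives of this sum with respect to a and b to zero gives the
following formulae for the slope (b) and the y intercept (a) of the line:

\[ b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x} \]

where \bar{x} and \bar{y} are the sample means of the x and y values.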
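
A minimal sketch of the least squares estimates, using the standard closed-form result: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept is the mean of y minus the slope times the mean of x. The function name and the sample points are illustrative only.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x fitted by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b

# Hypothetical data points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = least_squares_line(xs, ys)
print(a, b)
```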




2.3 THEIL’S INCOMPLETE METHOD
        A simple, non-parametric approach to fitting a straight line to a set of (x, y)
points is Theil's incomplete method, which assumes that the points (x1, y1), (x2, y2),
. . ., (xN, yN) are described by the equation y = a + bx.
The calculation of a and b takes place as follows:
    All N data points are ranked in ascending order of x-values.
    The data are separated into two equal-size (m) groups, the low (L) and the high (H)
    group. If N is odd, the middle data point is not included in either group.

    The slope bi is calculated for each pair formed by the i-th point of the low group
    and the i-th point of the high group,

          i.e. bi = (yH,i – yL,i) / (xH,i – xL,i) for i = 1, 2, …, m.

     The median of the m slope values b1, b2, . . ., bm is calculated and taken as the
     best estimate of the slope (b) of the line, i.e. b = median(b1, b2, . . ., bm).
     For each data point (xi, yi) the value of the intercept ai is calculated using the
     previously calculated slope b, i.e. ai = yi – b xi for i = 1, 2, …, N.

     The median of the N intercept values a1, a2, . . ., aN is calculated and taken as
     the best estimate of the intercept (a) of the line, i.e. a = median(a1, a2, . . ., aN).
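
The steps above can be sketched as follows. This is an illustrative implementation, not the authors' code; it assumes that the paired points in the low and high groups have different x-values, and the sample points (an exact line plus one outlier) are hypothetical.

```python
from statistics import median

def theil_incomplete(points):
    """Return (a, b) for y = a + b*x using Theil's incomplete method."""
    pts = sorted(points)       # rank the points by x-value
    n = len(pts)
    m = n // 2                 # size of the low and high groups
    low = pts[:m]
    high = pts[n - m:]         # leaves out the middle point when n is odd
    # One slope per pair (i-th low point, i-th high point).
    slopes = [(yh - yl) / (xh - xl) for (xl, yl), (xh, yh) in zip(low, high)]
    b = median(slopes)
    # One intercept per data point, using the median slope.
    intercepts = [y - b * x for x, y in pts]
    a = median(intercepts)
    return a, b

# Points on y = 2 + 3x, with the last point replaced by an outlier.
points = [(1, 5), (2, 8), (3, 11), (4, 14), (5, 17), (6, 20), (7, 100)]
a, b = theil_incomplete(points)
print(a, b)
```

Because both estimates are medians, the single outlier changes one slope and one intercept but leaves the fitted line at the values of the underlying straight line.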






2.4 MULTILAYER PERCEPTRON
       A multilayer perceptron is a feed-forward network model that maps sets of input
data onto a set of appropriate outputs. It is a modification of the standard linear
perceptron in that it uses three or more layers of neurons (nodes) with nonlinear
activation functions, and is more powerful than the perceptron in that it can distinguish
data that are not linearly separable. The MLP procedure divides the data set into three
parts: Training, Testing and Holdout.
     Training - This segment of the data is used only to train the network.
     Testing - This segment of the data is used to track errors during training in order
     to prevent overtraining.
     Holdout - This segment of the data is used to assess the final neural network. The
     holdout data set gives an honest estimate of the predictive ability of the model.
            The Multilayer Perceptron has a rescaling option, which is used to improve
network training. There are three rescaling options: standardization, normalization,
and adjusted normalization. All rescaling is performed based on the training data, even if
a testing or holdout sample is defined. The activation function of the hidden layer can be
hyperbolic tangent or sigmoid. The units in the output layer can use any one of the
following activation functions: Identity, Sigmoid, Softmax or Hyperbolic Tangent.
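
The three rescaling options can be sketched as below. The text does not spell out the formulas, so the usual definitions are assumed here: standardization subtracts the mean and divides by the standard deviation, normalization maps the training range onto [0, 1], and adjusted normalization maps it onto [-1, 1]. The function name and the sample values are hypothetical. In every case the statistics are taken from the training data only.

```python
import math

def fit_rescaler(train, method):
    """Return a function that rescales a value using training-data statistics."""
    lo, hi = min(train), max(train)
    mean = sum(train) / len(train)
    sd = math.sqrt(sum((v - mean) ** 2 for v in train) / (len(train) - 1))
    if method == "standardized":
        return lambda v: (v - mean) / sd
    if method == "normalized":
        return lambda v: (v - lo) / (hi - lo)
    if method == "adjusted_normalized":
        return lambda v: 2 * (v - lo) / (hi - lo) - 1
    raise ValueError("unknown rescaling method: " + method)

# Hypothetical training and holdout values.
train = [10.0, 20.0, 30.0, 40.0]
holdout = [25.0, 50.0]

scale = fit_rescaler(train, "normalized")
scaled_holdout = [scale(v) for v in holdout]
print(scaled_holdout)
```

A holdout value outside the training range, such as 50.0 here, is mapped outside [0, 1], which follows directly from basing the rescaling on the training data alone.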
2.5 ERROR MEASURES
        The error functions used are the sum of squares error and the relative error.
The sum of squares error is defined as the sum of the squared deviations between the
observed and the model-predicted values. Relative error is the ratio of an absolute error
to the true, specified, or theoretically correct value of the quantity that is in error.

Mean Absolute Error:

\[ \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |A_t - P_t| \]

Mean Absolute Percentage Error:

\[ \mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{|A_t - P_t|}{A_t} \]

Symmetric Mean Absolute Percentage Error:

\[ \mathrm{SMAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{|A_t - P_t|}{A_t + P_t} \]

where A_t is the actual value and P_t is the predicted value.
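
The three error measures can be computed as in the sketch below, which follows the definitions given above (in particular, SMAPE divides by the sum of the actual and predicted values, with no factor of 2). The function names and the sample values are illustrative, and positive actual values are assumed.

```python
def mae(actual, predicted):
    """Mean of the absolute errors |A_t - P_t|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean of |A_t - P_t| / A_t, expressed as a fraction."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def smape(actual, predicted):
    """Mean of |A_t - P_t| / (A_t + P_t), as defined in this section."""
    return sum(abs(a - p) / (a + p)
               for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical actual and predicted values.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mae(actual, predicted), mape(actual, predicted), smape(actual, predicted))
```

With this definition SMAPE is roughly half of MAPE whenever the predictions are close to the actual values.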



3 FINDINGS AND RESULTS
        Principal Component Analysis of the daily high, low and opening prices
of the BSE Sensex data resulted in a single principal component, which is then used to
predict the closing prices by the methods discussed above. The eigenvalues, which
determine the number of principal components, and the factor loadings of the principal
component are given in Table 1.
                            Table 1 Principal Component Analysis
               Component          1             2             3             4
               Eigenvalue         3.2787        0.7194        0.0011        0.0008
               Difference         2.5593        0.7183        0.0003
               Proportion         81.97%        17.98%        0.03%         0.02%
               Cumulative         81.97%        99.95%        99.98%       100.00%

               Criterion: Kaiser

               Variable           Factor loading (F1)           Weight (PCA1)
               V1                 0.9855                        0.5442
               V2                 0.9855                        0.5442
               V3                 0.9883                        0.5458
               V4                 -0.5998                       -0.3312
               Exp. Var.          3.2787
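
The figures in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes the usual form of the Kaiser criterion (retain components whose eigenvalue exceeds 1) and that each proportion is the eigenvalue divided by the sum of all eigenvalues. The relation between the weights and the loadings (loading divided by the square root of the first eigenvalue) is inferred from the numbers rather than stated in the paper.

```python
import math

# Values copied from Table 1.
eigenvalues = [3.2787, 0.7194, 0.0011, 0.0008]
loadings = [0.9855, 0.9855, 0.9883, -0.5998]

total = sum(eigenvalues)
proportions = [ev / total for ev in eigenvalues]

# Kaiser criterion (assumed form): keep components with eigenvalue > 1.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]

# Inferred relation: weight = loading / sqrt(first eigenvalue).
weights = [l / math.sqrt(eigenvalues[0]) for l in loadings]

print(retained)
print([round(p, 4) for p in proportions])
print([round(w, 4) for w in weights])
```

Only the first eigenvalue exceeds 1, which is consistent with a single principal component being retained.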






           An initial descriptive analysis of the daily closing prices and the predictor
variable (the principal component variable) is given in Table 2. The assumptions of simple
linear regression are checked, and the regression line is then fitted to this set of
observations.

                                  Table 2 Descriptive Statistics
Variable                     Mean              Standard Deviation          Skewness        Kurtosis

Daily Closing                14337.1182        3041.62375                  -.788           -.909

Principal Component          23419.2108        4974.61634                  -.785           -.918


                                    Table 3 Tests of Normality

                Kolmogorov-Smirnov                    Shapiro-Wilk
                Statistic   df          Sig.          Statistic   df          Sig.
Closing         .170        300         .000          .838        300         .000
PCA             .171        300         .000          .838        300         .000


           The Durbin-Watson value of 2.11 indicates the absence of autocorrelation.
The Kolmogorov-Smirnov and Shapiro-Wilk normality tests were performed and the outcomes
are displayed in Table 3. From Table 3 it is clear that both tests imply that the
condition of normality is not met.
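
For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4, and values near 2 indicate no first-order autocorrelation. The sketch below uses hypothetical residuals, not those of the fitted model.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Hypothetical residuals: alternating signs suggest negative autocorrelation,
# a constant run suggests positive autocorrelation.
alternating = [1.0, -1.0, 1.0, -1.0]
constant = [1.0, 1.0, 1.0, 1.0]
print(durbin_watson(alternating), durbin_watson(constant))
```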

          Using the method of least squares, the Simple Linear Regression model for the
data is given by

           Y = 34.312 + 0.611X,       where X is the principal component variable and Y
represents the daily closing price of the BSE Sensex. By the classical nonparametric
Theil’s method, the model is given by

           Y = 42.15384 + 0.610456X, where X is the principal component and Y represents
the daily closing price.
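
As a worked illustration, the two fitted equations can be evaluated at the same value of X. The input used here is the mean of the principal component reported in Table 2; the function names are hypothetical.

```python
def predict_regression(x):
    """Closing price predicted by the least squares line."""
    return 34.312 + 0.611 * x

def predict_theil(x):
    """Closing price predicted by the Theil's incomplete method line."""
    return 42.15384 + 0.610456 * x

x_mean = 23419.2108  # mean of the principal component, from Table 2
regression_value = predict_regression(x_mean)
theil_value = predict_theil(x_mean)
print(regression_value, theil_value)
```

Both predictions lie within a few index points of each other and close to the mean daily closing price in Table 2, as expected for two nearly identical lines.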




        For modeling the data with the Multilayer Perceptron, the principal component
variable is taken as the covariate and the daily closing price of BSE is the target
variable. Rescaling (standardized, normalized and adjusted normalized) of both the
dependent variable and the covariate is applied in turn. For every combination of the
activation function of the hidden layer (hyperbolic tangent and sigmoid) and that of the
output layer (identity, hyperbolic tangent and sigmoid), the sum of squares error and
relative error values are measured under the different rescaling options.

       The different combinations of the activation functions of the output and hidden
layers with the three rescaling options of the input and target variables resulted in 30
models. The architecture for which the sum of squares error and relative error were
minimum is the one in which both the dependent variable and the covariate are normalized,
with hyperbolic tangent as the activation function of the hidden layer and identity for
the output layer. Table 6 gives the MAE, MAPE, SMAPE and R square values for the
models discussed above. Figure 1 shows how the models predict the closing prices for the
last 50 data points.
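
The selected architecture (hyperbolic tangent in the hidden layer, identity in the output layer) corresponds to a forward pass of the following form. This is only a sketch for a single covariate: the number of hidden units and all weights are hypothetical, since the paper does not report them, and training is not shown.

```python
import math

def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: tanh hidden units, identity (linear) output unit."""
    hidden = [math.tanh(w * x + b)
              for w, b in zip(hidden_weights, hidden_biases)]
    # Identity activation: the output is the weighted sum itself.
    return sum(v * h for v, h in zip(output_weights, hidden)) + output_bias

# Hypothetical weights for two hidden units; x is a rescaled covariate value.
y = mlp_forward(0.5, [1.0, -2.0], [0.0, 0.5], [0.3, 0.7], 0.1)
print(y)
```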

                           Table 6 MAE, MAPE and SMAPE values
   Model                         MAE                  MAPE           SMAPE           R2 Value

   Linear Regression             110.695401           0.0081926      0.0040934       0.9977142

   Theil’s Incomplete Method     110.6996             0.008198       0.004095        0.9977138

   Multilayer Perceptron         118.5105             0.008839       0.004424        0.9974605








     Figure 1 shows how the models predict the closing prices for the last 50 data points.
4 CONCLUSION
        The best model for forecasting the daily closing prices was found to be linear
regression. The model yielded the least error: a MAPE of 0.0081926, an SMAPE of
0.0040934 and an MAE of 110.695401. The R square value is 0.997714272, which
indicates that the model is appropriate for predicting the daily closing prices when
the daily opening, high and low prices are used as predictors. This model outperformed
the nonparametric Theil’s method and the MLP model. It would be interesting to conduct
further studies to compare the results when additional variables are included.
5. REFERENCES
1. Kai Keng Ang and Chai Quek, (2006), “Stock Trading Using RSPOP: A Novel
     Rough Set-Based Neuro-Fuzzy Approach”, IEEE Transactions on Neural
     Networks, 17(5):1301–1315.
2. Brabazon T., (2000), “A connectivist approach to index modelling in financing
     markets”, In Proceedings, Coil / EvoNet Summer School, University of Limerick.
3. Salvatore Ingrassia and G. Damiana Costanzo, (2005), “Functional principal
     component analysis of financial time series”, Vichi M., Monari P., Mignani S.,
     Montanari A. (Eds.) New Developments in Classification and Data Analysis, Pages
     351-358, Springer-Verlag, Berlin.



4. Xiaoping Yang, (2005), “The Prediction of Stock Prices Based on PCA and BP Neural
     Networks”, Chinese Business Review, ISSN 1537-1506, USA, Volume 4, No.5 (Serial
     No.23), Page 64 – 68.
5. Mark O. Afolabi, Olatoyosi Olude, (2007), “Predicting Stock Prices Using a Hybrid
     Kohonen Self Organizing Map (SOM)”, Proceedings of the 40th Hawaii International
     Conference on System Sciences, IEEE.
6. Huixin Ke, Jinghua Huang, Hao Shen, (2007), “Statistic Analysis in Investigation and
     Research”, Beijing: Beijing Broadcast University Press, 465-484.
7. Qiong Liu, Xin Lu, Fuji Ren and Shingo Kuroiwa, (2004), “Automatic Estimation of
     Stock Market Forecasting and Generating the Corresponding Natural language
     Expression”, IEEE Proceedings of the International Conference on Information
     Technology: Coding and Computing.
8. Versace M., Bhatt R., Hinds O. and Shiffer M., (2004), “Predicting the exchange
     traded fund DIA with a combination of genetic algorithms and neural networks”,
     Expert Systems with Applications, Elsevier.
9. Baba N., Naoyuki I. and Hiroyuki A., (2000), “Utilization of Neural Networks &
     GAs for Constructing Reliable Decision Support Systems to Deal Stocks”,
     Proceedings of IEEE-INNS-ENNS International Joint Conference on Neural
     Networks.
10. Sujatha K. V. and S. Meenakshi Sundaram, (2010), “A MLP, RBF Neural Network
     Model for Prediction in BSE SENSEX Data Set”, Proceedings of National
     Conference on Applied Mathematics.
11. Kajitani, Y., K.W. Hipel and A.I. McLeod, (2005), “Forecasting Nonlinear Time
     Series with Feedforward Neural Networks: A Case Study of Canadian Lynx Data”,
     Journal of Forecasting, 24: 105-117.
12. Yao, J., Y. Li and C.L. Tan, (2000), “Option Price Forecasting Using Neural
     Networks”, Omega, 28: 455-466.
13. Chakraborty, K., Mehrotra K., Mohan C.K. and Ranka S., (1992), “Forecasting the
     Behavior of Multivariate Time-Series Using Neural Network”, Neural Networks, 5:
     461-470.


Comparison of Cost Estimation Methods using Hybrid Artificial Intelligence on...IJERA Editor
 
Decentralized Data Fusion Algorithm using Factor Analysis Model
Decentralized Data Fusion Algorithm using Factor Analysis ModelDecentralized Data Fusion Algorithm using Factor Analysis Model
Decentralized Data Fusion Algorithm using Factor Analysis ModelSayed Abulhasan Quadri
 
Comparison of the forecasting techniques – arima, ann and svm a review-2
Comparison of the forecasting techniques – arima, ann and svm   a review-2Comparison of the forecasting techniques – arima, ann and svm   a review-2
Comparison of the forecasting techniques – arima, ann and svm a review-2IAEME Publication
 
Comparison of the forecasting techniques – arima, ann and svm a review-2
Comparison of the forecasting techniques – arima, ann and svm   a review-2Comparison of the forecasting techniques – arima, ann and svm   a review-2
Comparison of the forecasting techniques – arima, ann and svm a review-2IAEME Publication
 

Similar to Regression, theil’s and mlp forecasting models of stock index (20)

Novel approach for predicting the rise and fall of stock index for a specific...
Novel approach for predicting the rise and fall of stock index for a specific...Novel approach for predicting the rise and fall of stock index for a specific...
Novel approach for predicting the rise and fall of stock index for a specific...
 
40120130405012
4012013040501240120130405012
40120130405012
 
UNIT - 5 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
UNIT - 5 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHONUNIT - 5 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
UNIT - 5 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
 
Real Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningReal Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine Learning
 
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression MethodIRJET- Error Reduction in Data Prediction using Least Square Regression Method
IRJET- Error Reduction in Data Prediction using Least Square Regression Method
 
Stock Price Prediction using Machine Learning Algorithms: ARIMA, LSTM & Linea...
Stock Price Prediction using Machine Learning Algorithms: ARIMA, LSTM & Linea...Stock Price Prediction using Machine Learning Algorithms: ARIMA, LSTM & Linea...
Stock Price Prediction using Machine Learning Algorithms: ARIMA, LSTM & Linea...
 
A Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data MiningA Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data Mining
 
AMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTAMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLT
 
Argument to use Both Statistical and Graphical Evaluation Techniques in Groun...
Argument to use Both Statistical and Graphical Evaluation Techniques in Groun...Argument to use Both Statistical and Graphical Evaluation Techniques in Groun...
Argument to use Both Statistical and Graphical Evaluation Techniques in Groun...
 
MULTI-PARAMETER BASED PERFORMANCE EVALUATION OF CLASSIFICATION ALGORITHMS
MULTI-PARAMETER BASED PERFORMANCE EVALUATION OF CLASSIFICATION ALGORITHMSMULTI-PARAMETER BASED PERFORMANCE EVALUATION OF CLASSIFICATION ALGORITHMS
MULTI-PARAMETER BASED PERFORMANCE EVALUATION OF CLASSIFICATION ALGORITHMS
 
Forecasting S&P 500 Index Using Backpropagation Neural Network Based on Princ...
Forecasting S&P 500 Index Using Backpropagation Neural Network Based on Princ...Forecasting S&P 500 Index Using Backpropagation Neural Network Based on Princ...
Forecasting S&P 500 Index Using Backpropagation Neural Network Based on Princ...
 
Instruction level parallelism using ppm branch prediction
Instruction level parallelism using ppm branch predictionInstruction level parallelism using ppm branch prediction
Instruction level parallelism using ppm branch prediction
 
Vol 9 No 1 - January 2014
Vol 9 No 1 - January 2014Vol 9 No 1 - January 2014
Vol 9 No 1 - January 2014
 
20120140504019
2012014050401920120140504019
20120140504019
 
Initial Optimal Parameters of Artificial Neural Network and Support Vector Re...
Initial Optimal Parameters of Artificial Neural Network and Support Vector Re...Initial Optimal Parameters of Artificial Neural Network and Support Vector Re...
Initial Optimal Parameters of Artificial Neural Network and Support Vector Re...
 
Parametric estimation of construction cost using combined bootstrap and regre...
Parametric estimation of construction cost using combined bootstrap and regre...Parametric estimation of construction cost using combined bootstrap and regre...
Parametric estimation of construction cost using combined bootstrap and regre...
 
Comparison of Cost Estimation Methods using Hybrid Artificial Intelligence on...
Comparison of Cost Estimation Methods using Hybrid Artificial Intelligence on...Comparison of Cost Estimation Methods using Hybrid Artificial Intelligence on...
Comparison of Cost Estimation Methods using Hybrid Artificial Intelligence on...
 
Decentralized Data Fusion Algorithm using Factor Analysis Model
Decentralized Data Fusion Algorithm using Factor Analysis ModelDecentralized Data Fusion Algorithm using Factor Analysis Model
Decentralized Data Fusion Algorithm using Factor Analysis Model
 
Comparison of the forecasting techniques – arima, ann and svm a review-2
Comparison of the forecasting techniques – arima, ann and svm   a review-2Comparison of the forecasting techniques – arima, ann and svm   a review-2
Comparison of the forecasting techniques – arima, ann and svm a review-2
 
Comparison of the forecasting techniques – arima, ann and svm a review-2
Comparison of the forecasting techniques – arima, ann and svm   a review-2Comparison of the forecasting techniques – arima, ann and svm   a review-2
Comparison of the forecasting techniques – arima, ann and svm a review-2
 

More from iaemedu

Tech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech inTech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech iniaemedu
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesiaemedu
 
Effective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridEffective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridiaemedu
 
Effect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingEffect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingiaemedu
 
Adaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationAdaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationiaemedu
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reorderingiaemedu
 
Semantic web services and its challenges
Semantic web services and its challengesSemantic web services and its challenges
Semantic web services and its challengesiaemedu
 
Website based patent information searching mechanism
Website based patent information searching mechanismWebsite based patent information searching mechanism
Website based patent information searching mechanismiaemedu
 
Revisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationRevisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationiaemedu
 
Prediction of customer behavior using cma
Prediction of customer behavior using cmaPrediction of customer behavior using cma
Prediction of customer behavior using cmaiaemedu
 
Performance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presencePerformance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presenceiaemedu
 
Performance measurement of different requirements engineering
Performance measurement of different requirements engineeringPerformance measurement of different requirements engineering
Performance measurement of different requirements engineeringiaemedu
 
Mobile safety systems for automobiles
Mobile safety systems for automobilesMobile safety systems for automobiles
Mobile safety systems for automobilesiaemedu
 
Efficient text compression using special character replacement
Efficient text compression using special character replacementEfficient text compression using special character replacement
Efficient text compression using special character replacementiaemedu
 
Agile programming a new approach
Agile programming a new approachAgile programming a new approach
Agile programming a new approachiaemedu
 
Adaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentAdaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentiaemedu
 
A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationiaemedu
 
A survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksA survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksiaemedu
 
A novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyA novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyiaemedu
 
A self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryA self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryiaemedu
 

More from iaemedu (20)

Tech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech inTech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech in
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniques
 
Effective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridEffective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using grid
 
Effect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingEffect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routing
 
Adaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationAdaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow application
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reordering
 
Semantic web services and its challenges
Semantic web services and its challengesSemantic web services and its challenges
Semantic web services and its challenges
 
Website based patent information searching mechanism
Website based patent information searching mechanismWebsite based patent information searching mechanism
Website based patent information searching mechanism
 
Revisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationRevisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modification
 
Prediction of customer behavior using cma
Prediction of customer behavior using cmaPrediction of customer behavior using cma
Prediction of customer behavior using cma
 
Performance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presencePerformance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presence
 
Performance measurement of different requirements engineering
Performance measurement of different requirements engineeringPerformance measurement of different requirements engineering
Performance measurement of different requirements engineering
 
Mobile safety systems for automobiles
Mobile safety systems for automobilesMobile safety systems for automobiles
Mobile safety systems for automobiles
 
Efficient text compression using special character replacement
Efficient text compression using special character replacementEfficient text compression using special character replacement
Efficient text compression using special character replacement
 
Agile programming a new approach
Agile programming a new approachAgile programming a new approach
Agile programming a new approach
 
Adaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentAdaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environment
 
A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow application
 
A survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksA survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networks
 
A novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyA novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classify
 
A self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryA self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imagery
 

Regression, Theil’s and MLP forecasting models of stock index

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367 (Print), ISSN 0976 – 6375 (Online), Volume 1, Number 1, May - June (2010), pp. 82-91, © IAEME, http://www.iaeme.com/ijcet.html

REGRESSION, THEIL’S AND MLP FORECASTING MODELS OF STOCK INDEX

K. V. Sujatha
Research Scholar
Sathyabama University, Chennai
E-mail: sujathacenthil@gmail.com

S. Meenakshi Sundaram
Department of Mathematics
Sathyabama University, Chennai
E-mail: sundarambhu@rediffmail.com

ABSTRACT
Financial forecasting, and stock market prediction in particular, has recently become one of the most active fields of research because of its commercial applications, the high stakes involved and the attractive benefits it offers. Financial time series are among the ‘noisiest’ and most ‘non-stationary’ signals and are hence very difficult to forecast. In this paper we attempt to forecast the daily prices of a stock index using Regression, Theil’s and MLP models, and the predictive ability of these models is compared using standard error measures.
Keywords: Forecasting, Regression, Principal Component, Perceptron, MAPE.

1. INTRODUCTION
Trading in stock market indices has gained unprecedented popularity in major financial markets around the world. However, predicting a stock price index is very difficult because of the complexity of stock market data, which are affected by many factors including political events, general economic conditions, and investors’ expectations. Modeling the behavior of a market index is a challenging task for several reasons. There are two major approaches (fundamental and technical) to stock price prediction [1].
Because little is known about the internal rules that govern nonlinear systems such as the stock market, it is not known in advance which variables are influential and which are not. Input variables are therefore selected only from the open and objective historical data of the stock market. To avoid discarding information in the historical data that influences the prediction, Principal Component Analysis (PCA) is commonly used. A functional principal component technique for the statistical analysis of a set of financial time series highlights some relevant statistical features of such related datasets [3]. PCA replaces the original variables with new ones that are fewer in number, mutually uncorrelated, and retain most of the information in the original variables [6]. Xiaoping Yang [4] used PCA to find the principal components that are taken as inputs for predicting stock prices using a neural network. The variables high, low, open, volume and adjusted closing were considered for prediction of closing prices using a Hybrid Kohonen Self Organizing Map [5]. Liu et al. [7] used back propagation neural networks with moving average, deviation from moving average, turnover moving average, and relative index as inputs for prediction. Versace et al. [8] used the open, high, low, close and volume of a specific stock, while Baba [9] used change of index, PBR, changes of the turnover by foreign traders, changes of current rates, and turnover in the local stock market. MLP outperformed RBF in predicting weekly closing prices using the variables open, high, low and volume [10]. In recent years, Artificial Neural Networks (ANNs) have been applied to many areas of statistics. One of these areas is time series forecasting [11-19].
The variables considered in this article for predicting the daily closing prices are the historic prices and the daily opening, low and high prices of the BSE Sensex from 1st January 2009 to 31st March 2010. Principal component analysis reduced these variables to a single component. The closing prices are predicted by fitting a parametric model, Simple Linear Regression, and also by a classical nonparametric model, Theil’s Incomplete Method. The Multilayer Perceptron is another nonparametric model that is used to forecast the daily closing prices, taking the principal component as the predictor variable. For all three models the forecast error, which is the difference between the actual value and the forecast value for the corresponding period, is measured. The error measures MAPE, SMAPE and MAE indicate how close the forecasted values are to the target values. The lower the error values, the better the forecaster.

2 MODEL DESCRIPTION

2.1 PRINCIPAL COMPONENT ANALYSIS
Principal component analysis is appropriate when measurements are available on a number of observed variables and one wishes to develop a smaller number of artificial variables (called principal components) that will account for most of the variance in the observed variables. The principal components may then be used as predictor or criterion variables in subsequent analyses. Principal component analysis is a variable reduction procedure. It is useful when there is redundancy in the data obtained on the variables. Here redundancy means that some of the variables are correlated with one another, possibly because they are measuring the same construct. Because of this redundancy it is possible to reduce the observed variables to a smaller number of principal components that will account for most of the variance in the observed variables. Technically, a principal component can be defined as a linear combination of optimally weighted observed variables. The general form of the formula for the first component extracted in a principal component analysis is

$$C_1 = b_{11}X_1 + b_{12}X_2 + \cdots + b_{1p}X_p$$

where
C_1 = the first component extracted,
b_{1p} = the regression coefficient (weight) for the observed variable p,
X_p = the value of the observed variable p.

2.2 SIMPLE LINEAR REGRESSION
Simple linear regression fits a straight line through a set of n points in such a way that the sum of squared residuals of the model is as small as possible. Regression makes the following assumptions:
1. The dependent variable is linearly related to the independent variable.
2. Residuals follow a normal distribution.
3. Residuals have uniform variance.
Regression parameters for a straight line model y = a + bx are calculated by the least squares method (minimization of the sum of squares of deviations from a straight line). Setting the derivatives of this sum of squares with respect to a and b to zero gives the following formulae for the slope (b) and the y intercept (a) of the line:

$$b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}$$

2.3 THEIL’S INCOMPLETE METHOD
A simple, nonparametric approach to fitting a straight line to a set of (x, y) points is Theil's incomplete method, which assumes that the points (x_1, y_1), (x_2, y_2), ..., (x_N, y_N) are described by the equation y = a + bx. The calculation of a and b takes place as follows:
1. All N data points are ranked in ascending order of x-values.
2. The data are separated into two groups of equal size m, the low (L) group and the high (H) group. If N is odd, the middle data point is not included in either group.
3. A slope b_i is calculated for each pair formed by the i-th point of the low group and the i-th point of the high group, i.e. b_i = (y_{H,i} – y_{L,i}) / (x_{H,i} – x_{L,i}) for i = 1, 2, ..., m.
4. The median of the m slope values b_1, b_2, ..., b_m is taken as the best estimate of the slope (b) of the line, i.e. b = median(b_1, b_2, ..., b_m).
5. For each data point (x_i, y_i) an intercept a_i is calculated using the previously calculated slope b, i.e. a_i = y_i – b x_i for i = 1, 2, ..., N.
6. The median of the N intercept values a_1, a_2, ..., a_N is taken as the best estimate of the intercept (a) of the line, i.e. a = median(a_1, a_2, ..., a_N).
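The two line-fitting procedures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `least_squares_line` and `theil_incomplete_line` are hypothetical, only the standard library is used, and the paired points are assumed to have distinct x-values so that no slope has a zero denominator.

```python
from statistics import mean, median


def least_squares_line(xs, ys):
    """Return (a, b) of y = a + b*x minimising the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def theil_incomplete_line(xs, ys):
    """Return (a, b) of y = a + b*x using Theil's incomplete method."""
    points = sorted(zip(xs, ys))      # rank the points by x-value
    n = len(points)
    m = n // 2                        # size of the low and high groups
    low = points[:m]
    high = points[n - m:]             # the middle point is skipped when n is odd
    # One slope per (low_i, high_i) pair; assumes x differs within each pair.
    slopes = [(yh - yl) / (xh - xl) for (xl, yl), (xh, yh) in zip(low, high)]
    b = median(slopes)
    # One intercept per data point, using the median slope.
    intercepts = [y - b * x for x, y in points]
    a = median(intercepts)
    return a, b


if __name__ == "__main__":
    # Made-up data: roughly y = 2x with one outlier at the end.
    xs = [1, 2, 3, 4, 5, 6]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1, 30.0]
    print("least squares:", least_squares_line(xs, ys))
    print("Theil:        ", theil_incomplete_line(xs, ys))
```

On the made-up data in the usage example, the single outlier pulls the least squares slope upwards, while the median-based Theil estimate stays close to the slope of the remaining points. This robustness is the usual reason for considering the nonparametric fit.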
2.4 MULTILAYER PERCEPTRON
A multilayer perceptron is a feed forward network model that maps sets of input data onto a set of appropriate outputs. It is a modification of the standard linear perceptron in that it uses three or more layers of neurons (nodes) with nonlinear activation functions, and it is more powerful than the perceptron in that it can distinguish data that are not linearly separable. The MLP divides the data set into three parts: Training, Testing and Holdout.
Training - This segment of data is used only to train the network.
Testing - This segment of data is set aside during training and used to prevent overtraining.
Holdout - This set of data is used to assess the final neural network. The holdout data set gives an honest estimate of the predictive ability of the model.
The multilayer perceptron procedure offers rescaling, which is done to improve network training. There are three rescaling options: standardization, normalization, and adjusted normalization. All rescaling is performed based on the training data, even if a testing or holdout sample is defined. The activation function of the hidden layer can be hyperbolic tangent or sigmoid. The units in the output layer can use any one of the following activation functions: Identity, Sigmoid, Softmax or Hyperbolic Tangent.

2.5 ERROR MEASURES
The error functions used are the sum of squares error and the relative error. The sum of squares error is defined as the sum of the squared deviations between the observed and the model-predicted values. Relative error is the ratio of an absolute error to the true, specified, or theoretically correct value of the quantity that is in error. The error measures are

Mean Absolute Error:
$$\text{MAE} = \frac{1}{n}\sum_{t=1}^{n} |A_t - P_t|$$

Mean Absolute Percentage Error:
$$\text{MAPE} = \frac{1}{n}\sum_{t=1}^{n} \frac{|A_t - P_t|}{A_t}$$

Symmetric Mean Absolute Percentage Error:
$$\text{SMAPE} = \frac{1}{n}\sum_{t=1}^{n} \frac{|A_t - P_t|}{A_t + P_t}$$

where A_t is the actual value and P_t is the predicted value.
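The three error measures can be computed directly from their definitions. The sketch below is illustrative only: the function names are hypothetical, the inputs are assumed to be equal-length sequences of positive prices, and MAPE and SMAPE are returned as fractions (not multiplied by 100) with the SMAPE denominator A_t + P_t exactly as written above.

```python
def mae(actual, predicted):
    """Mean absolute error: average of |A_t - P_t|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction: average of |A_t - P_t| / A_t."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE as a fraction: average of |A_t - P_t| / (A_t + P_t)."""
    return sum(abs(a - p) / (a + p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Made-up values for illustration; not data from the paper.
    actual = [100.0, 200.0]
    predicted = [110.0, 190.0]
    print(mae(actual, predicted))    # average absolute deviation
    print(mape(actual, predicted))   # average relative deviation
    print(smape(actual, predicted))  # roughly half of MAPE when errors are small
```

With this definition, SMAPE is close to half of MAPE whenever the predictions are close to the actual values, because the denominator A_t + P_t is then about 2A_t.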
3 FINDINGS AND RESULTS
Principal Component Analysis of the variables daily high, low and opening prices of the BSE Sensex data resulted in a single principal component, which is further used in predicting the closing prices by the methods discussed above. The eigenvalues, which determine the number of principal components, and the factor loadings of the principal component are given in Table 1.

Table 1 Principal Component Analysis

Component      1         2         3         4
Eigenvalues    3.2787    0.7194    0.0011    0.0008
Difference     2.5593    0.7183    0.0003
Proportion     81.97%    17.98%    0.03%     0.02%
Cumulative     81.97%    99.95%    99.98%    100.00%
Criteria: Kaiser

Variable     Factors (F1)    Weights (PCA1)
V1           0.9855          0.5442
V2           0.9855          0.5442
V3           0.9883          0.5458
V4           -0.5998         -0.3312
Exp. Var.    3.2787
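The figures in Table 1 can be cross-checked with a short calculation. The sketch below takes the eigenvalues and F1 loadings from Table 1 and recomputes the proportions of variance, the number of components retained under the Kaiser criterion (eigenvalue greater than 1), and the component weights. The relation weight = loading / sqrt(eigenvalue) is the standard link between loadings and unit-length eigenvectors; that the table was produced this way is an assumption, although the numbers agree to rounding.

```python
from math import sqrt

# Values copied from Table 1.
eigenvalues = [3.2787, 0.7194, 0.0011, 0.0008]
loadings_f1 = [0.9855, 0.9855, 0.9883, -0.5998]

total = sum(eigenvalues)

# Share of total variance explained by each component, and the running total.
proportions = [ev / total for ev in eigenvalues]
cumulative = [sum(proportions[: k + 1]) for k in range(len(proportions))]

# Kaiser criterion: keep components whose eigenvalue exceeds 1.
retained = [ev for ev in eigenvalues if ev > 1.0]

# Assumed relation: weight = loading / sqrt(eigenvalue of that component).
weights = [loading / sqrt(eigenvalues[0]) for loading in loadings_f1]

if __name__ == "__main__":
    for k, (p, c) in enumerate(zip(proportions, cumulative), start=1):
        print(f"component {k}: proportion {p:.2%}, cumulative {c:.2%}")
    print("components retained (Kaiser):", len(retained))
    print("weights:", [round(w, 4) for w in weights])
```

Only the first eigenvalue exceeds 1, which is consistent with the single principal component used in the rest of the analysis.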
An initial descriptive analysis of the daily closing prices and the predictor variable (the principal component variable) is given in Table 2. The assumptions of simple linear regression are checked, and the line of regression is then fitted to this set of observations.

Table 2 Descriptive Statistics

Variable               Mean          Standard Deviation    Skewness    Kurtosis
Daily Closing          14337.1182    3041.62375            -.788       -.909
Principal Component    23419.2108    4974.61634            -.785       -.918

Table 3 Tests of Normality

             Kolmogorov-Smirnov              Shapiro-Wilk
Variable     Statistic    df     Sig.        Statistic    df     Sig.
Closing      .170         300    .000        .838         300    .000
PCA          .171         300    .000        .838         300    .000

The Durbin-Watson value of 2.11 indicates the absence of autocorrelation. The Kolmogorov-Smirnov and Shapiro-Wilk normality tests were performed and the outcomes are displayed in Table 3. From Table 3 it is clear that both tests imply that the condition of normality is not met.
Using the method of least squares, the Simple Linear Regression model for the data is given by Y = 34.312 + 0.611X, where X is the principal component variable and Y represents the daily closing price of the BSE. By the classical nonparametric Theil’s method, the model is given by Y = 42.15384 + 0.610456X, where X is the principal component and Y represents the daily closing price.
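For readers unfamiliar with the Durbin-Watson check mentioned above, the statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. The sketch below is a minimal illustration with a hypothetical function name and made-up residuals; it does not reproduce the value of 2.11 reported here, since the residual series is not given in the paper.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


if __name__ == "__main__":
    # Made-up residual series for illustration only.
    print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # constant residuals
    print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating residuals
```

Constant residuals give a statistic of 0 and strictly alternating residuals push it towards 4, which brackets the "near 2" range associated with uncorrelated residuals.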
For modeling the data with the Multilayer Perceptron, the principal component variable is taken as the covariate and the daily closing price of the BSE is taken as the target variable. Rescaling (standardized, normalized and adjusted normalized) of both the dependent variable and the covariate is applied successively. For every combination of the activation function of the hidden layer (hyperbolic tangent and sigmoid) and that of the output layer (identity, hyperbolic tangent and sigmoid), the sum of squares error and relative error values are measured with the different rescaling options. The different combinations of the activation functions of the output and hidden layers with the three rescaling options of the input and target variables resulted in 30 models. The architecture for which the sum of squares error and relative error were minimum is the one in which both the dependent variable and the covariate are normalized, with hyperbolic tangent as the activation function of the hidden layer and identity for the output layer.
Table 5 gives the MAE, MAPE, SMAPE and R square values for the models discussed above. Figure 1 shows how the models predict the closing prices for the last 50 data points.

Table 6 MAE, MAPE and SMAPE values

Model                        MAE           MAPE         SMAPE        R2 Value
Linear Regression            110.695401    0.0081926    0.0040934    0.9977142
Theil’s Incomplete Method    110.6996      0.008198     0.004095     0.9977138
Multilayer Perceptron        118.5105      0.008839     0.004424     0.9974605
  • 9. Figure 1 Closing prices predicted by the models for the last 50 data points.

4. CONCLUSION
The best model for forecasting the daily closing prices was found to be linear regression. It yielded the lowest error on every measure: a MAPE of 0.0081926, a SMAPE of 0.0040934 and a MAE of 110.695401. Its R square value of 0.997714272 indicates that the model is appropriate for predicting the daily closing prices when the daily opening, high and low prices are used as predictors. This model outperformed both the nonparametric Theil's method and the MLP model. It would be interesting to conduct further studies that compare these results with models using additional variables.