Neural Nets for Stock Indices: Investigating Effect of Change in Hyperparameters

Artificial neural networks (ANNs) have seen a surge of interest in the past decade. ANNs are increasingly used in prediction-based studies owing to their high accuracy. They have been successfully applied across various domains such as medicine, geology, finance, physics and engineering. Neural networks grow in complexity as the number of layers and neurons increases, and with it grows their capacity to solve intricate problems. Researchers the world over consider the neural network with three layers (input, hidden and output) a universal approximator of functions, as it has given outstanding results in data validation, price forecasting, sales forecasting, customer research etc. over the years. In most previous studies, either a standard ANN model has been taken or a default model has been tested using various software packages. But as we understand, a lot of trial and error in altering the hyperparameters is needed to get the best performing model. In our study we attempt to demonstrate this point and try to find the best model for our data set, wherein we predict the BSE Sensex closing price of the next day using the previous day's data (high price, low price, open price, close price and trade volume). We use deep neural networks with backpropagation and alter the following hyperparameters: the number of nodes in hidden layers, the activation function of hidden layers, the number of epochs, and the batch size and hence the iterations in each epoch. Model performance was measured by the root mean square error on the test set. We are able to bring out the differences made by hyperparameter tuning and ultimately find the best predictor model for the BSE Sensex closing value.


Introduction
Of late, neural networks have been very popular in prediction-based studies. They have been successful across various domains such as medicine, geology, finance, physics and engineering. Neural networks become more complex as the number of neurons and hidden layers increases. At the same time, these complex models improve predictions for complex problems that are otherwise difficult to solve or solved unsatisfactorily. Till now, the three-layered neural network has been regarded as a universal approximator of functions and has been used successfully in data validation, price forecasting, sales forecasting, customer research etc. Neural networks are a relatively new concept that emerged from the field of artificial intelligence and are now used in almost all fields owing to their high performance.
One of the decades-long puzzles in finance and investment is the prediction of the stock market, and that too with full accuracy. Renowned researchers have made many propositions on this topic and introduced many models combining various theories, yet the puzzle stands. No model has been found to be fully accurate, and the reliability of models has been a big concern. Sometimes researchers were satisfied with a model that gave a good result and did not test alternatives which, if considered, could have given better results. This was the biggest gap that we found in the previous literature. Thus in our study, we propose machine learning models with varied alternatives and test them. A change of hyperparameters can drastically affect model performance. We could not trace any significant previous literature on hyperparameter testing, and this itself speaks for the uniqueness of our work. By the end of the study, we were able to identify the best model, with high predictive ability and very low error. This model, if used by investors to predict the future Sensex closing value, can help them make good profits or at least prevent losses from unanticipated rupee depreciation.

Deep Neural Networks
Feed forward neural networks are the simplest and most basic type of artificial neural network. Here the connections between nodes do not form a cycle; rather, the information travels in one direction only, forward from the input layer to the output layer. A single layer perceptron uses a step function as its activation function. The logistic (sigmoid) function is
$$f(x) = \frac{1}{1 + e^{-x}}$$
Since the sigmoid function has a continuous derivative, it allows backpropagation in multilayer perceptrons:
$$f'(x) = f(x)\,(1 - f(x))$$
A multilayer perceptron is in fact capable of representing any possible Boolean function. It also satisfies the universal approximation theorem. Thus we also use it in our research. When there are multiple hidden layers, the network is called a deep neural network. More layers mean more processing time but sometimes better results. Large data sets can be processed efficiently using deep neural networks (Figure 1).
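The property that makes the sigmoid convenient for backpropagation, namely that its derivative can be written in terms of the function itself, can be checked numerically. The following is a minimal sketch in Python with NumPy and is not part of the original study; the names `sigmoid` and `sigmoid_derivative` are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    """Logistic (sigmoid) function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative expressed through the function itself: f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare the closed-form derivative with a central finite difference.
x = np.linspace(-4.0, 4.0, 9)
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)
max_gap = float(np.max(np.abs(numeric - sigmoid_derivative(x))))
print("largest gap between closed form and finite difference:", max_gap)
```

The gap should be tiny, which is what allows the backward pass to reuse the activations already computed in the forward pass.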

Back Propagation
Backpropagation is a popular method for training neural networks. It optimizes the weights over multiple backward passes and helps the network of neurons to correctly map the given inputs to their outputs. For illustration, consider a neural network with one input layer, one hidden layer and one output layer. Here we are talking of supervised learning, and therefore have the inputs and their target outputs with us. The whole process starts with feeding the input data forward into the input layer of the network. As the input passes through a layer, it gets squashed by the activation function of that layer. The output of this squashing gives the output of the forward pass of inputs through the network. This output is compared to the target output that we want. Depending on the difference between the two, the error is passed back through the network, starting from the output layer. The weights in the network are updated so that the output from the neural network gets closer to the target output. The delta rule is used for error backpropagation.
For a neuron $i$ with activation function $f(x)$, the delta rule (gradient descent) for the $j$-th weight $w_{ij}$ of neuron $i$ is given as follows:
$$\Delta w_{ij} = l\,(t_i - y_i)\,f'(h_i)\,x_j$$
where
$l$ is the learning rate,
$f(x)$ is the activation function of the neuron $i$ in question,
$t_i$ is the target output,
$y_i$ is the produced output,
$x_j$ is the $j$-th input,
$h_i$ is the weighted sum of the neuron's inputs.
Once the error reaches the input layer, the weights of the network have been updated. The output from the network is then again compared to the target output. This process goes on till the error has decreased substantially. There are various ways to terminate training of the network, e.g. when the learning rate falls below a predefined limit, after a predefined number of epochs, or when the relative change in training error falls below a defined limit.
Different ways of terminating the training algorithm and different hyperparameter settings can give results that differ in accuracy and efficiency.
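The delta rule update and two of the termination criteria mentioned above can be sketched for a single sigmoid neuron. This is an illustrative toy example, not the study's SPSS procedure: the function `train_neuron`, the logical OR data and the random initial weights are assumptions, while the learning rate of 0.4 and the relative-change threshold of 0.0001 are borrowed from the study's settings.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def train_neuron(X, t, lr=0.4, max_epochs=5000, tol=1e-4):
    """Train one sigmoid neuron with the delta rule.

    Training stops after max_epochs, or earlier when the relative change
    in the sum-of-squares training error falls to tol or below.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])
    prev_error = None
    for epoch in range(1, max_epochs + 1):
        for x_j, t_i in zip(X, t):
            h_i = w @ x_j                  # weighted sum of inputs
            y_i = sigmoid(h_i)             # produced output
            # Delta rule: dw = l * (t - y) * f'(h) * x, with f'(h) = y * (1 - y)
            w += lr * (t_i - y_i) * y_i * (1.0 - y_i) * x_j
        error = float(np.sum((t - sigmoid(X @ w)) ** 2))
        if prev_error is not None and abs(prev_error - error) <= tol * prev_error:
            break
        prev_error = error
    return w, error, epoch

# Toy data: logical OR, with a constant 1 appended as a bias input.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 1.0])

w, error, epochs_run = train_neuron(X, t)
predictions = (sigmoid(X @ w) > 0.5).astype(int)
print(predictions, error, epochs_run)
```

Changing `tol` or `max_epochs` changes when training stops and therefore the final error, which is the kind of sensitivity the study sets out to examine.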

Hyperparameter
A hyperparameter is a special class of parameter whose value we set before the learning process begins. The values of the other parameters are derived from training.

Activation Functions
Different activation functions can be used in deep neural networks, and different layers (i.e. hidden layers and output layers) can use different activation functions.
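As a rough illustration of layers using different activation functions, the sketch below runs one forward pass through a network with five inputs, three hidden nodes and a linear output, once with a sigmoid hidden layer and once with a hyperbolic tangent hidden layer. The weights are random placeholders and the helper `forward` is hypothetical; nothing here reproduces the fitted models of the study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, b_hidden, w_out, b_out, hidden_activation):
    """One forward pass: chosen activation in the hidden layer, linear output."""
    hidden = hidden_activation(W_hidden @ x + b_hidden)
    return float(w_out @ hidden + b_out)

rng = np.random.default_rng(1)
x = rng.normal(size=5)              # five standardized inputs
W_hidden = rng.normal(size=(3, 5))  # three hidden nodes
b_hidden = np.zeros(3)
w_out = rng.normal(size=3)
b_out = 0.0

y_sigmoid = forward(x, W_hidden, b_hidden, w_out, b_out, sigmoid)
y_tanh = forward(x, W_hidden, b_hidden, w_out, b_out, np.tanh)
print(y_sigmoid, y_tanh)
```

With identical weights the two hidden activations give different outputs, so the choice of activation function is itself a setting that has to be tested.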
Unipolar logistic function
$$f(x) = \frac{1}{1 + e^{-x}}$$
This function gives an output between 0 and 1.
Bipolar logistic function
$$f(x) = \frac{1 - e^{-x}}{1 + e^{-x}}$$
This function gives an output between -1 and 1.

This research paper is organized in sections. Section 1 introduces the title and the subject under consideration. Section 2 discusses previous work in similar areas. The need for the study and the research gap are explained in Section 3 under research objectives. Section 4 gives a detailed account of the methodology used, followed by Section 5, which explains the different models with their variations of hyperparameters and the results of all these variations. At the end of Section 5, the best fit model for the BSE Sensex is described. Finally, Section 6 concludes the paper.

Literature Review
Some noteworthy research in the area of financial modelling using machine learning has been done over the past few decades. A few studies are worth mentioning. A pioneering work was done by Kimoto et al. [1], wherein they applied modular neural networks to the price index data of the Tokyo Stock Exchange and developed a prediction system for the best time to buy and sell stocks, achieving accurate predictions. The simulation showed excellent profits. The research done by Ghiassi and Saidane [2] was commendable, since they designed a new ANN model in which they used the entire data set simultaneously for model parameter estimation. The model was appraised using a marketing data set and compared with traditional feed forward methods. The new model was found to perform better. Ghiassi et al. [3] compared the traditional iterative back propagation feed forward method of time series forecasting with the dynamic neural network model and established the superiority of the latter.
One highly acclaimed study was performed by Chang et al. [4]. They developed an integrated system for stock forecasting in which a neural network, case based reasoning and dynamic time windows were combined. The prediction of sell/buy decision points was found to be better than with any of the three methods used alone. Hamzacebi et al. [5] compared two methods (direct and iterative) of artificial neural network time series forecasting. In the iterative method, the value for one period is forecast from the preceding period, and this value is then used to predict the next period's value. In the direct method, the values of all successive periods can be predicted in one go. The researchers compared the two methods using grey relational analysis and found that the direct method was better than the iterative method. An innovative empirical study was attempted by Cheng et al. [6], wherein they integrated fundamental and technical analysis with an artificial neural network system and set theory to develop a market timing investment strategy model. Forecasting accuracy and returns from investments were used to evaluate the model. Liao and Wang [7] studied the fluctuations of the Chinese stock index and built an improved forecasting neural network model by introducing a stochastic time effective function. They suggest that the closer the time of the past data is to the current time, the stronger its effect on the prediction model. The model is also appraised with different volatility parameters. In another study, Guresen et al. [8] tried to cut through traditional linear and nonlinear approaches to forecasting stock market rates and analysed three new models: multi-layer perceptron (MLP), dynamic artificial neural network (DAN) and a hybrid neural network. The mean square error used for appraisal of the models showed that the MLP model gave the best predictions when used on the same data set. Moghaddam et al. [9] investigated the stock forecasting ability of artificial neural networks using the NASDAQ stock exchange. Two types of input sets, four prior days and nine prior days, were used, and both were found to perform equally well.
S. Agarwal, J. A. Khan. DOI: 10.4236/tel.2019.93036. Theoretical Economics Letters.
In most of the studies, either a standard ANN model has been taken or a default model has been tested using various software packages. But as we understand, a lot of trial and error in altering the hyperparameters is needed to get the best performing model. In our study we attempt to demonstrate this point and try to find the best model for our data set.

Research Objectives
In this research, we wanted to see the effect of changing some hyperparameters on the model's predictive ability and efficiency. The testing is useful because it shows what can be missed when hyperparameters are left at their defaults and results are accepted without realizing that the model could be tuned to perform better.
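One way to organise such a comparison is to start from a baseline configuration and change one hyperparameter at a time. The sketch below is a hypothetical illustration in Python: the `Config` class, the baseline values and the step of 10 between epoch settings are assumptions made for the example, although the candidate values echo those tested in this study.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Config:
    hidden_layers: tuple = (3,)    # nodes per hidden layer
    activation: str = "sigmoid"    # hidden layer activation
    batch_size: int = 10           # records per training batch
    epochs: int = 10               # passes over the training set

def one_at_a_time(base, variations):
    """Yield configurations that differ from the baseline in exactly one field."""
    for name, values in variations.items():
        for value in values:
            candidate = replace(base, **{name: value})
            if candidate != base:
                yield candidate

baseline = Config()
variations = {
    "hidden_layers": [(3,), (3, 2)],
    "activation": ["sigmoid", "tanh"],
    "batch_size": [10, 20, 30, 50],
    "epochs": list(range(10, 101, 10)),
}

candidates = list(one_at_a_time(baseline, variations))
for candidate in candidates:
    print(candidate)
```

Each candidate would then be trained and scored on the same test set, so that any difference in error can be attributed to the single setting that changed.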

Research Methodology
We took the daily data of the BSE Sensex from the Yahoo Finance website for the study period.

Analysis and Results
Case 1. Effect of change in number of hidden layers and the activation function of hidden layers
In our first variation, we set the number of hidden layers to 1 and to 2. The single hidden layer had 3 nodes, while in the model with two hidden layers, the first layer had 3 nodes and the second had 2 nodes.
The number of layers was restricted to 2 because on further increase the test error became very high. Also, for both model types (the two variations in number of hidden layers) we tested two activation functions for the hidden layers: the sigmoid function and the hyperbolic tangent function.
The performance metrics of the four models are shown in Table 1.
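The error measures reported in such a table can be sketched as follows. This is an illustrative Python example on made-up numbers, not the study's data. The definition of relative error used here, the SSE divided by the SSE of a model that always predicts the mean, is an assumption about how the statistical package defines it.

```python
import numpy as np

def sse(y_true, y_pred):
    """Sum of square error."""
    return float(np.sum((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root mean square error, derived from the SSE."""
    return float(np.sqrt(sse(y_true, y_pred) / len(y_true)))

def relative_error(y_true, y_pred):
    """SSE relative to a baseline that always predicts the mean (assumed definition)."""
    baseline = np.full_like(y_true, y_true.mean())
    return sse(y_true, y_pred) / sse(y_true, baseline)

# Made-up values, for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

print(sse(y_true, y_pred), rmse(y_true, y_pred), relative_error(y_true, y_pred))
```

Under this definition, a relative error close to 0 means the network does much better than predicting the mean, while a value near 1 means it does no better.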
The errors of prediction are shown graphically in Figure 10. Figures [...] can also be seen to be low. Overall it is a good model with low error in testing.
It can be seen from Figure 7 that the predicted values and the actual values of the closing Sensex coincide for most of the study period. Also from [...]. Also, the graph is messy, since the scatter plot in Figure 4 shows large scatter between residuals and predictions. Thus the model is not appropriate for prediction in the current research. Table 1 shows that this model has a high SSE in the training process. This means that the number of hidden layers, the number of hidden neurons, or both need to be increased. To test this we check another model, seen in Figure 5 and Figure 9.
Comparing Figure 9 with Figure 8 shows the difference. There is a marked improvement in the predictive ability of the model with two hidden hyperbolic tangent layers (3,2). The training and test errors show a drastic reduction from the previous model, which had only one hyperbolic tangent layer. The mismatches are also fewer (fewer blue lines are visible in the graph). The most prominent deviation is from June 2018 to Nov 2018. Though this model is appraised to be good, it takes more training time.
As can be seen from Figure 10, both the test and training errors are low in the ANN with a sigmoid-activated hidden layer. Also, the model with the sigmoid function gives the best result with one hidden layer of three hidden neurons. The performance of the models with different batch sizes is shown graphically in Figure 11.
The batch size was restricted to 50, and no further increase is reported because the error was increasing at exponential rates.
From Figure 11, it can be seen that the least error for both the training set and the test set occurs at a batch size of 10 records. Also, from Table 2 it can be seen that the training time of 0.41 seconds is comparable with the training times for batch sizes of 20 and 30.
Hence a batch of 10 records is the best size for training on the current data set.
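The link between batch size and the number of iterations in each epoch is simple arithmetic, sketched below in Python. The figures of 1231 observations and a 60% training share come from this study. Treating the training set as exactly 60% of the observations, rounded to the nearest record, is an assumption, and the helper name `iterations_per_epoch` is illustrative.

```python
import math

n_observations = 1231      # total readings in the study
train_fraction = 0.60      # training share of the data
n_train = round(n_observations * train_fraction)   # assumed exact 60% split

def iterations_per_epoch(n_records, batch_size):
    """Number of weight-update steps needed to pass over all records once."""
    return math.ceil(n_records / batch_size)

for batch_size in (10, 20, 30, 50):
    print(batch_size, iterations_per_epoch(n_train, batch_size))
```

Under these assumptions a batch of 10 records gives 74 iterations per epoch, against 15 for a batch of 50, so smaller batches mean more frequent weight updates for the same number of epochs.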
Case 3. Effect of change in number of nodes in one hidden layer
In this case, we varied the number of nodes (neurons) in the hidden layer. The number of units was 192 and the number of hidden layers was 1 (taken as default for the case 3 variations). The activation function of the hidden layer was sigmoid. The output activation was linear. The number of hidden nodes was 3, 10, 20 and 30 in the different models tested. On further increase in the number of nodes, the error increased immensely.
The performance results of case 3 are shown graphically in Figure 12. From Figure 12 it can be seen that the training error increases with the number of nodes in the hidden layer. The test set error remains more or less the same when the number of hidden units is 3, 10 and 20. It increases when the number of nodes is increased to 30. Thus the best model is the one with 3 hidden units (nodes), where both the test and training errors are low. Also, from Table 3 it can be seen that the training time is lowest when the number of nodes is 3.
The performance of the case 4 variations is shown graphically in Figure 13.
In case 5, we varied the number of hidden units (nodes) in the single hidden layer from 10, 20, up to 50. Further increase gave high errors. The performance statistics are reported in Table 5.
The results of the variations in case 5 are shown graphically in Figure 14.
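A radial basis function network with a softmax-activated hidden layer can be sketched roughly as follows. This is a hedged illustration in Python: the Gaussian form, the shared `width`, the random centres and the helper `rbf_forward` are assumptions made for the example and may differ from the software's actual implementation.

```python
import numpy as np

def rbf_forward(x, centers, width, w_out, b_out):
    """Forward pass of a normalized RBF network with a linear output."""
    sq_dist = np.sum((centers - x) ** 2, axis=1)   # squared distance to each centre
    scores = -sq_dist / (2.0 * width ** 2)
    scores = scores - scores.max()                 # shift for numerical stability
    hidden = np.exp(scores) / np.sum(np.exp(scores))   # softmax over hidden units
    return hidden, float(w_out @ hidden + b_out)

rng = np.random.default_rng(2)
centers = rng.normal(size=(10, 5))   # 10 hidden units, 5 inputs (placeholder centres)
x = rng.normal(size=5)
w_out = rng.normal(size=10)

hidden, y = rbf_forward(x, centers, width=1.0, w_out=w_out, b_out=0.0)
print(hidden.round(3), y)
```

The hidden activations sum to 1, and the unit whose centre lies closest to the input responds most strongly, so adding hidden units amounts to covering the input space with more centres.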
From the above statistics (Table 7), we can see that closing value of the previous [...]
The single layer perceptron uses the delta rule for training of neurons. Its drawback is that it cannot learn an XOR function. An improvement on it, the multilayer perceptron (MLP), can process the XOR function and compute a continuous output. Here the activation function is commonly the logistic function (also known as the sigmoid function).
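The XOR limitation can be illustrated with a hand-wired network. In the Python sketch below, two hidden step units compute OR and AND, and the output unit fires when OR is true but AND is not. The weights are chosen by hand for illustration and are not learned; the function name `xor_mlp` is hypothetical.

```python
import numpy as np

def step(z):
    """Step activation: 1 where z >= 0, else 0."""
    return (z >= 0).astype(int)

def xor_mlp(x1, x2):
    """XOR built from one hidden layer of two step units."""
    x = np.array([x1, x2])
    W_hidden = np.array([[1, 1],     # hidden unit 1 acts as OR
                         [1, 1]])    # hidden unit 2 acts as AND
    b_hidden = np.array([-0.5, -1.5])
    h = step(W_hidden @ x + b_hidden)
    # Output unit: fires when OR is on and AND is off.
    return int(h[0] - h[1] - 0.5 >= 0)

truth_table = [xor_mlp(a, b) for a in (0, 1) for b in (0, 1)]
print(truth_table)
```

No single step unit can produce this truth table, because the two classes are not linearly separable, whereas one hidden layer is enough.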

Figure 1. Structure of a deep neural network. (Source: Researcher's own work)

The study period ran from 1 January 2014 to 31 December 2018. The daily data contained the volume traded, the high price, low price, close price and open price. Raw data was cleaned of any missing values and standardized using Z scores:
$$Z\ \text{score} = \frac{\text{Input} - \text{mean}}{\text{standard deviation}}$$
A total of 1231 readings (observations) were retrieved for analysis using deep neural networks. Cases with user missing values on factors and categorical dependent variables were excluded during analysis. We used SPSS for analysing the data and constructing the different networks. The input or independent variables consisted of the high price, low price, open price, close price and the volume traded. The dependent variable or output value was the closing value of the index on the successive day (labelled closenext here). We set the data partitions as 60% training set, 20% validation set and another 20% as test set. We used batch training for each epoch so as to see the effect of a change in iterations on the prediction capacity. The training momentum was set at the default value of 0.9. The maximum training time was set to 15 minutes (for the worst case, in which the termination criteria are not reached). The stopping criterion was 6 consecutive training steps with no change in training error, or a relative change of 0.0001 in training error. Both training and test data were used to compute prediction errors. The learning rate was set at 0.4 and the lower boundary of the learning rate was fixed at 0.001. Gradient descent was used for backpropagation. The hyperparameters we changed in the different models were:
• The number of hidden layers
• Number of nodes in hidden layers
• The activation function of hidden layers
• Number of epochs
• The batch size and hence the iterations in each epoch
The model performance was measured with the help of the root mean square error on the test set of the model.

Error Calculation
In the algorithm that we used to train our neural networks, we set aside 20% of the data set for a cross-validation step to measure the error of each iteration and help in the gradient descent of the error. Test error and training error are calculated separately for each constructed neural network. We calculate the SSE, i.e. sum of square error, and the RE, i.e. relative error, for both sets. SSE gives an indication of the RMSE (root mean square error), which is by far the most reliable method of measuring the performance of a neural network. The lower the error on the test set, the better the network. As a rule of thumb, if the training error is high, we increase the number of neurons in the hidden layer or the number of hidden layers. If the training error is satisfactory but the test error is high, we presume that the training has led to over-fitting, and therefore we reduce the number of neurons in the hidden layer or the number of hidden layers.
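The data preparation described in this section can be sketched in Python as follows. The data here are random numbers standing in for the Sensex series, sized so that 1231 input/target pairs result. The random partitioning, the column order and the helper names `zscore` and `partition` are assumptions for illustration, since the study performed these steps in SPSS.

```python
import numpy as np

def zscore(X):
    """Standardize each column: subtract its mean, divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def partition(n_rows, train=0.6, validation=0.2, seed=0):
    """Randomly split row indices into training, validation and test sets."""
    order = np.random.default_rng(seed).permutation(n_rows)
    n_train = round(train * n_rows)
    n_val = round(validation * n_rows)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

# Synthetic stand-in for the daily columns: high, low, open, close, volume.
rng = np.random.default_rng(0)
raw = rng.normal(loc=100.0, scale=15.0, size=(1232, 5))

# Pair each day's inputs with the next day's close (column 3 in this layout).
inputs = raw[:-1]
close_next = raw[1:, 3]

Z = zscore(inputs)
train_idx, val_idx, test_idx = partition(len(Z))
print(len(train_idx), len(val_idx), len(test_idx))
```

The test indices are kept apart from training and validation, so the error computed on them reflects data the network has not seen.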

The model with a single sigmoid-activated hidden layer of 3 units has a low training time of 0.41 seconds with good performance.
Case 2. Effect of change of training batch size
The batch size used for one iteration was varied to see the effect on model performance. We took the number of records in a batch as 10, 20, 30 and 50 in successive variations of the model. The number of units was 192 and the number of hidden layers was 1 (taken as the default setting for the case 2 variations). The number of nodes in the hidden layer was 3. The activation function used was sigmoid. The output activation function was linear.

Figure 10. Effect of change in number of hidden layers and change in hidden layer activation function. (Source: Researcher's analysis of data)

Figure 12. Effect of change in no. of hidden units (nodes) in the single hidden layer. (Source: Researcher's analysis of data)

Case 4. Effect of change in number of training epochs
In case 4, we studied the effect of changing the number of training epochs. One epoch is defined as one pass of the complete data set through the neural network. The number of units was 192 and the number of hidden layers was 1 (taken as default for the case 4 variation analysis). The number of nodes in the hidden layer was 3. The activation function used was sigmoid. The output activation was linear. We tested a varied number of epochs (starting from 10, 20, … up to 100). The error changed with each setting, and the training time also varied across all cases (Table 4).

From Figure 13 it can be seen that both training and test errors are low when the number of epochs used for training is 10, 75 or 80. But the least error occurs when the number of epochs is 10. The training time is also low when seen along with the error rate. Thus the best model for our data set in this research is the one using 10 epochs for training. This is suggested keeping in mind both the model error and the training time taken.
Case 5. Using radial basis function
We tried a radial basis function network in place of the multilayer perceptron feed forward model to check whether it gives a better result. In a radial basis function network, the number of hidden layers is always 1. The number of nodes in the hidden layer is varied. The activation function of the hidden layer is the softmax function.

Figure 13. Effect of change in no. of training epochs. (Source: Researcher's analysis of data)

Figure 14. Effect of changing no. of hidden units in radial basis function model. (Source: Researcher's analysis of data)

Table 1. Performance metrics of variation in number of hidden layers and its activation functions. SSE = sum of square error, RE = Relative error (Source: Researcher's analysis of data).

Table 1
between the actual and the predicted closing Sensex values. The error rate in both training and testing is high. The disagreement patches are also large and clear (seen as long blue lines around January 2014-May 2014, June 2018-Nov 2018 etc.).

Table 2. Performance metrics of variation in batch size used for training.

Table 3. Performance metrics of variation in number of hidden layer nodes.

Table 4. Performance metrics of variation in number of training epochs. SSE = sum of square error, RE = Relative error (Source: Researcher's analysis of data).

Table 5. Performance metrics of variation in number of nodes in hidden layer when RBF function model is used. SSE = sum of square error, RE = Relative error (Source: Researcher's analysis of data).