Stock Price Prediction Based on the Bi-GRU-Attention Model

Abstract

The stock market, as one of the hotspots in the financial field, forms a data system with a huge volume of data and complex relationships among its many factors, making stock price prediction an area of keen interest for in-depth mining and research. Mathematical statistics methods struggle to deal with nonlinear relationships in practice, making it difficult to uncover deep information about stocks. Meanwhile, machine learning methods, particularly neural network models and composite models that have achieved outstanding results in other fields, are being applied to the stock market with significant results. However, researchers have found that these methods do not grasp the essential information of the data as well as expected. In response, researchers are exploring better neural network models and combining them with other methods to analyze stock data. This paper therefore proposes the ABiGRU composite model, which combines the attention mechanism with a bidirectional gated recurrent unit (GRU) that can effectively extract data features, for stock price prediction research. LSTM, GRU, and Bi-LSTM models are selected for comparative experiments. To ensure the credibility and representativeness of the research data, the daily stock prices of BYD are chosen for closing price prediction across the different models. The results show that the ABiGRU model has a lower prediction error and a better fitting effect under all three evaluation indices, enhancing the learning efficiency of the neural network model and demonstrating good prediction stability. This suggests that the ABiGRU model is highly adaptable for stock price prediction.

Zhang, Y. and Tumibay, G. (2024) Stock Price Prediction Based on the Bi-GRU-Attention Model. Journal of Computer and Communications, 12, 72-85. doi: 10.4236/jcc.2024.124007.

1. Introduction

The stock market, as a highly complex and volatile financial domain, has always attracted substantial interest from investors and researchers. Accurately predicting stock price fluctuations holds significant importance for investors: it not only affects the effectiveness of investment decisions but also bears on the success of asset appreciation and risk management. However, the stock market is influenced by numerous factors, making the relationships within stock data complex and producing unpredictable nonlinear data forms. Research on this type of data using statistical methods has reached a bottleneck, prompting a search for new approaches. In recent years, artificial intelligence (AI) technology has developed rapidly alongside the information age. Computer technology, with its high-speed processing capabilities, has not only improved human life and work efficiency but has also produced many efficient and classical mathematical models, reaching a stage where it simulates human thinking and learning and aids in prediction and decision-making. Scholars have attempted to use AI-related technologies to analyze and predict stocks and have achieved significant results, leading an increasing number of experts and scholars to focus on stock research based on computer technologies [1]. Among these, neural network models have been widely used and have achieved excellent results in many fields. Scholars have therefore applied various neural network methods to stock price prediction, and empirical studies have improved prediction accuracy. With deeper research on neural network methods across various fields, some shortcomings have been discovered: some neural network models cannot deeply connect the relationships within sequential data, and their grasp of key information is not strong enough, leading to weak model interpretability. While more and more scholars seek to optimize neural network models to solve these problems, others have attempted to combine them with models that can highlight key data points to enhance prediction effectiveness, further refining and developing the field of stock price prediction [2].

In recent years, the attention mechanism has sparked a new wave of interest in the academic world, especially in combination with neural networks, where it effectively addresses problems within those networks. In fact, the attention mechanism was proposed in the field of computer vision as early as the last century, but its application was then relatively limited. With the continuous improvement of neural network methods, the mechanism has flourished in machine translation. The attention mechanism can effectively relate input data elements to one another and eliminate complex network structures, making models more streamlined and efficient. It has subsequently been applied to popular fields such as natural language processing and image recognition with significant results. Scholars have also begun applying it in further areas, such as the electrical and power fields, and it has gradually shown remarkable success in stock price prediction. The attention mechanism tends to capture important information, much like the human visual system, focusing on important information while ignoring unnecessary factors. In essence, under limited computing power, the attention mechanism allocates more resources and computation to the important parts of the data.

This article will explore the application of machine learning methods in stock prediction and focus on the emerging and promising model of Bi-GRU with an attention mechanism, hoping to provide investors and researchers with a deeper understanding and inspiration, promoting further development in the field of stock prediction.

2. Stock Prediction Based on ABiGRU

2.1. Long Short-Term Memory Neural Network

To solve the problems of gradient vanishing and gradient explosion that occur in RNNs when processing long sequence data, LSTM improves upon the RNN by introducing a gate mechanism, which allows it to handle long sequence data [3] [4] . By normalizing the data to a certain range through an activation function, LSTM avoids the problems of gradient vanishing and gradient explosion during backpropagation [5] . When unfolded according to time steps, LSTM is similar to RNN and consists of a series of memory cells. Its structure is shown in Figure 1.

Figure 1. LSTM neural network structure.

As shown in Figure 1, the three gates introduced in LSTM are the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$. LSTM also introduces the cell state $C_t$ as the long-term memory layer and the candidate state $\tilde{C}_t$ as temporary storage for information to be written into the long-term memory. These three gates are functions of the previous time step's output $h_{t-1}$ and the current time step's input $x_t$, and their calculation formulas are as follows:

$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$ (1)

$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$ (2)

$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$ (3)

where the input gate $i_t$ determines the proportion of information that will be stored in the current cell state, the forget gate $f_t$ selectively forgets information in the cell state, and the output gate $o_t$ selectively outputs information from the cell state. In the three formulas above, $W_i$, $W_f$, $W_o$ are parameter matrices to be learned and $b_i$, $b_f$, $b_o$ are biases. $\sigma$ represents the sigmoid function, which constrains the gate values to lie between 0 and 1.

In Figure 1, $h_t$ represents the hidden state, which serves as the short-term memory and is obtained through the output gate from the long-term memory $C_t$:

$h_t = o_t \odot \tanh(C_t)$ (4)

$\tilde{C}_t$ represents the candidate state, used to store newly learned knowledge that will later be written into the long-term memory $C_t$. It is obtained by applying an activation function to the current input feature $x_t$ and the short-term memory $h_{t-1}$ from the previous time step:

$\tilde{C}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$ (5)

$C_t$ represents the long-term memory, which is the sum of the previous long-term memory $C_{t-1}$ multiplied by the forget gate value and the new knowledge $\tilde{C}_t$ induced at the current time step multiplied by the input gate value:

$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$ (6)

Using the above structure, the output at each moment can be obtained; the output dimension is then transformed, fed into a fully connected layer, and the prediction result is produced. However, since LSTM can only process data in sequence from beginning to end and cannot integrate the influence of subsequent data, its use has certain limitations.
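To make Equations (1)-(6) concrete, here is a minimal NumPy sketch of a single LSTM time step. The weight dictionary W and bias dictionary b, and their shapes, are illustrative assumptions rather than the paper's actual parameterization:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs. (1)-(6).

    Each W[k] is an assumed (h, h + d) matrix mapping the concatenated
    vector [h_prev, x_t] to a gate pre-activation; each b[k] has shape (h,).
    """
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, Eq. (1)
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, Eq. (2)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, Eq. (3)
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate state, Eq. (5)
    c_t = f_t * c_prev + i_t * c_tilde       # long-term memory, Eq. (6)
    h_t = o_t * np.tanh(c_t)                 # short-term memory, Eq. (4)
    return h_t, c_t
```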

2.2. GRU Recurrent Neural Network

GRU, which stands for Gated Recurrent Unit, is a type of recurrent neural network [6] [7]. It was proposed by Cho et al. to solve the gradient vanishing and gradient explosion problems of recurrent neural networks on long sequences, and it uses a gate mechanism to control the flow of information; it is an improved recurrent neural network model. GRU has achieved very good performance in tasks such as natural language processing, speech recognition, and time series analysis, and has been widely used in text generation, sentiment analysis, machine translation, and speech recognition. It is also faster than traditional RNNs in model training and inference, so it is widely used in real-time inference and similar application scenarios. The basic structure of the GRU consists of the reset gate $r_t$, the update gate $z_t$, and the hidden state $h_t$. The diagram of the GRU model is shown in Figure 2.

The key to the GRU network lies in the design of its reset gate and update gate. The reset gate decides which historical information can be forgotten, while the update gate decides how much information from previous time steps is retained for the current task and passed on to future steps. Through the control of these two gates, the network can precisely manage the flow of information to produce the desired output. The gate structure allows the network to decide whether to discard or retain information and what information the output should pass on. The calculation formulas for GRU are given in Equations (7) to (10):

$r_t = \sigma(W_r [h_{t-1}, x_t])$ (7)

$z_t = \sigma(W_z [h_{t-1}, x_t])$ (8)

$\tilde{h}_t = \tanh(W_h [r_t \odot h_{t-1}, x_t])$ (9)

$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$ (10)

$x_t$ represents the input at time step $t$, $h_{t-1}$ the hidden state from the previous time step, $z_t$ and $r_t$ the values of the update gate and reset gate, and $\tilde{h}_t$ the candidate hidden state. $W_r$, $W_z$, and $W_h$ are learnable parameters, $\odot$ represents element-wise multiplication, $\sigma$ the sigmoid function, and $\tanh$ the hyperbolic tangent function. The calculation of the GRU network involves five steps:

1) Initialization: $h_0 = 0$.

2) Calculate the reset gate: $r_t = \sigma(W_r [h_{t-1}, x_t])$. Here, $W_r$ is a weight matrix of size $h \times (h + d)$, $[h_{t-1}, x_t]$ denotes the concatenation of $h_{t-1}$ and $x_t$ into a vector of size $h + d$, and the sigmoid function is applied for activation.

Figure 2. GRU model diagram.

3) Calculate the update gate: $z_t = \sigma(W_z [h_{t-1}, x_t])$. Here, $W_z$ is a weight matrix of size $h \times (h + d)$, $[h_{t-1}, x_t]$ is the concatenated vector of size $h + d$, and the sigmoid function is applied for activation, as in the reset gate step.

4) Calculate the candidate hidden state: $\tilde{h}_t = \tanh(W_h [r_t \odot h_{t-1}, x_t])$. Here, $\odot$ represents element-wise multiplication, and $W_h$ is a weight matrix of size $h \times (h + d)$.

5) Calculate the new hidden state: $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$. In this step, $h_{t-1}$ and $\tilde{h}_t$ are averaged with weights determined by the update gate to obtain the new hidden state $h_t$. Steps (2)-(5) are repeated until the entire input sequence has been processed, and the final result is transformed to the dimensions of the output samples, as in the sketch below.
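The five steps can be summarized in a short NumPy sketch; the function and variable names are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_r, W_z, W_h):
    """One GRU time step following Eqs. (7)-(10); each W_* has shape (h, h + d)."""
    z_in = np.concatenate([h_prev, x_t])                          # [h_{t-1}, x_t]
    r_t = sigmoid(W_r @ z_in)                                     # reset gate, Eq. (7)
    z_t = sigmoid(W_z @ z_in)                                     # update gate, Eq. (8)
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))  # candidate, Eq. (9)
    return (1.0 - z_t) * h_prev + z_t * h_tilde                   # new state, Eq. (10)

def gru_forward(X, W_r, W_z, W_h, h_dim):
    """Steps (1)-(5): initialize h_0 = 0, then iterate over the sequence X."""
    h_t = np.zeros(h_dim)             # step (1)
    for x_t in X:                     # repeat steps (2)-(5)
        h_t = gru_step(x_t, h_t, W_r, W_z, W_h)
    return h_t                        # reshape downstream as the output requires
```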

2.3. Bi-GRU-Attention Neural Network

2.3.1. Temporal Attention Mechanism

In stock market problems, it is known from economics that the influence range between stock data points can extend over 5 or even 20 trading days. Since usually one data point is collected per day, each sampling point can be influenced by dozens of neighboring data points, so the time-step sequences in stock prediction become long. The closer a neighboring data point is to the point being predicted, the greater its influence on that point [8]. To model these relationships more realistically, this paper introduces an attention mechanism module. The attention mechanism assigns different weights to the feature vectors extracted at each time step, paying sufficient attention to important time steps, reducing the impact of irrelevant information, and highlighting key features. After the BiGRU module extracts features and analyzes the temporal relationships in the input data [9], it is combined with the attention mechanism; the specific structure is shown in Figure 3:

Figure 3. Attention mechanism module.

The attention mechanism layer mainly consists of three parts: a feature similarity calculation layer, an importance weight allocation layer with a softmax over the weights, and a feature reconstruction layer. The feature similarity calculation layer measures the correlation between the sampled data at each time step and the prediction target by taking the BiGRU outputs $h_{t-1}$, $h_t$, and $h_{t+1}$ as inputs and producing a correlation coefficient $r_k$ for each time step:

$r_k = \mathrm{activation}(W x_k + b)$ (11)

where activation is the activation function, $x_k$ is the feature vector at time step $k$, and $W$ and $b$ are trainable parameters of the attention mechanism module.

The attention network then uses the softmax function to perform attention scoring, converting the correlation coefficients output by the previous layer into weights, i.e., the state-signal weight matrix $a_k$, which indicates the importance of each time step's data for the prediction:

$a_k = \dfrac{\exp(r_k)}{\sum_k \exp(r_k)}$ (12)

The input stock data is then weighted accordingly: data at different time steps receives different weights, and the weighted stock data is used as the input to the fully connected layer for prediction.
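A minimal NumPy sketch of this temporal attention layer follows, under the assumption that the scores of Equation (11) use a tanh activation with a single weight vector W and bias b:

```python
import numpy as np

def temporal_attention(H, W, b):
    """Weight the per-step features by learned importance, per Eqs. (11)-(12).

    H: array of shape (T, h), one BiGRU output vector per time step.
    W (shape (h,)) and b (scalar) stand in for the trainable parameters.
    """
    r = np.tanh(H @ W + b)                 # similarity scores r_k, Eq. (11)
    a = np.exp(r) / np.exp(r).sum()        # softmax weights a_k, Eq. (12)
    return (a[:, None] * H).sum(axis=0)    # weighted feature vector for the FC layer
```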

2.3.2. Introduction to Attention-Based BiGRU Model

The BiGRU network model introduces the attention mechanism module, as shown in Figure 4. Here, $X_0, X_1, \ldots, X_{t-1}, X_t$ represent the inputs at different time steps; BiGRU is a bidirectional GRU model that uses two GRUs running in opposite directions to extract stock data features; $h_0, h_1, \ldots, h_{t-1}, h_t$ represent the outputs at different time steps; $a_0, a_1, a_2, \ldots, a_{t-1}, a_t$ represent the weighted results computed by the attention mechanism; and $c_t$ represents the output. The data first passes through the input layer, then BiGRU extracts the stock data features, returning a result for each time step. The attention mechanism then assigns a weight to each time step's data, and finally the prediction is computed by the fully connected layer, with the error backpropagated for gradual optimization.

Figure 4. ABiGRU model structure diagram.
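The following Keras sketch shows one plausible realization of this architecture; the layer sizes, optimizer, and exact form of the attention layer are assumptions, not the authors' reported configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_abigru(window=30, n_features=6, units=64):
    """A minimal sketch of the ABiGRU architecture; sizes are assumptions."""
    inputs = layers.Input(shape=(window, n_features))
    # Bidirectional GRU returns one feature vector per time step.
    h = layers.Bidirectional(layers.GRU(units, return_sequences=True))(inputs)
    # Temporal attention: score each step, softmax over time, weighted sum
    # (mirrors Eqs. (11)-(12)).
    scores = layers.Dense(1, activation="tanh")(h)
    weights = layers.Softmax(axis=1)(scores)
    context = layers.Lambda(
        lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, weights])
    outputs = layers.Dense(1)(context)        # predicted closing price
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model
```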

3. Experiments

3.1. Data Preparation

The dataset used for the experiments comprises 2638 daily records of BYD Company from 2011.6.30 to 2022.5.20. Data from 2011.6.30 to 2019.2.18 was used for training, and the Close curve from 2019.2.19 to 2022.5.27 was used to validate the model. The training sample length is 30. The input variables are Open, Close, Volume, High, Low, and Previous, and the output variable is Close [10]. The visualization of the closing price is shown in Figure 5.
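A hedged sketch of this data preparation is given below, assuming the raw data sits in a hypothetical byd.csv file with the six named columns; the min-max normalization is an assumption consistent with the normalization mentioned in Section 3.4:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# "byd.csv" and its column names are assumptions about the raw data layout.
df = pd.read_csv("byd.csv", parse_dates=["Date"], index_col="Date")
features = ["Open", "Close", "Volume", "High", "Low", "Previous"]

scaler = MinMaxScaler()
values = scaler.fit_transform(df[features])

window = 30                                  # training sample length
close_idx = features.index("Close")
X, y = [], []
for i in range(len(values) - window):
    X.append(values[i:i + window])           # 30 days of all 6 features
    y.append(values[i + window, close_idx])  # next day's closing price
X, y = np.array(X), np.array(y)
```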

3.2. Model Evaluation Metrics

The evaluation criteria for the model are MSE, RMSE, and MAE. MSE stands for mean squared error; prediction accuracy increases as its value decreases.

The formula is as follows:

$\mathrm{MSE} = \dfrac{1}{N} \sum_{i=1}^{N} (f_i - y_i)^2$ (13)

RMSE is the root mean square error. The formula is as follows:

$\mathrm{RMSE} = \sqrt{\dfrac{1}{N} \sum_{i=1}^{N} (f_i - y_i)^2}$ (14)

MAE is the average of the absolute errors. The formula is as follows:

$\mathrm{MAE} = \dfrac{1}{N} \sum_{i=1}^{N} |f_i - y_i|$ (15)

$f_i$ represents the predicted value, and $y_i$ represents the true value.
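The three metrics of Equations (13)-(15) can be computed directly, for example:

```python
import numpy as np

def evaluate(f, y):
    """Compute MSE, RMSE, and MAE for predictions f against true values y."""
    err = f - y
    mse = np.mean(err ** 2)       # Eq. (13)
    rmse = np.sqrt(mse)           # Eq. (14)
    mae = np.mean(np.abs(err))    # Eq. (15)
    return mse, rmse, mae
```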

Figure 5. Visualization of closing price.

3.3. Stock Correlation Analysis

Before establishing the model, it is necessary to conduct a correlation analysis on the stock-related curves. The Pearson correlation coefficient from statistics is used to measure the correlation between two variables. The response values of each curve are treated as column vectors, for example $X_{Open}$ and $X_{Close}$, with means $\overline{X}_{Open} = \frac{1}{n}\sum_{i=1}^{n} X_{Open,i}$ and $\overline{X}_{Close} = \frac{1}{n}\sum_{i=1}^{n} X_{Close,i}$, respectively. The corresponding Pearson correlation coefficient is calculated as follows:

$r(X_{Open}, X_{Close}) = \dfrac{\sum_{i=1}^{n} (X_{Open,i} - \overline{X}_{Open})(X_{Close,i} - \overline{X}_{Close})}{\left[ \sum_{i=1}^{n} (X_{Open,i} - \overline{X}_{Open})^{2} \sum_{i=1}^{n} (X_{Close,i} - \overline{X}_{Close})^{2} \right]^{1/2}}$ (16)

where $n$ is the length of each column and $r(X_{Open}, X_{Close})$ is the correlation coefficient of the curves, ranging from −1 to 1: values of 1 and −1 represent complete positive and negative correlation, respectively, while 0 indicates no relationship between the two attributes.

The correlation calculation results between the six stock curves are shown in Figure 6. As can be seen from the figure, Open, Volume, High, Low, and Previous all have a high correlation with Close. This data was subsequently used in the experiments.
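This correlation analysis can be reproduced with pandas, reusing the DataFrame df from the data preparation sketch; the plotting libraries are assumptions:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Pairwise Pearson correlation between the six curves, as in Eq. (16).
corr = df[["Open", "Close", "Volume", "High", "Low", "Previous"]].corr(method="pearson")
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Stock curve correlation heatmap")
plt.show()
```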

Figure 6. Stock curve correlation heatmap.

3.4. Experimental Comparative Analysis

Stock prediction models based on the LSTM, GRU, Bi-LSTM, and ABiGRU networks were established. To improve the accuracy and efficiency of the models, the training data was normalized and batch training was adopted. Training was conducted on a GPU with a batch size of 100, meaning that 100 training samples were selected at each step. The training sample length was kept at 30.

Figure 7 shows the training process of the stock prediction models based on LSTM, GRU, Bi-LSTM, and ABiGRU networks. From Figure 7, it can be observed that the loss function of each model decreases rapidly at the beginning with the increase of iterations, and then gradually converges and becomes smooth. Throughout the process, there is no occurrence of overfitting where the loss function increases after reaching a stable state. The final models were obtained after 200 iterations.
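Under the stated settings (batch size 100, 200 iterations), a training run might look as follows, reusing build_abigru and the windowed arrays X, y from the earlier sketches; the 70/30 split is a placeholder assumption standing in for the paper's date-based split:

```python
split = int(len(X) * 0.7)                 # placeholder for the date-based split
model = build_abigru(window=30, n_features=6)
history = model.fit(
    X[:split], y[:split],
    batch_size=100,                       # 100 training samples per batch
    epochs=200,                           # final models obtained after 200 iterations
    validation_data=(X[split:], y[split:]))
```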

Figure 7. Change of loss function with iterations.

From Figure 7, it can be seen that the loss function error of the stock prediction models based on the LSTM, GRU, Bi-LSTM, and ABiGRU networks continues to decrease during training. To fully verify each model's predictive ability, the Close curve from 2019.2.19 to 2022.5.27 was selected as the validation set. When using the different models for stock prediction, the portion of data preceding the point to be predicted was used as warm-up data, and the prediction was then completed by continuously iterating and updating. The stock prediction results of the different models are shown in Figure 8:

From Figure 8, it can be observed that when LSTM, GRU, Bi-LSTM, and ABiGRU are used for stock prediction, all of them capture the overall trend of stock price changes, indicating that time series models have a natural advantage in handling stock data. A detailed comparison follows. First, comparing the prediction results of LSTM and GRU, no significant difference is observed except for a slight improvement in the range from index 300 to 450, indicating that the accuracy of GRU is comparable to LSTM and that the structural simplification mainly improves the prediction speed of the model. Next, comparing LSTM and GRU with Bi-LSTM, the prediction results improve significantly. Bi-LSTM extracts features with two LSTMs running in opposite directions and then fuses them, taking into account the bidirectional relationship between earlier and later stock data. Although Bi-LSTM achieves a notable gain in prediction accuracy, it increases the complexity of the network and reduces efficiency; furthermore, it cannot learn the strength of the mutual influence between stock data at different time steps. Finally, comparing LSTM, GRU, and Bi-LSTM with ABiGRU shows that ABiGRU uses the attention mechanism to learn the strength of this mutual influence, improving the robustness of the model, while the BiGRU backbone improves its speed.

Figure 8. Stock prediction results of different models.

4. Experimental Results and Discussion

To evaluate the accuracy of the LSTM, GRU, Bi-LSTM, and ABiGRU models in predicting stock data, experiments were conducted on the BYD stock data using each of these models. The models analyzed the stock data features, and the prediction results were compared, as shown in Table 1. The data in the table are the mean squared error, root mean squared error, and mean absolute error of the predicted stock data for each method. A smaller mean squared error indicates more accurate prediction results, while smaller root mean squared error and mean absolute error indicate stronger model stability.

From the experiments and Table 1, it can be observed that both LSTM and GRU are unidirectional time series models that extract stock data features with a single-direction network. Based on the evaluation data, both achieve good results in stock prediction, with GRU showing slightly lower prediction error than LSTM. Although LSTM and GRU perform well, they have a clear limitation: they only consider the influence of the data preceding the current sampling point and neglect the data that follows it. To improve robustness, Bi-LSTM (Bidirectional Long Short-Term Memory) was introduced, which uses two LSTMs running in opposite directions to extract stock data features and further improve accuracy. However, the increased model complexity slows down prediction, making Bi-LSTM less suitable for real-time stock prediction, and it still cannot effectively extract the strength of the mutual influence between stock data at different time steps. This study therefore proposes ABiGRU, which uses a BiGRU (Bidirectional Gated Recurrent Unit) as the main network, reducing network complexity and preserving prediction speed, and adds an attention mechanism module that allocates corresponding weights to the stock data at different time steps. The experimental results demonstrate that the proposed ABiGRU model achieves better performance in stock prediction and exhibits greater stability in terms of robustness and generalization than the other models.

Table 1. Evaluation metrics of the stock price predictions of the different models.

5. Conclusion

This paper proposes a stock data prediction method based on the ABiGRU neural network. Stock market data is typically time-series data, where past stock prices and trading volumes have a certain impact on future price trends. Bi-GRU is a bidirectional recurrent neural network that can utilize both past and future information for prediction. This bidirectional information flow can more comprehensively perceive patterns and relationships in the data, improving the predictive accuracy of the model. Additionally, the gating mechanism in Bi-GRU helps alleviate the vanishing gradient problem, allowing the model to handle long-term dependencies and better capture long-term trends in stock prices. Adding an attention mechanism module to the Bi-GRU model allows the model to focus on specific time periods or features that have a larger impact on the stock price trend during prediction. Through the attention mechanism, the model can dynamically learn weights, pay more attention to important information, and improve prediction accuracy. By applying the method to BYD stock data prediction and comparing it with LSTM, GRU, and Bi-LSTM models, the experimental results indicate that the ABiGRU model has high accuracy in stock prediction.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Peng, Z., et al. (2019) Stock Analysis and Prediction Using Big Data Analytics. 2019 International Conference on Intelligent Transportation Big Data & Smart City (ICITBS), Changsha, 12-13 January 2019, 309-312.
https://doi.org/10.1109/ICITBS.2019.00081
[2] Gupta, A., Bhatia, P., Dave, K. and Jain, P. (2019) Stock Market Prediction Using Data Mining Techniques. 2nd International Conference on Advanced Science and Technology, ICASR 2019, Palladam, April 2019.
[3] Gers, F.A., Schmidhuber, J. and Cummins, F. (2000) Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12, 2451-2471.
[4] Cui, Z., Ke, R. and Wang, Y. (2017) Deep Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-Wide Traffic Speed Prediction. 6th International Workshop on Urban Computing (UrbComp 2017), Halifax, August 2017, 1-11.
https://doi.org/10.48550/arXiv.1801.02143
[5] Ding, G.Y. and Qin, L.X. (2020) Study on the Prediction of Stock Price Based on the Associated Network Model of LSTM. International Journal of Machine Learning and Cybernetics, 11, 1307-1317.
https://doi.org/10.1007/s13042-019-01041-1
[6] Li, C. and Qian, G. (2023) Stock Price Prediction Using a Frequency Decomposition Based GRU Transformer Neural Network. Applied Sciences, 13, 222.
https://doi.org/10.3390/app13010222
[7] Dey, R. and Salem, F.M. (2017) Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks. 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, 6-9 August 2017, 1597-1600.
https://doi.org/10.1109/MWSCAS.2017.8053243
[8] Qiu, M., Song, Y. and Akagi, F. (2016) Application of Artificial Neural Network for the Prediction of Stock Market Returns: The Case of the Japanese Stock Market. Chaos, Solitons & Fractals, 85, 1-7.
https://doi.org/10.1016/j.chaos.2016.01.004
[9] Lee, M.C. (2022) Research on the Feasibility of Applying GRU and Attention Mechanism Combined with Technical Indicators in Stock Trading Strategies. Applied Sciences, 12, 1007.
https://doi.org/10.3390/app12031007
[10] Zhang, Y.J. and Tumibay, G.M. (2022) Stock Market Prediction Model Based on LSTM Deep Learning: The Case of Top Corporate Company in China. International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), Hamburg, 7-9 October 2022, 451-455.
https://doi.org/10.1109/AIAM57466.2022.00092

Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.