Multi-Scale and Multi-Channel Networks for CSI Feedback in Massive MIMO Systems

In the frequency division duplex (FDD) mode of a massive MIMO system, the base station needs accurate channel state information (CSI) for precoding in order to obtain performance gains. However, as the number of antennas at the base station grows, the overhead for the user terminal to feed the CSI back to the base station increases rapidly. In this article, we propose a multi-task CNN method that compresses and reconstructs the channel state information with a multi-scale and multi-channel convolutional neural network. We also introduce a dynamic learning rate scheme to improve the accuracy of CSI reconstruction. Simulation results show that, compared with the original CsiNet and other related work, the proposed CSI feedback network achieves better reconstruction performance.


Introduction
Massive multiple-input multiple-output (MIMO) technology was proposed in the 20th century and has become particularly important in the latest 5G wireless communication systems [1]. A massive MIMO system serves multiple users simultaneously on the same time-frequency resources through base stations equipped with a large number of transmitting antennas [2]. In MIMO systems, increasing the number of antennas can improve channel capacity and transmission efficiency. However, these benefits rely on the base station's ability to obtain accurate channel state information (CSI). In the uplink, the base station can accurately estimate the CSI from the pilots sent by the user equipment [3]. In the downlink, the user equipment needs to estimate the CSI and feed it back to the base station for precoding. However, as the number of antennas increases, the size of the CSI matrix grows accordingly [4]. If traditional feedback methods are used, the system overhead becomes prohibitive.
With the rise of deep learning, great progress has been made in computer vision, natural language processing, and other fields, and deep learning has also been introduced into wireless communication, where it is used to compress and reconstruct the CSI matrix. CsiNet, an auto-encoder network proposed by Wen et al. [5], compresses and reconstructs CSI with a convolutional neural network and outperforms traditional compressed sensing methods (e.g., LASSO and TVAL3) [6] [7] [8]. They also proposed LSTM-CsiNet, a recurrent architecture that exploits the time correlation of the channel and reconstructs well at high compression ratios [9], as well as a multi-rate CSI network [10] that reduces the number of parameters and adopts a novel quantized CSI feedback scheme. Based on channel reciprocity, the uplink CSI can also be used to help reconstruct the downlink CSI, and the work in [11] [12] reduces the impact of transmission delay in feedback. These studies show that convolutional neural networks are well suited to the CSI feedback task.
However, in these methods the number of parameters is huge and the computational complexity is relatively high. We therefore use a convolutional neural network as the infrastructure and adopt a multi-scale and multi-channel design to improve the quality of CSI reconstruction. In addition, we introduce a dynamic learning rate scheme for further optimization. Our main contributions are listed below.
1) We propose a new CSI compression and recovery mechanism for FDD massive MIMO systems, called multi-scale and multi-channel convolutional CsiNet (MSMCNet). In this multi-task network, we set the number of parallel channels to 3 as a balance between complexity and accuracy.
2) We introduce a dynamic learning rate scheme to improve the robustness of the auto-encoder, especially at high compression ratios. We also apply block convolution and dilated (hole) convolution in the convolutional layers, which improves robustness for CSI matrices of different sparsity.
The rest of the article is arranged as follows. Section 2 introduces the CSI feedback system model; Section 3 presents the specific architecture of MSMCNet; Section 4 shows the simulation results and analysis. Finally, the conclusion is drawn in Section 5.

System Model
We consider a simple single-cell downlink massive MIMO system with N_t transmit antennas at the base station, a single receive antenna at the user equipment, and Ñ_c subcarriers under OFDM. The received signal on the n-th subcarrier is y_n = h_n^H v_n x_n + z_n, where h_n ∈ C^{N_t}, v_n ∈ C^{N_t}, x_n ∈ C and z_n ∈ C respectively represent the channel vector, the precoding vector, the data-bearing symbol and the additive noise of the n-th subcarrier. The downlink CSI matrix is a stack of the subcarrier channel vectors [13]: H = [h_1, h_2, ..., h_{Ñ_c}]^H ∈ C^{Ñ_c × N_t}. After that, two pre-processing steps are applied to the CSI: 1) a two-dimensional (2D) discrete Fourier transform (DFT) transforms H into the sparse angular-delay domain, H_a = F_d H F_a^H, where F_d and F_a are DFT matrices; 2) since the multipath delays are concentrated in a limited period, only the first N_c rows of H_a contain significant values, so H_a is truncated to an N_c × N_t matrix before compression.

Design of MSMCNet
In the literature [5], CsiNet adopts the residual structure of RefineNet, which has been proved effective for CSI reconstruction. CsiNet applies a single fixed resolution, i.e., a fixed convolution kernel size, to extract the features of the CSI matrix. However, CSI matrices of different sparsity are suited to different resolutions. For example, if a CSI matrix is less sparse, a convolution kernel of smaller size should be used to extract finer features. Conversely, when the CSI matrix is very sparse, a small kernel may cover a large blank area and fail to extract features effectively. Therefore, the convolution kernel size should vary across CSI matrices in order to adapt to different degrees of sparsity.
We introduce multi-scale and multi-channel convolution kernels and propose a network called multi-scale multi-channel-Net (MSMCNet). The structure of MSMCNet is shown in Figure 1 and Figure 2. MSMCNet consists of two parts: the encoder at the user side and the decoder at the base station. The input H_a has size 2 × N_c × N_t, where 2 corresponds to the real and imaginary parts of the matrix. We use three parallel channels, which is a trade-off between complexity and accuracy.
Firstly, in the encoder, the input image passes through three parallel channels. The convolution kernel of each channel is 3 × 3, but the three channels use different degrees of dilated (hole) convolution, with dilations of 1, 2 and 3. Dilation 1 corresponds to an ordinary 3 × 3 receptive field, dilation 2 to 5 × 5, and dilation 3 to 7 × 7. The outputs of the different scales are concatenated and merged through a 1 × 1 convolution layer. A fully connected layer then produces the codeword at the desired compression ratio.
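The relationship between dilation and receptive field used by the three branches can be sketched as follows; the helper names are ours, and the zero-insertion view of a dilated kernel is shown only to make the 3 × 3 → 5 × 5 → 7 × 7 correspondence concrete.

```python
import numpy as np

def effective_kernel(k: int, dilation: int) -> int:
    """Receptive field of a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)

# The three encoder branches share a 3x3 kernel but dilate it differently,
# covering 3x3, 5x5 and 7x7 receptive fields at a cost of only 9 weights each.
fields = [effective_kernel(3, d) for d in (1, 2, 3)]  # -> [3, 5, 7]

def dilate_kernel(kernel: np.ndarray, dilation: int) -> np.ndarray:
    """Equivalent dense kernel: (dilation - 1) zeros inserted between taps."""
    k = kernel.shape[0]
    size = k + (k - 1) * (dilation - 1)
    out = np.zeros((size, size), dtype=kernel.dtype)
    out[::dilation, ::dilation] = kernel
    return out
```

Because only the 9 original taps are non-zero, each dilated branch enlarges its view of the CSI matrix without adding parameters.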
For the decoder, the received feature vector V is first amplified to a specific size, and coarse features are extracted through a 3 × 3 convolution kernel with dilation 2. After that, the features pass through two MSMCBlock modules, and finally the CSI matrix is reconstructed through the Sigmoid layer.
As shown in Figure 2, MSMCBlock has three parallel channels: the first passes through 1 × 9 and 9 × 1 convolution kernels, the second through 1 × 5 and 5 × 1 kernels, and the third through 1 × 7 and 7 × 1 kernels. The results are concatenated and merged through a 1 × 1 convolution layer. In this way, the block performs well on CSI matrices of different sparsity. MSMCBlock also carries the original data forward through a skip connection and adds it to the multi-channel convolution result. Multi-channel refers to the three parallel channels, and multi-scale refers to the different convolution kernel sizes.
We use dilated convolution and serial 1 × 9, 9 × 1 convolutions to replace a huge 9 × 9 convolution kernel. They retain the 9 × 9 receptive field while reducing the computational complexity and the number of parameters. The MSMCNet proposed in this paper achieves higher accuracy than CsiNet.
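The savings from the factorization can be checked with a small sketch. Note the caveat: a 9 × 1 followed by a 1 × 9 convolution reproduces a full 9 × 9 convolution exactly only when the 9 × 9 kernel is rank-1 (an outer product); in the network the factorized pair is learned directly, so it is an approximation of a general 9 × 9 layer. The loop-based convolution below is ours, written for clarity rather than speed.

```python
import numpy as np

# Parameters per in/out channel pair: dense 9x9 kernel vs. the 9x1 + 1x9 pair.
dense_params = 9 * 9               # 81 weights
separable_params = 9 * 1 + 1 * 9   # 18 weights, same 9x9 receptive field

def conv2d_full(img: np.ndarray, ker: np.ndarray) -> np.ndarray:
    """Plain full 2D convolution, loop-based for clarity."""
    H, W = img.shape
    kh, kw = ker.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(kh):
        for j in range(kw):
            out[i:i + H, j:j + W] += ker[i, j] * img
    return out

# For a rank-1 (separable) kernel, 9x1 followed by 1x9 matches the 9x9 result.
rng = np.random.default_rng(1)
img = rng.standard_normal((12, 12))
u, v = rng.standard_normal(9), rng.standard_normal(9)
full = conv2d_full(img, np.outer(u, v))                     # one 9x9 pass
seq = conv2d_full(conv2d_full(img, u[:, None]), v[None, :])  # 9x1 then 1x9
```

The 81 → 18 weight reduction per channel pair is where the parameter savings claimed above come from.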
For each convolutional layer, the LeakyReLU activation function is adopted instead of ReLU, because ReLU has a slope of 0 on its negative part. When a large gradient flows through a ReLU neuron, the neuron may be stuck outputting 0 after the parameters are updated and will no longer activate for subsequent data. LeakyReLU has a small negative slope, so this problem does not occur, and increasing the slope appropriately further improves the performance of MSMCNet.
In addition, we use a dynamic learning rate scheme. Using a fixed learning rate of 0.001 over 1000 training epochs does yield good results, but training can be made more efficient by setting a high learning rate in the initial phase, so that the parameters quickly enter the correct range, and then reducing it to avoid over-fitting and obtain better results. The learning rate follows a cosine function, and this dynamic scheme improves network performance. Following [2] [5] and preliminary tests, we set the maximum learning rate to 0.0025 and the minimum to 0.0005. The schedule starts at the maximum, ends at the minimum, and decreases monotonically throughout. Such a setting accelerates learning in the first half of training, prevents over-fitting in the second half, and lets the network approach the true values more closely.
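A cosine schedule matching the stated endpoints can be sketched as below. The exact formula is our assumption: the text only specifies a cosine shape that decreases monotonically from 0.0025 to 0.0005 over the 1000 epochs, and the standard cosine-annealing form satisfies that.

```python
import math

LR_MAX, LR_MIN, EPOCHS = 0.0025, 0.0005, 1000  # values quoted in the text

def cosine_lr(epoch: int) -> float:
    """Cosine decay from LR_MAX at epoch 0 down to LR_MIN at the final epoch."""
    return LR_MIN + 0.5 * (LR_MAX - LR_MIN) * (1 + math.cos(math.pi * epoch / EPOCHS))

lrs = [cosine_lr(e) for e in range(EPOCHS + 1)]
```

The curve is flat near both ends and steepest mid-training, which matches the behaviour described: fast progress early, fine-grained convergence late.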

Simulation Results and Analysis
The experiment is conducted in an indoor scene at the 5.3 GHz band. We use the Adam optimizer with the mean square error (MSE) as the loss function, and the dynamic learning rate described above with a maximum of 0.0025 and a minimum of 0.0005. To evaluate the performance of MSMCNet, we use the normalized mean square error (NMSE) to measure the distance between the original matrix H_a and the reconstructed matrix Ĥ_a. We train for 1000 epochs; during testing we find that the results usually improve as the number of epochs grows, but beyond 5000 epochs the improvement is very slight, so 1000 epochs are used in our experiments.
In [5], CsiNet was shown to outperform traditional compressed sensing methods. Table 1 shows that for all four compression ratios, MSMCNet with the dynamic learning rate is better than const-MSMC, and both are better than CsiNet. Figure 3 shows that, whether or not the dynamic learning rate is used, MSMCNet achieves a lower loss than CsiNet. Comparing the red and green curves in Figure 3, the loss of the red curve is clearly lower, which shows that the dynamic learning rate is helpful. Figure 4 shows the NMSE under the three network frameworks; the data is recorded every 10 epochs, giving 100 points over 1000 epochs. MSMCNet achieves better reconstruction with both the dynamic and the constant learning rate. Since the dynamic learning rate is higher than the fixed one in the middle period, the red line in Figure 4 fluctuates more strongly, but toward the end it becomes more stable than the green curve. The final result shows that the dynamic learning rate achieves a better NMSE.
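The NMSE metric used above can be written as a small helper; for simplicity this sketch evaluates a single matrix rather than the expectation over the test set, and the sample matrices are illustrative.

```python
import numpy as np

def nmse_db(H: np.ndarray, H_hat: np.ndarray) -> float:
    """NMSE in dB between the original and reconstructed CSI matrices:
    10 * log10( ||H - H_hat||^2 / ||H||^2 )."""
    err = np.linalg.norm(H - H_hat) ** 2
    return 10 * np.log10(err / np.linalg.norm(H) ** 2)

# Sanity checks: an all-zero reconstruction gives exactly 0 dB,
# while a near-perfect one sits far below 0 dB.
H = np.random.default_rng(2).standard_normal((32, 32))
zero_db = nmse_db(H, np.zeros_like(H))
good_db = nmse_db(H, H + 1e-3 * np.ones_like(H))
```

Lower (more negative) values mean better reconstruction, which is the sense in which the tables and figures are read.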
From Figure 5, we can see that MSMCNet's reconstruction of the outdoor channel matrix is far from ideal and much worse than the indoor result. At high compression ratios, the reconstruction quality of the channel matrix also decreases, so a further task is to obtain a better NMSE at high compression ratios and in outdoor conditions. We also find that CsiNet almost reaches its optimal value after about 100 training epochs, and the subsequent 900 epochs do not improve the result, whereas MSMCNet keeps improving because it is better at exploiting subtle variations among adjacent elements.

Conclusion
For downlink CSI feedback in the FDD massive MIMO system, we proposed MSMCNet on the basis of CsiNet. Multi-channel and multi-scale convolution was introduced into the CSI feedback task and proved to be effective. At the same time, a dynamic learning rate was adopted to further improve the efficiency of CSI reconstruction. Experiments have shown that our scheme achieves higher reconstruction quality than CsiNet, but they also show that deep learning methods reconstruct poorly in outdoor environments and at high compression ratios. We hope this paper will encourage future research in this direction.