A Pipeline Structure for the Schnorr-Euchner Sphere Decoding Algorithm

We propose a pipeline structure for the Schnorr-Euchner sphere decoding algorithm in this article. It divides the search tree of the original algorithm into blocks and executes the search block by block. When the search of one block for a signal is finished, the part of the pipeline that processes this block can load another signal and search it. Several signals can therefore be processed at the same time in one pipeline. To lower the overall complexity, the blocks are ordered so that the blocks searched first are those most likely to contain the final solution. Simulation results show that the average processing delay drops to between 48.77% and 60.18% of that of the original algorithm in a 4-by-4 antenna system with 16QAM modulation, and to between 30.31% and 61.59% in a 4-by-4 antenna system with 64QAM modulation.


Introduction
With the rapid development of wireless communications, the demand for high-speed data services conflicts with the shrinking amount of available bandwidth. Multiple-input multiple-output (MIMO), which can provide dramatically higher spectral efficiency, is the key technology to resolve this conflict. MIMO is one of the most exciting technologies of the last decade [1,2].
MIMO detection is a rather complex problem. Maximum likelihood (ML) detection provides the optimal performance, but its complexity is so high that it cannot be used in practical systems. Linear detection, such as zero forcing and minimum mean square error detection, has low computational complexity, but its performance is very poor. Vertical Bell Laboratories Layered Space Time (VBLAST) detection provides higher performance than linear detection at the price of a small increase in complexity [3], but its performance is still not good enough.
The sphere decoding (SD) algorithm was proposed to provide optimal or near-optimal performance with low complexity [4], and it has attracted much attention since it was proposed [5]. The Schnorr-Euchner SD (SESD) algorithm is an optimized SD algorithm [6]. It can find the optimal solution faster than the original SD algorithm, and the modification from SD to SESD is very small, so it is the most popular algorithm with optimal performance [7-15]. However, two main drawbacks restrict the usage of SESD in practical systems, especially real-time systems. One is that the computational complexity of SESD is not fixed and varies over a large range. The other is that it is not suitable for use in a pipeline structure.
In this article, we propose a pipeline structure for the SESD algorithm. The search process of the SESD algorithm is divided into blocks that can be executed sequentially. When the searches of several signals are executed simultaneously in different blocks, the average processing time decreases. Simulation results show that the proposed structure saves a remarkable amount of processing time, especially at low signal-to-noise ratio (SNR).

MIMO System and ML Detection
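A MIMO system can be described by the equation
$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n} \qquad (1)$$
where $\mathbf{s}$ is the $N_t \times 1$ transmit vector, $\mathbf{y}$ is the $N_r \times 1$ receive vector, and $\mathbf{n}$ stands for the independent and identically distributed complex zero-mean Gaussian noise. $\mathbf{H}$ is the $N_r \times N_t$ channel matrix, where $h_{ij}$ is the complex channel gain from transmit antenna $j$ to receive antenna $i$. The ML detection can be expressed as
$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \Omega} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 \qquad (2)$$
where $\Omega$ means the set of all possible transmit vectors.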
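As an illustration of why ML detection is costly, the exhaustive search over all possible transmit vectors can be sketched as follows. This is a minimal sketch, not part of the original work: the function name `ml_detect`, the small 2-by-2 channel, and the real-valued 4-level alphabet are assumptions chosen for readability.

```python
from itertools import product

def ml_detect(H, y, alphabet):
    """Exhaustive ML detection: test every vector in alphabet^Nt (hypothetical helper)."""
    n_t = len(H[0])
    best_s, best_d = None, float("inf")
    for s in product(alphabet, repeat=n_t):
        # squared Euclidean distance ||y - H s||^2 (abs() also handles complex values)
        d = sum(abs(y_i - sum(h * s_j for h, s_j in zip(row, s))) ** 2
                for row, y_i in zip(H, y))
        if d < best_d:
            best_s, best_d = s, d
    return list(best_s), best_d

# Assumed toy example: 2x2 real channel, 4-level alphabet, no noise.
H = [[1.0, 0.5],
     [0.2, 1.0]]
s_true = [1, -3]
y = [sum(h * s for h, s in zip(row, s_true)) for row in H]
s_hat, dist = ml_detect(H, y, alphabet=[-3, -1, 1, 3])
```

The loop visits every one of the `len(alphabet) ** n_t` candidates, which is the exponential cost that the tree search methods in the following sections try to avoid.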

Tree Search and Euclidean Distance
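For further processing, the QR decomposition is commonly used to make the channel matrix triangular. The channel matrix can be expressed as
$$\mathbf{H} = \mathbf{Q}\mathbf{R} \qquad (3)$$
where $\mathbf{Q}$ has orthonormal columns and $\mathbf{R}$ is an upper triangular matrix. Left-multiplying equation (1) by $\mathbf{Q}^H$ gives
$$\mathbf{Q}^H\mathbf{y} = \mathbf{R}\mathbf{s} + \mathbf{Q}^H\mathbf{n} \qquad (4)$$
With $\tilde{\mathbf{y}} = \mathbf{Q}^H\mathbf{y}$ and $\tilde{\mathbf{n}} = \mathbf{Q}^H\mathbf{n}$, equation (4) can be rewritten as
$$\tilde{\mathbf{y}} = \mathbf{R}\mathbf{s} + \tilde{\mathbf{n}} \qquad (5)$$
The ML algorithm can be expressed as
$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \Omega} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 \qquad (6)$$
where $\hat{\mathbf{s}}$ is the estimated value of the transmitted symbol vector. For simplicity, we set $N = N_t$ in what follows.
From equation (5), it can be seen that, because $\mathbf{R}$ is upper triangular, there is no interference from other transmit signals on $s_N$. The $N$-th row of equation (5) can be expressed as
$$\tilde{y}_N = r_{NN} s_N + \tilde{n}_N \qquad (7)$$
According to equation (7), $s_N$ can be calculated by
$$\hat{s}_N = \operatorname{dec}\!\left(\frac{\tilde{y}_N}{r_{NN}}\right) \qquad (8)$$
where $\hat{s}_N$ is the estimated transmit signal and $\operatorname{dec}(\cdot)$ denotes the hard decision to the nearest constellation point. When $s_N$ is known, $s_{N-1}$ can be calculated by eliminating the interference of $s_N$. Following this idea, all transmit signals can be calculated by
$$\hat{s}_i = \operatorname{dec}\!\left(\frac{\tilde{y}_i - \sum_{j=i+1}^{N} r_{ij}\hat{s}_j}{r_{ii}}\right), \quad i = N, N-1, \ldots, 1 \qquad (9)$$
Equation (9) means that the $i$-th receive signal is determined by the transmit signals from the $i$-th to the $N$-th one. So the solution of equation (6) can be searched from the $N$-th signal to the first signal, layer by layer, in a tree. That is the principle of the VBLAST algorithm. Each possible transmit signal $s_i$ is called a node in the search tree. Finding the solution can be depicted as searching the tree from the root node to a certain leaf node.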
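The layer-by-layer detection with interference cancellation described above can be sketched as a back-substitution with a hard decision at each layer. This is only an illustrative sketch: it assumes the triangular matrix and the rotated receive vector are already available (the QR step is omitted), works on real values, and uses the hypothetical names `slice_symbol` and `sic_detect`.

```python
def slice_symbol(x, alphabet):
    """Hard decision: nearest constellation point to x (hypothetical helper)."""
    return min(alphabet, key=lambda a: abs(x - a))

def sic_detect(R, y_t, alphabet):
    """Detect layers N, N-1, ..., 1 of y_t = R s + noise, R upper triangular."""
    n = len(R)
    s_hat = [0] * n
    for i in range(n - 1, -1, -1):
        # cancel the contribution of the layers that are already decided
        interference = sum(R[i][j] * s_hat[j] for j in range(i + 1, n))
        s_hat[i] = slice_symbol((y_t[i] - interference) / R[i][i], alphabet)
    return s_hat

# Assumed toy example: 3 layers, 4-level real alphabet, no noise.
R = [[2.0, 0.5, -0.3],
     [0.0, 1.5, 0.4],
     [0.0, 0.0, 1.0]]
s_true = [1, -1, 3]
y_t = [sum(R[i][j] * s_true[j] for j in range(3)) for i in range(3)]
s_sic = sic_detect(R, y_t, alphabet=[-3, -1, 1, 3])
```

The sketch follows a single route from the root to one leaf, so a wrong decision in an early layer propagates to all later layers.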
VBLAST searches the tree along only one route. If there is an error in one layer, the signals in the following layers will be wrong with high probability. That is why the performance of the VBLAST algorithm is not good enough. The ML algorithm searches all the leaf nodes in the tree and selects, from all the possible nodes, the node that satisfies equation (6). Obviously it has the optimal performance, but it is an NP-hard problem.
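To find the best signal as in equation (6), the distance between the transmit and receive symbols must be calculated. According to equation (5), the Euclidean distance (ED) can be defined as
$$d(\mathbf{s}) = \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 = \sum_{i=1}^{N} \Big|\tilde{y}_i - \sum_{j=i}^{N} r_{ij}s_j\Big|^2 \qquad (10)$$
where $\tilde{y}_i$ is the $i$-th element of the rotated receive vector in equation (5) and $r_{ij}$ is the $(i,j)$ entry of the triangular matrix. Also define the partial Euclidean distance (PED) of layer $i$ as
$$d_i(\mathbf{s}) = \sum_{k=i}^{N} \Big|\tilde{y}_k - \sum_{j=k}^{N} r_{kj}s_j\Big|^2 \qquad (11)$$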
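A small numerical sketch may help to see how the PED accumulates from the bottom layer upwards and reaches the full ED at the top of the triangular system. The function name `partial_euclidean_distance`, the 0-based layer index, and the toy values are assumptions for illustration only.

```python
def partial_euclidean_distance(R, y_t, s, i):
    """PED accumulated over layers i, i+1, ..., N-1 (0-based layer index)."""
    n = len(R)
    total = 0.0
    for k in range(i, n):
        # residual of row k of the triangular system y_t = R s + noise
        e_k = y_t[k] - sum(R[k][j] * s[j] for j in range(k, n))
        total += abs(e_k) ** 2
    return total

# Assumed toy example: y_t is the noise-free image of s = [1, -1, 3].
R = [[2.0, 0.5, -0.3],
     [0.0, 1.5, 0.4],
     [0.0, 0.0, 1.0]]
y_t = [0.6, -0.3, 3.0]
candidate = [1, 1, 3]  # wrong symbol in the middle layer
peds = [partial_euclidean_distance(R, y_t, candidate, i) for i in range(3)]
```

Because every layer adds a non-negative term, the PED never decreases when moving from a node to its children, which is what allows a whole subtree to be cut as soon as the PED of its top node exceeds the radius.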

Schnorr-Euchner SD Algorithm
Despite its optimal performance, the number of nodes visited by the ML algorithm grows exponentially with Nt for a given m, where m is the signal modulation order. That is a very large number, so the complexity is too high for practical use. The SD algorithm reduces the number of visited nodes by setting a hypersphere before the search, so that only nodes inside this hypersphere are searched. When a new leaf node is found inside the hypersphere, the radius of the hypersphere is shrunk to the distance of this node. The complexity of SD is a small fraction of that of ML. SESD is the most important SD algorithm. It optimizes the SD algorithm by visiting, at each layer, the child node with the smallest distance first. The first solution found is often already good enough. As a result, visits to some nodes can be avoided and the computational complexity is further lowered.
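The search described above can be sketched as a recursive depth-first search in which the children of each node are visited in order of increasing distance and the radius shrinks at every leaf. This is a minimal real-valued sketch under stated assumptions, not the authors' implementation: the name `sesd`, the way visited nodes are counted (nodes whose PED lies inside the current radius), and the toy values are all hypothetical.

```python
def sesd(R, y_t, alphabet, radius_sq=float("inf")):
    """Depth-first sphere search with Schnorr-Euchner child ordering.
    Returns (best_vector, best_squared_distance, visited_node_count)."""
    n = len(R)
    best = {"s": None, "d": radius_sq, "visited": 0}
    s = [0] * n

    def search(i, ped):
        # residual of layer i after cancelling the fixed layers i+1..n-1
        b = y_t[i] - sum(R[i][j] * s[j] for j in range(i + 1, n))
        # Schnorr-Euchner order: children by increasing distance increment
        for a in sorted(alphabet, key=lambda a: abs(b - R[i][i] * a)):
            d = ped + abs(b - R[i][i] * a) ** 2
            if d >= best["d"]:
                break  # the remaining siblings are even farther: cut them all
            best["visited"] += 1
            s[i] = a
            if i == 0:
                best["s"], best["d"] = s[:], d  # leaf: shrink the radius
            else:
                search(i - 1, d)

    search(n - 1, 0.0)
    return best["s"], best["d"], best["visited"]

# Assumed toy example: 3 layers, 4-level real alphabet, mild noise.
R = [[2.0, 0.5, -0.3],
     [0.0, 1.5, 0.4],
     [0.0, 0.0, 1.0]]
y_t = [0.9, -0.2, 2.6]
s_hat, d_hat, visited = sesd(R, y_t, alphabet=[-3, -1, 1, 3])
```

In this toy case the first leaf reached is already the closest one, so only one node per layer is visited, compared with the 64 leaves an exhaustive search would test.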

Pipeline Structure SESD Algorithm
The SESD algorithm must search the whole tree to get the solution. Each visited node must be compared with the global smallest radius to determine whether it should be kept. When the SESD algorithm is used, received signals must be detected one by one: the detection of another signal can begin only after the current one is over. As a result, the average processing time cannot be reduced easily.
In this article, we propose that the whole search tree be divided into blocks and that the search be executed block by block to form a pipeline structure, as in Figure 1. Each block is one part of the pipeline, so the number of blocks equals the number of parts in the pipeline. The whole search begins from one particular block. The top node of a block is called the sub-root node of this block, and the block search starts from it. Unlike the tree root node, which has no practical significance, the sub-root node has a corresponding transmit signal and a PED.
Each part has a solution, which is called the sub-solution of this part. The radius used in SESD is still the threshold for cutting nodes in each part. All parts share one radius, which is updated whenever the update condition is satisfied. This radius is called the global radius in our algorithm. Each sub-solution updates the global radius, which then acts as the cutting radius in the remaining parts.
Because of the node cutting, some parts may have no sub-solution. This is good news rather than bad news, because it means that all nodes in this part are cut and the number of visited nodes in this part is usually small. When a sub-solution is found in one part, the global radius is updated to the ED of this sub-solution.
Each part executes independently, apart from one input parameter and two output parameters. The input parameter is the cutting radius, and the output parameters are the new radius and the sub-solution of this part. The search can therefore be processed in a pipeline structure in which all parts work at the same time. When one part finishes its processing, it passes the new radius and the sub-solution to the next part and begins processing a new signal. Several signals are thus being decoded in one pipeline at the same time, and the average processing time for one detection drops.
The structure of the proposed algorithm is depicted in Figure 2(a). Each detection is divided into M parts. A signal begins step K of its detection when the previous signal finishes step K and begins step K+1. So in a pipeline with M parts, M signals are being detected at the same time, as in Figure 2(b).
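The timing of such a pipeline can be illustrated with a simple scheduling model. The sketch below is an assumption-laden illustration rather than a description of the hardware: it supposes that each part handles one signal at a time, that a signal enters a part as soon as both the signal and the part are ready, and that the per-part processing times (for example, visited node counts) are known. The function name and the numbers are hypothetical.

```python
def pipeline_finish_times(stage_times):
    """stage_times[n][k] = time that signal n spends in pipeline part k.
    Returns the finish time of each signal when all parts work concurrently."""
    m = len(stage_times[0])
    free_at = [0.0] * m  # time at which each part becomes idle
    finish = []
    for times in stage_times:
        t = 0.0  # time at which this signal leaves the previous part
        for k in range(m):
            start = max(t, free_at[k])  # wait until the part is idle
            t = start + times[k]
            free_at[k] = t
        finish.append(t)
    return finish

# Assumed example: 4 signals, M = 4 parts, the first part being the slowest.
times_per_signal = [[5, 3, 2, 1]] * 4
finish = pipeline_finish_times(times_per_signal)
sequential_total = sum(sum(t) for t in times_per_signal)  # one signal at a time
```

In this example a detected signal leaves the pipeline every 5 time units once the pipeline is full, that is, at the pace of the slowest part, whereas detecting the signals one after another costs 11 time units each.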
As mentioned before, if a previously processed block generates a sufficiently small radius, most nodes in the following blocks will be cut quickly. The number of visited nodes will then be small and the computational complexity low. We can therefore deduce that the blocks should be detected in an order such that the first detected block is the one most likely to generate the smallest sub-solution of the whole tree, which is the final solution. So the blocks should be detected in rising order of the PED of their sub-roots, which is exactly the order used by the SESD algorithm.
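The block-by-block search with a shared global radius can be sketched as follows. This is a sketch under explicit assumptions, not the authors' design: each block is taken to be the subtree below one candidate of the top layer (so the number of blocks equals the alphabet size), the computation is real-valued, and the names `search_block` and `block_sesd` are hypothetical.

```python
def search_block(R, y_t, alphabet, sub_root, radius_sq):
    """Search the subtree whose top-layer symbol is sub_root.
    Input: cutting radius. Output: (sub_solution or None, new_radius, visited)."""
    n = len(R)
    s = [0] * n
    s[n - 1] = sub_root
    root_ped = abs(y_t[n - 1] - R[n - 1][n - 1] * sub_root) ** 2
    if root_ped >= radius_sq:
        return None, radius_sq, 0  # the whole block is cut at its sub-root
    state = {"s": None, "d": radius_sq, "visited": 1}

    def search(i, ped):
        b = y_t[i] - sum(R[i][j] * s[j] for j in range(i + 1, n))
        for a in sorted(alphabet, key=lambda a: abs(b - R[i][i] * a)):
            d = ped + abs(b - R[i][i] * a) ** 2
            if d >= state["d"]:
                break  # siblings are sorted, so all remaining ones are cut
            state["visited"] += 1
            s[i] = a
            if i == 0:
                state["s"], state["d"] = s[:], d  # leaf: shrink the radius
            else:
                search(i - 1, d)

    if n == 1:
        state["s"], state["d"] = s[:], root_ped
    else:
        search(n - 2, root_ped)
    return state["s"], state["d"], state["visited"]

def block_sesd(R, y_t, alphabet):
    """Run the blocks in rising order of sub-root PED, passing the radius on."""
    n = len(R)
    order = sorted(alphabet, key=lambda a: abs(y_t[n - 1] - R[n - 1][n - 1] * a))
    radius, solution, per_block = float("inf"), None, []
    for sub_root in order:
        sub_sol, radius, visited = search_block(R, y_t, alphabet, sub_root, radius)
        if sub_sol is not None:
            solution = sub_sol  # a later sub-solution is always closer
        per_block.append(visited)
    return solution, radius, per_block

# Assumed toy example: 3 layers, 4-level real alphabet, mild noise.
R = [[2.0, 0.5, -0.3],
     [0.0, 1.5, 0.4],
     [0.0, 0.0, 1.0]]
y_t = [0.9, -0.2, 2.6]
solution, radius, per_block = block_sesd(R, y_t, alphabet=[-3, -1, 1, 3])
```

In this toy case the first block already yields the final solution, so the three later blocks are cut at their sub-roots without visiting any node. Each call to `search_block` needs only the radius as input and returns the new radius and the sub-solution, which is what lets each block be mapped to one part of the pipeline.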

Simulation Results
On the basis of Section 5, we compare the complexity of our algorithm with that of the SESD algorithm. The detection performance of the proposed algorithm is optimal, the same as that of SESD, so it is not shown. The complexity is represented by the number of visited nodes, as in [16].
The complexity of the proposed algorithm is compared with that of the SESD algorithm at different SNRs. The numbers of visited nodes of every part are also listed. In both figures the calculation is carried out in the real field instead of the complex field, but all results are the same as in the complex field.
Figure 3 shows the complexity of a 4-by-4 antenna system with 16QAM modulation. Figure 4 shows the complexity of a 4-by-4 antenna system with 64QAM modulation.
From both figures, it can be seen that the sum of the visited node numbers of all parts is almost equal to that of the SESD algorithm, which means that the proposed algorithm does not increase the computational complexity. The average processing time is determined by the part that has the largest visited node number, which is the first part. Specifically, in the 16QAM system of Figure 3, the average processing time is about 48.77%, 54.34% or 60.18% of that of the SESD algorithm when the SNR is 0, 10 or 20 dB. In the 64QAM system of Figure 4, the average processing time is about 30.31%, 52.37% or 61.59% of that of the SESD algorithm when the SNR is 0, 10 or 20 dB. From these data, we find that the proportion of the visited node number of the first part increases with SNR. That is because at high SNR the probability of finding the final solution in the first part is high. When the final solution is found, most nodes in the later parts are cut quickly. As a result, the visited node number in the later parts is small and the proportion of the first part is high. This also indicates that the part visited first should be the part that has a high probability of generating the final solution.
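The relation between the visited node numbers and the reported percentages can be written compactly. The notation below is introduced here only for illustration: $V_k$ is the visited node number of part $k$, $V_{\mathrm{SESD}}$ that of the SESD algorithm, and $G$ the resulting gain in detections per unit time, assuming the processing time is proportional to the visited node number.

```latex
\begin{align*}
\rho &= \frac{T_{\mathrm{pipeline}}}{T_{\mathrm{SESD}}}
      \approx \frac{\max_k V_k}{V_{\mathrm{SESD}}},
 \qquad G = \frac{1}{\rho} \\
\text{16QAM:}\quad & \rho = 0.4877,\ 0.5434,\ 0.6018
 \;\Rightarrow\; G \approx 2.05,\ 1.84,\ 1.66
 \quad (\mathrm{SNR} = 0,\ 10,\ 20\ \mathrm{dB}) \\
\text{64QAM:}\quad & \rho = 0.3031,\ 0.5237,\ 0.6159
 \;\Rightarrow\; G \approx 3.30,\ 1.91,\ 1.62
 \quad (\mathrm{SNR} = 0,\ 10,\ 20\ \mathrm{dB})
\end{align*}
```

The values of $G$ are simply the reciprocals of the reported percentages, rounded to two decimals.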

Conclusions
A novel algorithm is proposed that divides the search tree of SESD into blocks, so that the calculation can proceed block by block in a pipeline structure. Several signals can be searched at the same time in one pipeline structure. The average processing delay drops to between 48.77% and 60.18% of that of the SESD algorithm in a 4-by-4 antenna system with 16QAM modulation, and to between 30.31% and 61.59% in a 4-by-4 antenna system with 64QAM modulation.