Design and FPGA-Implementation of Minimum PED Based K-Best Algorithm in MIMO Detector

A Minimum Partial Euclidean Distance (MPED) based K-best algorithm is proposed to detect the best signal in a Multiple Input Multiple Output (MIMO) detector. It is based on the breadth-first search method. The proposed algorithm is independent of the number of transmitting/receiving antennas and of the constellation size. It provides high throughput and a reduced Bit Error Rate (BER), with performance close to that of Maximum Likelihood Detection (MLD). The main innovation is that nodes are expanded and visited based on the MPED criterion, and the algorithm keeps track of the best candidates selected at each cycle. This allows its complexity to scale linearly with the modulation order. The complex-domain input signals are modulated using Quadrature Amplitude Modulation (QAM), converted into wavelet packets, and transmitted over an Additive White Gaussian Noise (AWGN) channel. The best signal is then detected from the received signals using the MPED based K-best algorithm, which provides the exact best-node solution with reduced complexity. A pipelined VLSI architecture is best suited for implementation because the expansion and sorting cores are data driven. The proposed method is implemented on a Xilinx Virtex 5 device for a 4 × 4, 64-QAM system and achieves a throughput of 1.1 Gbps. The resource utilization results are tabulated and compared with existing algorithms.


Introduction
Today, MIMO is one of the wireless communication technologies that provide increased data throughput and link range without any additional bandwidth. MIMO plays a key role in every new wireless standard, such as HSDPA (High Speed Downlink Packet Access), IEEE 802.11n [1], IEEE 802.16e and 3GPP-LTE. At the receiver, MIMO signal detection plays an important role in meeting the tough requirements of real-time processing. Low complexity, high performance and high throughput are the key challenges in the design of any MIMO receiver. Several MIMO detection algorithms have been proposed to address these challenges, offering various tradeoffs between performance and computational complexity.
Among the large variety of MIMO detection techniques, Maximum Likelihood (ML) detection [2] provides the optimum solution with minimum BER, but the computational complexity of its full search grows exponentially with the constellation size and with the number of transmitting and receiving antennas. On the other hand, low-complexity detection algorithms such as Successive Interference Cancellation (SIC) detectors [2], Minimum Mean-Square Error (MMSE) detection [3] and Zero-Forcing (ZF) detection can greatly reduce the computational complexity, but at the cost of reduced performance.
To resolve the trade-off between complexity and performance loss, tree search algorithms were introduced. Depth-first and breadth-first search are the two main categories of tree search algorithms. In a depth-first search the tree is traversed in both the forward and backward directions with variable throughput, which results in extra hardware overhead. In a breadth-first search [4], by contrast, the tree is traversed only in the forward direction, with fixed throughput. A well-known breadth-first approach is the K-best algorithm [4]. The K-best algorithm guarantees a fixed throughput that is independent of the Signal to Noise Ratio (SNR), with performance close to ML. In each cycle, K (parent nodes) × M (child nodes per parent) children must be enumerated, which results in a large computational complexity. In existing schemes the value of K is selected arbitrarily. As K increases, the performance approaches that of ML detection; however, a higher K value also results in more hardware complexity. For a 4 × 4, 64-QAM MIMO system, K = 10 gives performance close to ML with moderate complexity [5].
In this paper, to reduce the computational complexity of the K-best algorithm, an MPED based K-best algorithm is designed and implemented on an FPGA board (a reduced-complexity system), which significantly reduces the overall hardware/software complexity of the system. The K value is chosen as the square root of the QAM constellation order, so K = 8 is chosen for a 4 × 4, 64-QAM MIMO detector. All the system performance results presented were first tested in Matlab and then translated into hardware blocks in Simulink using Xilinx System Generator (SysGen). Once the hardware designs were completed, the bit streams required for the FPGA implementation were generated using the Xilinx synthesis tools. An efficient VLSI implementation is the key to enabling real-time wireless communication.

MIMO System
Assume a MIMO system with $N_t$ transmitting and $N_r$ receiving antennas, as shown in Figure 1. The equivalent baseband channel between the transmitting and receiving antennas is described by a complex-valued $N_r \times N_t$ channel matrix $H$ [6].
The complex-valued baseband received signal is expressed as

$$Y = Hx + e \qquad (1)$$

where $Y$ is the $N_r$-dimensional received signal vector, $H$ is the channel matrix, $x$ is the $N_t$-dimensional complex transmitted signal vector, each element of which is drawn independently from the complex QAM constellation, and $e$ is the noise vector with variance $\sigma^2$ per dimension. A complex-domain framework is developed here; alternatively, a real-valued decomposition of the received signal can also be derived [7]. Because of the intrinsic challenges of implementation in the complex domain, most of the MIMO detection algorithms in the literature have been proposed for the real domain [5]. On account of the deeper search tree, a real-domain implementation results in a larger silicon area and a larger latency. In the complex domain, however, a high-throughput MIMO detector with acceptable complexity for high-order constellations has always been a challenge in the literature. To address this challenge, a high-throughput detection algorithm for a 4 × 4, 64-QAM complex MIMO detector is proposed along with its VLSI architecture, which is scalable to higher-order constellation schemes (256-QAM) and to a larger number of transmitting and receiving antennas.
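The signal model above can be illustrated with a short numerical sketch. This is only an illustration under stated assumptions: the function names `qam_constellation` and `simulate_received` are hypothetical, the constellation uses unnormalized odd-integer coordinates, and the entries of the channel matrix are drawn as i.i.d. complex Gaussian values, which the paper does not specify.

```python
import numpy as np

def qam_constellation(order):
    """Square QAM points with odd-integer coordinates (assumed, unnormalized)."""
    side = int(round(np.sqrt(order)))
    if side * side != order:
        raise ValueError("order must be a perfect square, e.g. 16 or 64")
    levels = np.arange(-(side - 1), side, 2)  # e.g. -7, -5, ..., 7 for 64-QAM
    return (levels[:, None] + 1j * levels[None, :]).ravel()

def simulate_received(n_t, n_r, order, sigma, rng):
    """Draw one realisation of the model: received = channel @ x + noise."""
    points = qam_constellation(order)
    x = rng.choice(points, size=n_t)  # independent symbol per transmit antenna
    # Assumption: i.i.d. complex Gaussian channel entries with unit variance.
    h = (rng.standard_normal((n_r, n_t))
         + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    # Noise with standard deviation sigma per real dimension.
    e = sigma * (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r))
    return h, x, h @ x + e

rng = np.random.default_rng(0)
H, x, Y = simulate_received(n_t=4, n_r=4, order=64, sigma=0.1, rng=rng)
```

For the 4 × 4, 64-QAM case this yields a 4 × 4 channel matrix and a 4-element received vector.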

K-Best Algorithm
In the K-best algorithm, each level of the tree is expanded from the root to the leaves, and the best candidates with the lowest path metric are selected at each level. The path at the last level of the tree with the lowest Partial Euclidean Distance (PED) is the hard-decision [8] output of the detector, whereas all of the surviving paths at the last level of the tree are used to calculate the soft-decision [8] output of the detector. The size of the search tree grows considerably as the constellation size increases. Therefore, at each level of the tree an efficient way is needed to calculate the K-best candidates without performing an exhaustive search.
The objective of the MIMO detection method is to find the closest lattice point [9] for a given received signal $Y$:

$$\hat{x} = \arg\min_{x \in O} \|Y - Hx\|^2 \qquad (2)$$

where $O$ is the set of candidate vectors built from the real entries in the constellation.
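To make the cost of this minimization concrete, the following sketch performs the exhaustive search for a small 2 × 2 QPSK example. The function name `ml_detect` and the noiseless test setup are assumptions for illustration only. The loop visits every combination of constellation points, so a 4 × 4, 64-QAM system would require 64^4 (about 16.8 million) metric evaluations per received vector.

```python
import itertools
import numpy as np

def ml_detect(y, h, points):
    """Exhaustive ML search over every candidate vector (illustrative only)."""
    n_t = h.shape[1]
    best_x, best_metric = None, np.inf
    for cand in itertools.product(points, repeat=n_t):
        x = np.array(cand)
        metric = np.linalg.norm(y - h @ x) ** 2  # squared Euclidean distance
        if metric < best_metric:
            best_x, best_metric = x, metric
    return best_x, best_metric

# Small noiseless demo: 2 x 2 system with QPSK, i.e. 4**2 = 16 candidates.
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
rng = np.random.default_rng(1)
h = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
x_true = np.array([1 + 1j, -1 - 1j])
x_hat, metric = ml_detect(h @ x_true, h, qpsk)
```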
The channel matrix $H$ is QR decomposed as $H = QR$, where $Q$ is the unitary $N_r \times N_t$ matrix and $R$ is the upper triangular $N_t \times N_t$ matrix. By taking the Hermitian of $Q$, denoted $Q^H$, the nulling operation can be performed, which results in $Z = Q^H Y = Rx + w$, where $w = Q^H e$. Because $Q^H Q$ is the identity matrix, the noise $w$ after nulling remains spatially white. Since $R$ is upper triangular, Equation (2) can be represented as

$$\hat{x} = \arg\min_{x \in O} \|Z - Rx\|^2 = \arg\min_{x \in O} \sum_{i=1}^{N_t} \left| z_i - \sum_{j=i}^{N_t} r_{ij} x_j \right|^2 \qquad (3)$$

Equation (3) can be treated as a tree-search problem with $N_t$ levels: starting from the last row, one symbol is detected and, based on that, the next symbol in the row above is detected, and so on.

The two computing procedures in the K-best algorithm are:

1) Expansion: The K-best algorithm in the complex domain expands K (parents at each level) × M (children per parent) nodes. KM children must be enumerated, which results in a high computational complexity. The relaxed K-best algorithm and the base-centric search methodology [10] based on Quadrature Phase Shift Keying (QPSK) modulation [11] are compared in [9]. These schemes do not scale linearly with the constellation size, and they incur a performance loss compared with the K-best algorithm. In the on-demand expansion scheme the nodes are expanded by considering all the nodes with their PEDs, which in turn reduces the performance for higher-order constellations.
2) Sorting: In the K-best algorithm, KM children must be sorted at each level in the complex domain. In [12] and [13], sorting schemes such as bubble sorting [14], a sorting mechanism based on the Schnorr-Euchner (SE) technique [15], and a distributed sorting scheme are compared. However, these techniques either take a long time for large values of K and M or incur a performance loss.
To overcome the above two challenges, the MPED based K-best algorithm is proposed, in which the node with the minimum PED is taken as the parent node at each level. Its computational complexity and performance are better than those of the on-demand expansion scheme, and it works for any values of K and M without performance loss.
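The nulling step and the two procedures described in this section (expansion and sorting) can be sketched for the conventional K-best detector as follows. This is a minimal illustration, not the proposed MPED scheme: the function name `k_best_detect` is hypothetical, NumPy's `numpy.linalg.qr` stands in for the QR decomposition, and every level enumerates and fully sorts all K × M children, which is exactly the cost the paper aims to avoid.

```python
import numpy as np

def k_best_detect(y, h, points, k):
    """Conventional K-best: expand K*M children per level, keep the K best."""
    n_t = h.shape[1]
    q, r = np.linalg.qr(h)       # reduced QR: h = q @ r, r upper triangular
    z = q.conj().T @ y           # nulling: z = Q^H y = R x + w
    # Each survivor is (PED, symbols of the rows already detected, top first).
    survivors = [(0.0, [])]
    for level in range(n_t - 1, -1, -1):   # start from the last row
        children = []
        for ped, tail in survivors:
            # Interference from symbols detected at the lower rows.
            interf = sum(r[level, level + 1 + j] * s for j, s in enumerate(tail))
            for s in points:                # expansion: M children per parent
                inc = abs(z[level] - interf - r[level, level] * s) ** 2
                children.append((ped + inc, [s] + tail))
        children.sort(key=lambda c: c[0])   # sorting: all K*M children
        survivors = children[:k]
    best_ped, best_x = survivors[0]         # hard decision: lowest final PED
    return np.array(best_x), best_ped

# Noiseless 3 x 3 QPSK demo with K = 4.
rng = np.random.default_rng(2)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
h = (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))) / np.sqrt(2)
x_true = rng.choice(qpsk, size=3)
x_hat, ped = k_best_detect(h @ x_true, h, qpsk, k=4)
```

In the noiseless demo the transmitted path has a PED of essentially zero at every level, so it always survives and is returned as the hard decision.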

Proposed MPED Based K-Best Algorithm
The proposed MPED based K-best algorithm is based on the breadth-first tree search method. The algorithm is initialized by considering level $l$ of the tree and assuming that the candidate nodes at level $l + 1$ are known. Each node in the set $K_{l+1}$ has M possible children, and the K value (the number of cycles) is based on the square root of the constellation order, so there are K × M possible children in the tree.
The main objective of the proposed scheme is to find the First Best Child (FBC) of the initial parent node, based on the minimum PED among its first M children. The initial parent node is assumed to have a non-numerical value. In other words, the key innovation behind the proposed MPED based K-best algorithm is to find the FBC of each parent node in the set $K_{l+1}$; among these children, the best candidate is the one with the minimum PED value. The selected best candidate acts as a parent node for the next level. The children of the second-level parents are then generated and replace the first-level siblings. To find the best path, the process is repeated K times. The same procedure is repeated for each level of the tree until the best path is found.
The proposed MPED based K-best algorithm is represented diagrammatically in Figure 2 for level $l$ with modulation order M = 64, so that the total number of levels is given by $\sqrt{M} = 8$, and K = 8. The figure shows how $K_l$ is derived from $K_{l+1}$. The input to the algorithm is initially applied with a zero PED value. The parent node at level $K_{l+1}$ has four children, and the corresponding PED values of the four children are shown in Figure 2.
Here the parent can find its own children without visiting all the nodes in the tree. Let $S_l$ consist of the best selected child for the first parent, and let $P_T$ represent the corresponding PED values (in Figure 2, $S_{ij}$ represents the $j$-th child of the $i$-th parent node in the first level of the algorithm). From Figure 2, the child with the lowest PED value is the best child selected at level 1. The same steps are followed for all the levels, and the best child is finally obtained by the MPED based K-best algorithm.
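One possible reading of the selection step described above is sketched below for a single level. It is an interpretation under stated assumptions rather than the authors' implementation: the function name `select_k_best` and the example PED values are hypothetical, and each parent's children are pre-sorted in software as a stand-in for whatever next-sibling enumeration the hardware uses. A heap holds the current best unvisited child of every parent; the minimum is taken K times, and each selected child is replaced by its next sibling.

```python
import heapq

def select_k_best(parent_peds, child_increments, k):
    """Pick the K children with the smallest PED, one per cycle.

    parent_peds[p] is the PED of parent p; child_increments[p][c] is the
    PED increment of child c of parent p.
    Returns a list of (ped, parent index, child index) in selection order.
    """
    # Stand-in for next-sibling enumeration: child indices by ascending increment.
    ordered = [sorted(range(len(incs)), key=incs.__getitem__)
               for incs in child_increments]
    # Seed the heap with the first best child of every parent.
    heap = [(parent_peds[p] + child_increments[p][order[0]], p, 0)
            for p, order in enumerate(ordered)]
    heapq.heapify(heap)
    selected = []
    while heap and len(selected) < k:
        ped, p, rank = heapq.heappop(heap)        # minimum PED among candidates
        selected.append((ped, p, ordered[p][rank]))
        if rank + 1 < len(ordered[p]):            # replace it by its next sibling
            nxt = ordered[p][rank + 1]
            heapq.heappush(heap, (parent_peds[p] + child_increments[p][nxt],
                                  p, rank + 1))
    return selected

# Two parents with three children each; keep the K = 3 best children.
best = select_k_best([0.0, 0.5], [[0.9, 0.1, 0.4], [0.2, 0.7, 0.3]], k=3)
```

In this example the selection order is child 1 and then child 2 of parent 0, followed by child 0 of parent 1. After the initial seeding, only one new child is evaluated per cycle instead of all K × M children.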
The proposed scheme has the following features:
1) It can be easily adapted to the real domain.
2) The K value is chosen based on the QAM constellation size, so the scheme has less computational complexity than the existing algorithm, in which the K value is chosen arbitrarily.
3) It can be applied to infinite lattices and can be applied jointly with lattice reduction.
4) Increased performance is obtained by using the Wavelet Packet Transformation (WPT) with the AWGN channel.
5) It has a reduced BER.
6) It is easily implemented in a VLSI architecture.

Proposed VLSI Architecture
One of the key challenges in the VLSI architecture is to achieve high throughput with the minimum number of levels. To address this challenge, a pipelined structure is used, which performs the child expansion and minimization in a pipelined fashion, with the sorting implemented in a distributed way. The pipelined architecture includes a sorter block, which sorts all the signals, and a Processing Element (PE) block, which generates the best signal from the sorted signals.
The proposed pipelined VLSI architecture for a 4 × 4, 64-QAM hard-output MIMO detector is shown in Figure 3. Each layer takes the entries $z_i$ and $r_{ij}$ and the K parents [16] of the previous layer as inputs and generates the K parents of the next layer as outputs. The proposed architecture consists of eight layers ($2N_t = 8$ stages), from L1 to L8, corresponding to the 8-level detection tree.
The best signal detected by the MPED based K-best algorithm is taken as the input to the 8th level of the tree, which opens up all the possible values in O = {−3, −1, 1, 3} and calculates their corresponding PEDs [9]. This stage, performed by Level I, outputs the PED values. For each of the nodes, the First Child (FC) is then found and its PED value is updated by Level II.

Sorter Block
The sorter block sorts the FCs and determines the child with the lowest PED. This is represented in Figure 3: sorting takes four clock cycles, after which all eight resulting PEDs are sorted. The number of clock cycles required for sorting is lower than for the classic bubble sort in [14]. The key idea that makes this sorter faster is the execution of two tasks in one clock cycle through the introduction of intermediate registers. The output of the sorter block is loaded simultaneously into the PE I block.

PE I Block
The PE block contains a data register file and three computation units: an arithmetic/logic unit, a multiplier and a shifter. The PE I block takes the FCs of each level as input and generates the K-best candidates of that level one by one. The node with the lowest PED is definitely one of the K-best candidates in L7. This value is passed to the PE II block in L6. After the first child is removed, its next sibling is calculated by the PE I block. The PED of this sibling is compared with those of the other FCs already present in that stage, and the next K-best candidate, the one with the lowest PED in this new set, is found. This process is repeated 8 times (taking 8 cycles) until all the K-best values of the second level of the tree have been generated and passed to the PE II block.

PE II Block
The PE II block receives the K-best candidates of L7 one after the other, generates the FC of each received K-best candidate one by one, and sorts them as they arrive. It finally transfers them to the following PE I block. This process repeats for all the levels down to the first level. At the first level only the FC with the lowest PED is of concern; its solution S is the hard-decision output of the detector.

Simulation Results
A 4 × 4, 64-QAM MIMO system with K = 8 is considered in our simulation. The simulation is carried out using Matlab. An input message signal is chosen and can be plotted in random bit form. When the proposed method is applied to the input message signal, the best candidates are identified for each cycle.

Best Candidates for the Proposed Detection Algorithm
The best candidates for the proposed MPED based K-best algorithm obtained as a result of simulation for the given input message stream are listed below.

BER Performance Analysis for Proposed Algorithm with Rayleigh and AWGN Channel
The simulation results of the MIMO detection are presented in this section in terms of BER versus SNR. BER is a key parameter used in assessing systems that transmit digital data from one location to another. It is defined as

$$\mathrm{BER} = \frac{\text{number of errors}}{\text{total number of bits sent}}$$

If the medium between the transmitter and receiver is good and the signal to noise ratio is high, the bit error rate will be very small, possibly insignificant, with no noticeable effect on the overall system. If noise can be detected, however, the bit error rate may need to be considered. The BER over the Rayleigh fading channel is compared with the BER over the AWGN channel in Figure 4, for SNR values increasing from 0 to 20 dB. The analysis shows that, with the proposed algorithm, the BER is lower in the AWGN channel than in the Rayleigh channel.

[...] implemented using a feed-forward architecture. According to the proposed algorithm, the K-best candidates of each layer of the architecture are generated in K clock cycles, which increases the throughput of the system.
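The BER definition given in this section can be computed directly from the transmitted and detected bit streams. The sketch below is a minimal illustration; the function name `bit_error_rate` and the example bit sequences are hypothetical and are not taken from the paper's simulation.

```python
import numpy as np

def bit_error_rate(sent_bits, received_bits):
    """BER = number of errors / total number of bits sent."""
    sent = np.asarray(sent_bits)
    received = np.asarray(received_bits)
    if sent.shape != received.shape:
        raise ValueError("bit sequences must have the same length")
    if sent.size == 0:
        raise ValueError("at least one bit is required")
    return np.count_nonzero(sent != received) / sent.size

sent = [0, 1, 1, 0, 1, 0, 0, 1]
received = [0, 1, 0, 0, 1, 0, 1, 1]      # bits 2 and 6 are flipped
ber = bit_error_rate(sent, received)     # 2 errors out of 8 bits -> 0.25
```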
The throughput of the system is the number of packets produced per unit time. It is measured in units of whatever is being produced (I/O samples, memory words, iterations) per unit time. The latency is the number of cycles required before the system can accept the next input, and the gate count reflects the total core area of the design.
The Normalized Hardware Efficiency (NHE) is calculated from the gate count and the corresponding scaled throughput [5] in the same technology for all designs. Moreover, the proposed scheme is implemented on the FPGA platform. The synthesis results and the required resources for the 4 × 4, 64-QAM MIMO detector using the proposed scheme are shown in Table 1.

Conclusion
To detect the best signal for high performance MIMO detector, a MPED based K-best algorithm has been pro-

Figure 2. The proposed MPED based K-best algorithm for √M=8 and K=8 and simulated PED values.

Figure 3. Proposed Pipelined VLSI architecture of the MPED based K-best algorithm for the detection of a 4 × 4, 64-QAM system with K = 8.

Figure 4. BER comparisons for proposed algorithm with Rayleigh and AWGN channel.

Figure 5. Analysis of MPED based K-best algorithm with various detection algorithm.

Figure 6. BER comparison for FFT and Wavelet scheme.
1) The best candidate of cycle 1 is 3.283351e+003.
2) The best candidate of cycle 2 is −1.083299e+002.
3) The best candidate of cycle 3 is 3.279795e+003.
4) The best candidate of cycle 4 is −2.166598e+002.
5) The best candidate of cycle 5 is 3.393075e+003.
6) The best candidate of cycle 6 is −3.249897e+002.
7) The best candidate of cycle 7 is 3.389520e+003.
8) The best candidate of cycle 8 is −4.333195e+002.