
The conventional methodology for designing QC-LDPC decoders targets the fixed configurations used in wireless communication standards, so the largest supported expansion factor Z (the parallelism of the layered decoding) is a fixed number. In this paper, we study a circular-shifting network for decoding LDPC codes with an arbitrary expansion factor Z, especially codes with large Z (Z > P), where P is the decoder parallelism. By buffering the P-length slices read from the memory and assembling the shifted slices in a fixed routine, the P-parallel shift network can process Z-parallel circular-shifting tasks. The implementation results show that the proposed network, which shifts data of arbitrary size, consumes only about one times additional resources compared with the traditional solution, which shifts data of at most size P, and achieves significant savings in area and routing complexity.

Low-Density Parity-Check (LDPC) codes [

Because of the high computational complexity of the QC-LDPC decoder, high-speed decoders are often accelerated with an Application Specific Integrated Circuit (ASIC). Such accelerators are difficult to modify after tape-out, so the LDPC decoder must be designed with high flexibility. The variable parameters of QC-LDPC codes are mainly the base matrix Hb and the expansion factor Z. Supporting an arbitrary Hb is not difficult, because Hb can be configured through memory. However, a different Z (the maximum number of sublayers that can be decoded in parallel without memory conflicts) requires adjusting the parallelism of the decoder. A decoder with parallelism P can easily support decoding QC-LDPC codes with Z ≤ P. Therefore, previous schemes often adopt a Z-parallel hardware structure [

The remainder of this paper is organized as follows. Section 2 introduces the QC-LDPC codes employed in IEEE 802.11n and IEEE 802.16e, proposes the architecture of the LRSN, and describes its principle. Section 3 presents the implementation results of the LRSN applied in the QC-LDPC decoder. Finally, Section 4 concludes the paper.

For QC-LDPC codes, the sub-matrix size is defined by the expansion factor Z. The parity check matrix H of a QC-LDPC code can be represented by the base matrix Hb_{mb×nb}, where mb and nb are the numbers of sub-matrices in a column and in a row, respectively. As shown in

QC-LDPC decoders can be implemented efficiently using the layered decoding algorithm [

decoding is performed one non-zero sub-matrix after another, layer by layer. In each step, Z messages within a sub-matrix are processed in parallel. The proposed partially layered decoding flow is shown in the

24, and shift value r = 43, where the shaded parts are zeros. The LRSN receives P data and outputs P shifted data with a valid flag. The LRSN is described in detail in the next section.

(a) Memory arrangement. For the LRSN, the Z data are arranged as follows. Let L = ⌈Z/P⌉, where L denotes the number of P-length groups and ⌈·⌉ represents the ceiling function. Let D represent the input data and let mem(x, y) denote the arranged datum, i.e., the y-th number in row x. The data are arranged following (1).

where x
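As a rough illustration of the memory arrangement in (a), the sketch below assumes a plain row-major layout, mem(x, y) = D[x·P + y], with the unused positions of the last row filled with zeros. This layout is an assumption: Equation (1) may define a different mapping. The function name `arrange` and the example sizes Z = 56, P = 24 are chosen for illustration only.

```python
import math

def arrange(data, P, pad=0):
    """Split Z data into L = ceil(Z / P) rows of length P (assumed row-major layout).

    The last row is padded with `pad` when Z is not a multiple of P.
    """
    Z = len(data)
    L = math.ceil(Z / P)
    padded = list(data) + [pad] * (L * P - Z)
    return [padded[x * P:(x + 1) * P] for x in range(L)]

# Hypothetical sizes: Z = 56 data items stored in rows of P = 24.
mem = arrange(list(range(56)), 24)
print(len(mem))        # 3 rows, since ceil(56 / 24) = 3
print(mem[2][:8])      # the 8 valid items of the last row: 48..55
```

With these sizes the last row holds only 8 valid items, so 16 of its positions are padding.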

(b) Circular-shift network principles. The circular-shift network is responsible for circularly shifting the messages read from the memory into the correct order before they are processed by the processing units. The implementation of partially parallel LRSN is constrained by the following issues.

(1) Each group of output data may be derived from a number of memory units.

(2) A part of the output data from the shifter may be output directly, whereas the rest may need to be stored temporarily.

(3) Because additional splicing and multiplexing logic is needed, which lengthens the propagation time, we let the left-shifting and right-shifting networks work in parallel [

(4) Because in each clock the P output data may originate from multiple memory units, which cannot all be read in time, no-operation (NOP) clocks need to be inserted to pause the processing units. The experimental results show that fewer than 2 NOP clocks are needed, which is acceptable when Z is much larger than P.
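The first constraint can be illustrated with a small behavioural model. The sketch below is not the LRSN hardware. It assumes a row-major layout mem(x, y) = D[x·P + y] and the shift convention out[i] = D[(i + r) mod Z], and lists which memory rows each P-wide output group draws from. The function name `source_rows` is hypothetical, and Z = 56 is an inferred value used for illustration together with P = 24 and r = 43.

```python
import math

def source_rows(Z, P, r):
    """For each P-wide output group, list the memory rows that supply its data.

    Assumes out[i] = D[(i + r) mod Z] and that D[k] is stored in row k // P.
    """
    L = math.ceil(Z / P)
    groups = []
    for g in range(L):
        rows = []
        for i in range(g * P, min((g + 1) * P, Z)):
            row = ((i + r) % Z) // P
            if row not in rows:
                rows.append(row)   # keep rows in order of first use
        groups.append(rows)
    return groups

print(source_rows(56, 24, 43))   # [[1, 2, 0], [0, 1], [1]]
```

In this example the first output group needs data from all three memory rows, which is the situation that forces a pause of the processing units.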

The architecture of the left shift network is shown in

with five states is introduced to represent these cases, as shown in

(c) Detailed implementation. The memory accessing, shifting, and output multiplexing of the LRSN are described as follows, given that the Z data are arranged in memory following (1).

(1) Calculate the memory access address, which is defined as “addr”. Because the memory may be accessed several times, the first group of data to be fetched is determined by the first output and is calculated by addr <= ⌊r/P⌋, where ⌊·⌋ denotes the floor function. The “addr” is then updated following Equation (2) in each clock, shown in

(2) Inside the LRSN, the shift value SV is calculated following (3).

In particular, the SV of the right shifter is P when r is an integer multiple of P.

(3) The routines for out1, out2, out3 and out4 follow (4), (5), (6) and (7), where the number “i” represents the index of the P data, ranging from 0 to P − 1.

(4) The output data always come first from the left shift network and then from the right shift network. Therefore, the transitions among the four cases follow a fixed order, shown in

The LCSM transfers to state S1 when “start” is valid (a new sub-matrix is to be processed), and remains in S1 while addr < L − 1. The output may need data from two input clocks, which are valid after one clock cycle; hence a NOP is issued first (output en = 0), and then the output selects Dout = out1.

The addr increases with each clock. When the end of the current sub-matrix data is reached, i.e., addr = L − 1, the output data come from both the left shift and the right shift networks for one clock. During this transition, the case can be S2 or S3 in

In the S4 state, the addr increases with each clock until the number of output “en” assertions reaches L, which means that all the Z-sized data have been processed; finally, the LCSM returns to Idle.

(d) Representative example.

When t = 1, the data from addr = 2 are sent to the left shifter, and the shift value is 19. Since addr = 2 and rtmp = 67 > 56, the LCSM transfers to S2. When t = 2, the data from addr = 0 are sent to the right shifter with shift value mod(Z − r, P) = 13; then Dout = out2 and the LCSM transfers to S4. The shift operation is completed when the number of output “en” assertions reaches 3. Finally, the LCSM returns to Idle.
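The numbers in this example can be checked with a short behavioural sketch. It assumes P = 24 and r = 43, takes Z = 56 as an inference from the comparison rtmp = 67 > 56 and from mod(Z − r, P) = 13, and assumes a row-major memory layout with the convention out[i] = D[(i + r) mod Z]. The splice boundaries are worked out by hand for this one case. They are not the routines (4) to (7) for out1 to out4, and the helper names `rotl` and `rotr` are hypothetical.

```python
Z, P, r = 56, 24, 43            # Z = 56 is inferred from the text, not stated explicitly
L = -(-Z // P)                  # ceil(Z / P) = 3

def rotl(row, s):
    """Circular left shift of one P-length slice by s positions."""
    s %= len(row)
    return row[s:] + row[:s]

def rotr(row, s):
    """Circular right shift, expressed as a left shift by len(row) - s."""
    return rotl(row, len(row) - s)

D = list(range(Z))
padded = D + [None] * (L * P - Z)          # None marks the unused slots of the last row
mem = [padded[x * P:(x + 1) * P] for x in range(L)]

first_addr = r // P             # first row to fetch
left_sv = r % P                 # shift value of the left shifter
right_sv = (Z - r) % P          # shift value of the right shifter at the wrap-around

a = rotl(mem[first_addr], left_sv)   # row 1 through the left shifter
b = rotl(mem[2], left_sv)            # row 2 through the left shifter (t = 1)
c = rotr(mem[0], right_sv)           # row 0 through the right shifter (t = 2)

n_a = P - left_sv               # 5 valid items taken from row 1
tail = Z - (L - 1) * P          # 8 valid items in the last row
first_group = a[:n_a] + b[n_a:n_a + tail] + c[n_a + tail:]

expected = [D[(i + r) % Z] for i in range(P)]
print(first_addr, left_sv, right_sv)   # 1 19 13
print(first_group == expected)         # True
```

The left shift by 19 and the right shift by 13 place the three partial slices at disjoint positions of the first output group, so assembling them requires only multiplexing.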

We implemented the LRSN in a Xilinx Kintex7 xc7k325T FPGA chip. Reference [_{max} circular-shifting network to support all Z configurations less than Z_{max}, which has simpler routing decisions and lower complexity than the other Z_{max} networks. We therefore re-implemented the shift networks with P = 24, 96, 128, 360 and 512, and compared them with the LRSN proposed in this paper in terms of Slice Registers, Slice LUTs, LUT Flip Flop and Z compatibility. For a fair comparison, the messages are quantized to 6 bits.

The results are shown in

We presented a fixed-complexity, highly compatible circular-shifting network called LRSN with time-sharing capability for partially parallel layered LDPC decoders. The LRSN employs a fixed routine and memory mapping to support partially parallel layered decoding of LDPC codes with any expansion factor. When the memory capacity is guaranteed, the LRSN can be used to decode LDPC codes with long block sizes, so the compatibility of the decoder is enhanced. In addition, the proposed LRSN can reduce the hardware cost by introducing a much lower hardware parallelism when the data rate is not very high, such as in the DVB-S2 and DVB-T standards.

This work is supported by “National High Technology Research and Development Program of China” (Grant No. 2015AA01A709) and “National Science Foundation of China (NSFC)” (Grant No: 61471037).

Wang, Y.Z., Wu, Z.Z., Liu, P.P., Guan, N. and Wang H. (2017) A Highly Compatible Circular- Shifting Network for Partially Parallel QC-LDPC Decoder. Int. J. Communications, Network and System Sciences, 10, 24-34. https://doi.org/10.4236/ijcns.2017.105B003