Peak Detection Implementation for Real-Time Signal Analysis Based on FPGA

In this paper, a real-time peak detection method for time series data, based on a modified Automatic Multiscale-based Peak Detection (AMPD) algorithm and Field Programmable Gate Array (FPGA) technology, is studied, and the optimum scale is identified after testing several scales. To validate the results obtained from the modified algorithm, they are compared with the results of the original AMPD method. Three-phase voltage values of a power station are used as the data of this study. A detailed detection sensitivity analysis of the phase-to-phase voltage values is performed at different scales. Moreover, the original algorithm is tested with respect to the off-line mode to obtain the optimum scale for real-time peak point detection. It is concluded that the minimum and maximum peak points detected by the modified algorithm are very close to the results of the original AMPD algorithm.


Introduction
Peak detection in time series data is an active topic in many engineering fields, including chemistry, biology, biomedical engineering, optics, astrophysics and energy systems, and these fields often require real-time peak detection. Because environmental noise can affect the signals, robust peak detection is a challenging topic. Several peak detection methods have been proposed, including automatic multiscale-based peak detection [1], window-threshold techniques [2] [3] [4], wavelet transform [5]-[11], techniques using entropy [12], and artificial neural networks [13] [14]. Each method was investigated in terms of the detection method employed and the detection performance achieved. A drawback of the peak detection algorithms available in the literature is that many free parameters, such as a window length or a threshold value, have to be set before the algorithm can be applied to the signal. Generally, the algorithms with fewer parameters are restricted to specific applications, such as the detection of R-peaks in electrocardiography (ECG) signals, where the aim is an adaptive and time-efficient R-peak detection algorithm for ECG processing that also reduces the size and noise of ECG signals [15]-[20]. In addition, noise in the analyzed signal is a challenge for many peak detection algorithms.
On the other hand, peak points are most difficult to detect in periodic and quasi-periodic signals. The automatic multiscale-based peak detection (AMPD) method [1] is suitable for all of these signal types and has therefore been introduced as an effective method. In this research, the terms off-line and on-line describe how the scale is handled. The off-line algorithm fixes the scale and then analyzes all input data with this constant scale at each clock. The on-line algorithm, in contrast, varies the scale at each clock. However, these software methods are not suited to real-time processing and instead emphasize sophisticated off-line data analysis.
The use of FPGAs provides a promising approach to real-time peak analysis [21] [22]. FPGAs do not run a program stored in program memory; they are reprogrammable chips containing a large number of logic gates that are internally connected to form complex digital circuitry. FPGAs are not processors and are entirely different from CPUs, GPUs, and DSPs. However, they offer various opportunities for efficient real-time signal processing by making the best use of pipelined structures in computing. In previous work, we applied the AMPD method on an FPGA in off-line mode while changing its bit size and scales, and then analyzed it in terms of speed, cost and memory [23]. We found that changes in bit size did not affect the peak detection. This paper introduces a novel approach for robust and real-time peak detection using the AMPD algorithm and FPGA technology. It highlights the modification of the original AMPD algorithm into an off-line method, and how it can be implemented on an FPGA so that a pipelined computing structure is realized in hardware. Thus, the optimum scale for the off-line peak detection is obtained, and the results compared with the original AMPD method are found to be very promising.
number of compared data at a time.
AMPD is divided into four stages: Local Maxima Scalogram (LMS) calculation, row-wise summation of the LMS, LMS rescaling, and peak detection. Figure 1 shows the flowchart of the original algorithm. However, to enable off-line processing on an FPGA, the row-wise summation of the LMS and the LMS rescaling, which serve to decide the optimal scale, are skipped, because this decision can be performed in advance in a calibration phase. Scale patterns such as 33, 65, 129, 257, 513 and 1025 are used to make a simple and efficient pipeline for implementation on the FPGA. Fixed scales are then chosen and, based on them, the sensitivities of the peak points are analyzed.

LMS Calculation
Essentially, LMS calculation means analyzing all input values $x = (x_1, x_2, x_3, x_4, \ldots, x_n)$ with the moving window approach to fill in the Z matrix given in Equation (1), which also indicates the size of the Z matrix. In the Z matrix, k denotes the row index and i denotes the column index. The relation between n and L can be defined as shown in Equation (1), where n denotes the total number of columns and L denotes the total number of rows.
Figure 1. AMPD algorithm flowchart.
When each value $x_i$ is analyzed, the previous value ($x_{i-1}$) and the next value ($x_{i+1}$) are checked and compared using the window approach, in which the window is the distance between the compared points. Furthermore, the window scale depends on L. At the same time, k is varied from 1 to L as in Figure 2, which shows the comparison mechanism of the input values.
Elements of the Z matrix can be calculated by Equation (3).
If the condition is satisfied at point $x_i$, the element $z_{k,i}$ is assigned zero; otherwise, a random number r, produced each time by a random generator, is assigned to $z_{k,i}$. The range of the random numbers is 1 < r < 2. In this way, the whole Z matrix is obtained by analyzing each of its elements.
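The construction of the Z matrix can be sketched in software as follows. This is a minimal Python illustration rather than the hardware design of this paper. The comparison `x[i] > x[i-k] and x[i] > x[i+k]` follows the usual AMPD local-maximum test and is an assumption here, as are the function name `lms_matrix`, the parameter `num_scales` (standing in for L), and the choice to assign a random value when the window reaches outside the signal.

```python
import random

def lms_matrix(x, num_scales, rng=None):
    """Build a num_scales-by-n matrix Z (list of rows).

    z[k-1][i] is 0.0 when x[i] is larger than both neighbours at
    distance k (assumed condition); otherwise it is a random value
    between 1 and 2.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch repeatable
    n = len(x)
    z = []
    for k in range(1, num_scales + 1):
        row = []
        for i in range(n):
            inside = i - k >= 0 and i + k < n  # window must stay in the signal
            if inside and x[i] > x[i - k] and x[i] > x[i + k]:
                row.append(0.0)
            else:
                row.append(1.0 + rng.random())
        z.append(row)
    return z

# Toy signal with two clear maxima at indices 3 and 9.
signal = [0, 1, 3, 7, 3, 1, 0, 2, 5, 9, 5, 2, 0]
Z = lms_matrix(signal, num_scales=3)

# Columns that are zero at every scale mark candidate peaks.
zero_columns = [i for i in range(len(signal)) if all(row[i] == 0.0 for row in Z)]
print(zero_columns)  # prints [3, 9]
```

Only the two columns that win the comparison at every scale remain all-zero, which is the property the peak detection step relies on.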

Positive Edge and Negative Edge Peak Points Detection
The target of this study is to detect peak points by applying the variance formula.
After the Z matrix has been formed, the zero points are detected by applying the variance formula to each column. In some cases this value may not be exactly zero because of noise; detection is therefore performed when the value is smaller than a minimal threshold value. Thus, if sigma (σ) in Equation (4) is found to be zero, the point is interpreted as a peak. If sigma is different from zero, the point does not correspond to a peak. The row limit in Equation (4) should be denoted as L so that all zero points of the Z matrix are detected.
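The column-wise test described above can be illustrated with a short Python sketch. It assumes a population variance over the L rows and an arbitrary threshold of `1e-9`; neither the exact form of Equation (4) nor the threshold value used in this study is reproduced here, and the names `column_variances` and `detect_peaks` are hypothetical.

```python
def column_variances(z):
    """Population variance of each column of z (a list of equal-length rows)."""
    num_rows = len(z)
    variances = []
    for col in zip(*z):
        mean = sum(col) / num_rows
        variances.append(sum((v - mean) ** 2 for v in col) / num_rows)
    return variances

def detect_peaks(z, threshold=1e-9):
    """Indices of columns whose variance falls below the threshold."""
    return [i for i, var in enumerate(column_variances(z)) if var < threshold]

# Hand-made 3-by-5 matrix: only column 2 is zero at every scale.
Z = [
    [1.3, 1.7, 0.0, 1.1, 1.9],
    [1.6, 0.0, 0.0, 1.4, 1.2],
    [1.8, 1.5, 0.0, 1.2, 0.0],
]
print(detect_peaks(Z))  # prints [2]
```

Columns 1 and 4 contain a single zero each but are not reported, because the random entries at the other scales keep their variance well above the threshold.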

Overview of the System
In the original AMPD algorithm, the best scale is found automatically using four calculation steps: Local Maxima Scalogram (LMS) calculation, row-wise summation of the LMS, LMS rescaling, and peak detection. On the other hand, the amount of memory used by the AMPD algorithm must be reduced when applying it to the FPGA in order to make a simple and effective pipeline design, because using more steps increases both memory usage and latency. The original and modified algorithm flowcharts are given in Figure 1 and Figure 4, respectively. Since the main target of this study is to apply the AMPD algorithm to an FPGA, a new algorithm is proposed in which the number of steps of the AMPD algorithm is reduced. Figure 4 shows the flowchart of the off-line algorithm.
Therefore, there are only two steps in the proposed algorithms: Local Maxima Scalogram and peak detection for the best off-line calculations of peak points.
The proposed system cannot straightforwardly calculate the peak points automatically with the limited memory of an FPGA. When the AMPD algorithm is applied on an FPGA, the device can be reprogrammed for the desired application, so that each operation is carried out by logic gates. Nevertheless, the LMS matrix is generated for applying the deviation formula, and all values are then kept in registers. Figure 5 shows an overview of the hardware design, where k is the window scale.
In this design, a different element of the matrix is compared and generated in every clock cycle. Matrix generators with input X and output Z are serially connected to take advantage of pipelining. After all values have been generated, a basic peak flag is used to determine whether a value corresponds to a zero point, that is, whether a peak point is detected. Matrix generators are used to shift data sequentially, with the data for each matrix generator being compared with newly sampled data. In this manner, matrix elements of each scale are generated. In one clock cycle, an element of the matrix is generated, and as a result the summation and the square summation are calculated sequentially. Finally, a division is performed to obtain an average and a squared average. In the original AMPD algorithm, the standard deviation formula is utilized for detecting peak points. This involves rather complex arithmetic such as a square root. To improve the performance and efficiency of the FPGA implementation, the process is modified to use the variance instead of the standard deviation. The formula used in this design is given in Equation (5), which shows the matrix designed by applying the variance formula to each column. Although both Equation (4) and Equation (5) are suitable for applying the variance, Equation (5) is used in this study because it is easy to represent in hardware.
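The sequential accumulation described above can be mimicked in software. The sketch below is an assumption about the arithmetic, not a transcription of Equation (5): it keeps a running sum and a running sum of squares, divides once at the end, and forms the variance as the mean of the squares minus the square of the mean, so no square root is needed. The function name `streaming_variance` is hypothetical.

```python
def streaming_variance(values):
    """One-pass variance from a running sum and a running sum of squares."""
    total = 0.0
    total_sq = 0.0
    count = 0
    for v in values:          # one element per iteration, like one per clock
        total += v
        total_sq += v * v
        count += 1
    mean = total / count          # average
    mean_sq = total_sq / count    # average of the squares
    return mean_sq - mean * mean  # variance, no square root required

print(streaming_variance([0, 0, 0, 0]))           # prints 0.0
print(streaming_variance([1, 2, 1, 2]))           # prints 0.25
print(streaming_variance([1.5, 1.25, 0.0, 1.75])) # prints 0.453125
```

Because the standard deviation is zero exactly when the variance is zero, replacing one with the other does not change which columns are flagged as zero points; only the numerical value of a non-zero threshold would need to be adapted.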
The input data is stored in the register, and the data with which it is to be compared is fed into the comparator in every clock cycle. Figure 7 is the expanded detailed hardware of the selector section of Figure 6, and it is the decision-making section for positive and negative peak detection.

Results and Discussion
In this section, the original AMPD algorithm and the modified algorithm are evaluated in detail by comparing their detections of peak points of the same data

Simulation of the Original AMPD Algorithm
In this section, the peak points obtained from the original AMPD method in on-line mode are presented. A simulation has been performed with input data of the phase-to-phase effective voltage values of a medium-voltage transformer

Simulation of the Modified Off-Line Algorithm
In this section, a simulation similar to that in 4.1 is repeated for the designed Verilog algorithm using the same data. Hardware Description Language (HDL) simulations were performed with the Cadence NC-Verilog simulator. The daily maximum and minimum peak points detected by the modified AMPD method are depicted in Figures 10(a)-(c) for scale 1025 for the L3-L2 ($V_{L3-L2}$), L2-L1 ($V_{L2-L1}$) and L1-L3 ($V_{L1-L3}$) line voltage values, respectively. Table 3 shows the negative edge sensitivities at different scales. As the scale is increased from 33 to 1025, large changes in sensitivity are observed for the L1-L3 line voltage values. On the other hand, maximum sensitivities of 93.54% were obtained at scales 33 and 65 for the L2-L1 line voltage values, almost the same as the original algorithm results. The sensitivities of the L2-L1 line voltage values remain constant for scales 33 and 65, but they are reduced for scales 129, 257, 513 and 1025 due to the low frequencies of compared data at these scales. The sensitivities of the L3-L2 line voltage values are constant for scales 33, 65, 129 and 257, but they are also reduced for scales 513 and 1025. Lastly, the maximum negative edge sensitivities are obtained at scales 33 and 65 for all line voltage values, and almost the same results as the original algorithm were obtained at these scales.
Finally, the sensitivities of all line voltages (L3-L2, L2-L1 and L1-L3) at higher scales were found to be very low, because the very long sampling periods caused peaks to be missed. This can also be explained by Equation (1) and Figure 2: when the scale is increased, the number of detected points is reduced due to the increased number of rows L in the Z matrix. In addition, the sensitivity decreases at higher scales due to the presence of noise in the signal.

Evaluation Environments and Method
In this section, the aforementioned hardware designed in Verilog HDL is explained, and then the device utilization and performance of the modified algorithm on the Kintex-7 XC7K325T [24] FPGA are evaluated. Vivado 2016.3 was used as the mapping tool. Analog-to-digital converters (ADCs) transform analog electrical signals, generally the voltage amplitude, into a sequence of discrete values for data processing purposes. In this study, a DC919af ADC with a 100 MHz maximum system frequency (sampling rate) was used and connected to the FPGA board. The main target was to increase the scale in order to observe and analyze the latency, memory usage and performance of the FPGA board. Therefore, different window lengths at various scales were designed, implemented and analyzed on the FPGA board. Firstly, the bit size was fixed at 12 because many ADCs are compatible with 12 bits. Then the scale of the input data was varied over 33, 65, 129, 257, 513 and 1025 to compare resource usage, the result of which is shown in Table 4.
As seen in Table 4, when the scale is increased, the slice logic utilization, such as the number of slice LUTs, BRAMs, FFs, and DSP48E1 blocks, increases directly, as does the latency. In addition, the designed algorithm uses on-chip memory blocks (BRAMs), since the entire process is mapped onto a pipelined structure based on shift registers. In terms of performance, the latency of peak detection was 23 clock cycles for the scale of 33. When each input data element has 12 bits, the maximum clock frequency is 144.927 MHz in Table 4. Furthermore, the maximum frequency was 126.438 MHz in Table 4 when using scale 1025.

Evaluation of the AMPD Method with an FPGA Board
The AMPD method is evaluated on an FPGA board in this section. Table 5 gives the latency performance at a 100 MHz timing constraint for two different scales. With the aim of detecting peak points efficiently, the approach in this study achieves real-time peak detection based on the AMPD algorithm on an FPGA. When implementing the AMPD algorithm on the FPGA board, the bit size was chosen as 12, which is compatible with the ADC and the 100 MHz maximum system frequency.
Figure 11 shows an overview of the experimental system. First, the FPGA sends the starting signal to the ADC. Then the ADC sends 12-bit input data and the 100 MHz system clock signal to the FPGA. Finally, the designed algorithms successfully detect all peak points, as illustrated in Figure 14 and Figure 15. In particular, two different AMPD designs were built with scales 65 and 33, because these scales have the best sensitivities. Although the system can successfully detect peak points at both scales, the latency is taken into account for the performance. When scale 33 is selected, the latency is around 320 ns. This latency is a combination of the ADC latency and the algorithm calculation latency. When scale 65 is selected, the total latency becomes around 502 ns because the window scale increases. Accordingly, the more the scale increases, the lower the upper limit frequency becomes. The upper limit frequency is obtained by increasing the frequency from the signal generator up
Table 5 provides information on the latency and the maximum upper limit frequency for different scales with a fixed bit size and system clock frequency. This table shows the evaluation result for the two scales of 33 and 65. Furthermore, the upper limit frequencies indicate the maximum time step of execution of the algorithm.
While the system is operating, the delay time is composed of two critical parts: the algorithm calculation, and the converter together with the transmitting wire. For instance, when we implemented scale 65, we measured a total latency of 502 ns on the oscilloscope screen depicted in Figure 14. Figure 13 shows the latency detail for scale 33. Figure 15 shows all of the peak points on the positive edge, synchronized with a 6.148 MHz sinusoidal signal. This implementation result corresponds to scale 33, a 12-bit size, a latency of around 320 ns and a maximum frequency of around 6 MHz.
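The split of the measured delay into its two parts is simple arithmetic and can be checked with a few lines of Python. The cycle counts (23 for scale 33, 40 for scale 65), the 100 MHz clock and the measured totals (320 ns and 502 ns) are the values reported in this paper; the function name `latency_breakdown` is hypothetical and only serves this illustration.

```python
def latency_breakdown(latency_cycles, clock_mhz, total_latency_ns):
    """Split a measured latency into algorithm time and converter/wire time.

    Returns (algorithm_ns, converter_and_wire_ns).
    """
    # cycles / MHz gives microseconds; multiply by 1000 to get nanoseconds
    algorithm_ns = latency_cycles * 1000.0 / clock_mhz
    return algorithm_ns, total_latency_ns - algorithm_ns

for scale, cycles, total_ns in [(33, 23, 320.0), (65, 40, 502.0)]:
    algorithm_ns, rest_ns = latency_breakdown(cycles, 100.0, total_ns)
    print(scale, algorithm_ns, rest_ns)
# prints:
# 33 230.0 90.0
# 65 400.0 102.0
```

In both cases the converter and transmitting wire account for roughly 90 to 100 ns, so the growth in total latency from scale 33 to scale 65 comes almost entirely from the algorithm calculation.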

Conclusions
In this paper, a novel modified AMPD method was implemented on an FPGA. It was highlighted that the modified AMPD mechanism could be implemented as pipelined hardware on an FPGA, and that fast detection latencies (320 ns and 502 ns for scales 33 and 65) could be achieved with a reasonable amount of

The original algorithm flowchart in Figure 1 is more complex and has more steps than the modified algorithm flowchart in Figure 4. In the modified algorithm, the number of steps is reduced and therefore the memory size is also reduced. If the time complexity of the algorithm is analyzed in terms of the flowchart given in Figure 3, the total time complexity is found to be $O(n^2/2)$.

Figure 3. Time complexity of off-line algorithm flowchart.

Figure 6 depicts the Matrix Generator block diagram, a decision mechanism that generates a matrix by using the LMS calculation. It is also a critical part of the LMS calculation. The Matrix Generator requires a decision part for comparing values, which constitutes the major part of the design of the matrix generator module. This module generates a matrix of one scale. The data generated by the matrix generator for comparison is reduced for storage and fed into the shift register. The length of the shift register depends on the scale.

Figures 9(a)-(c) and Figures 10(a)-(c) illustrate the real data in red with a plus sign and the modified Verilog algorithm data in green with a star sign. However, in this case the simulation results are obtained from the Verilog design in off-line mode. The daily maximum and minimum peak points detected by the modified AMPD method are depicted in Figures 9(a)-(c) for scale 33 for the L3-L2 ($V_{L3-L2}$), L2-L1 ($V_{L2-L1}$) and L1-L3 ($V_{L1-L3}$) line voltage values, respectively.
Figure 14 and Figure 15 are snapshots of experimental results with different scales on the FPGA, taken with an oscilloscope. These results prove that a designed

peak points are done at the time of 10:40, 11:00, 11:10, 17:30 and 19:00. Nevertheless, it should be noted that the original AMPD method introduced in this section has been written in the C programming language and has been run in on-line mode. Max-peak sensitivities and min-peak sensitivities concerning the line voltages are shown in Table 1.

Table 1. Max-peak sensitivities and min-peak sensitivities concerning the line voltages.

Table 2. Positive edge sensitivities for different scales and line voltages.
L3 line voltage remain constant for scales 33, 65 and 129. Nevertheless, they are also reduced at scales 257, 513 and 1025 because of the low frequencies of compared data at these scales. Finally, it is seen that maximum sensitivities are obtained at scales 33 and 65 for all line voltage values. They give almost the same results as the original algorithm.

Table 3. Negative edge sensitivities for different scales and line voltages.

A. M. Colak et al.

Table 4. Resource usage with different scales.

Table 5. Performance of different scales.

To calculate the algorithm calculation component of the latency, the latency in clock cycles at scale 65 in Table 4 is first divided by the constraint system clock frequency (100 MHz), so that the algorithm calculation time is found as 40/100 = 0.4 [μs] = 400 [ns]. This is then subtracted from the total latency of 502 ns, read from the oscilloscope in Figure 14, to calculate the converter and transmitting wire part as 502 [ns] - 400 [ns] = 102 [ns]. Figure 12 shows the latency detail for scale 65. This calculation is repeated for scale 33 to make the distribution of the latency time clearer. At scale 33, the total latency is measured as 320 ns from the oscilloscope screen in Figure 15. From Table 4, the algorithm calculation time is found by dividing the latency in clock cycles by the constraint system clock frequency (100 MHz), so it is calculated as 23/100 = 0.23 [μs] = 230 [ns]. When this time is subtracted from the total measured latency, the converter and transmitting wire part is calculated as 320 [ns] - 230 [ns] = 90 [ns].