Typically, extrema filtering techniques rely on domain-dependent (parametric) properties such as the magnitude of the prominence and the width at half prominence, which cannot be used with data of a dynamic nature. This work introduces an extrema identification method that is completely independent of derivative-based approaches and of quantitative attributes. For consecutive positive terms lying on a straight line, the ratio (R) of the sum of the maximum and minimum to the sum of all terms is always 2/n, where n is the number of terms; hence R = 2/3 when n = 3. For three terms, R > 2/3 implies that one term lies away from the other two. By suitably modifying this relation, a method was developed that can identify the peaks and valleys of any signal. Furthermore, three techniques were developed for filtering non-dominating, sharp, gradual, low, and high extrema. Notably, all the developed methods are non-parametric and suitable for analyzing processes of a dynamic nature, such as biogas processes. The methods were evaluated using automatically collected biogas data. The results showed that the extrema identification method identified local extrema with 0% error. Furthermore, the non-parametric filtering techniques distinguished dominating, flat, sharp, high, and low extrema in the biogas data with high robustness.

In process control, determining the peaks and valleys of a signal, also known as identifying local maxima and minima, is crucial for describing and capturing certain signal properties. Identifying local maxima and minima is particularly useful in signal processing and, consequently, in inline/online process control and optimization. Moreover, reliable feature extraction requires removing redundant maxima and minima from a processed signal. The issue has been extensively investigated in the literature […]. If the n^{th} term of a series is x_{n}, then x_{n} is considered a peak (maximum) when x_{n−1} < x_{n} > x_{n+1}; likewise, x_{n} is considered a valley (minimum) when x_{n−1} > x_{n} < x_{n+1}. Gradient-based methods, which locate an extremum by considering the slope (gradient) at a given point, are the most popular approach [
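The classical neighbour-comparison rule above can be sketched in Python as follows. It illustrates only that textbook definition, not the MMS method introduced later, and the function name `naive_extrema` is hypothetical. Because both inequalities are strict, a flat top or bottom made of equal neighbouring values is not reported, and the two endpoints are never tested because each lacks a second neighbour.

```python
def naive_extrema(x):
    """Return (peaks, valleys) as index lists using the neighbour rule:
    peak   if x[i-1] < x[i] > x[i+1]
    valley if x[i-1] > x[i] < x[i+1]
    Endpoints are skipped because they have only one neighbour."""
    peaks, valleys = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            peaks.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            valleys.append(i)
    return peaks, valleys


print(naive_extrema([1, 3, 2, 2, 0, 4]))  # ([1], [4]); the plateau 2, 2 yields nothing
```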

Once the extremum points are identified, a filtering step is unavoidable to isolate the dominant or relevant extrema. The magnitude of the prominence and the width at half prominence are two signal properties commonly used to filter extrema [

A primary classification of data analysis techniques is whether a method is parametric or non-parametric in nature [

Non-parametric methods, also known as distribution-free methods, depend on fewer underlying assumptions [

When a sudden extremum (variation) occurs, outliers, peaks, and valleys can coincide. Extrema can, however, also form through gradual increases and decreases. Extrema generated in this way do not behave as outliers and cannot be identified using the aforementioned outlier detection method based on the maximum, minimum, and sum of the data series (MMS) [

The proposed extrema identification method does not involve first or second derivatives; instead, within the considered window, it compares two ratios built from the maximum, minimum, middle point, and sum of the data points. Furthermore, this work introduces three extrema filtering methods that can filter extrema independently of their prominence or width. All the methods introduced in this work were developed for the harsh conditions of dynamic processes, especially biogas process data; they therefore handle non-linear datasets and are non-parametric.

As mentioned before, the outlier detection method by the same authors […] uses the ratios MMS_{max} and MMS_{min}, which are expressed in Equation (1) and Equation (2). The ratio 2/n is used as the detection criterion, where n is the number of terms in the series.
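
$$\mathrm{MMS}_{max} = \frac{a_{max} - a_{min}}{S_n - n\,a_{min}} \qquad (1)$$

$$\mathrm{MMS}_{min} = \frac{a_{max} - a_{min}}{n\,a_{max} - S_n} \qquad (2)$$
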
where a_{min} is the minimum element of the series, a_{max} is the maximum element of the series, n is the number of terms in the series, and S_{n} is the sum of terms in the series.
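The two ratios can be sketched in Python as follows. The explicit forms used here, MMS_{max} = (a_{max} − a_{min})/(S_{n} − n·a_{min}) and MMS_{min} = (a_{max} − a_{min})/(n·a_{max} − S_{n}), are an assumption: they are the ratio (a_{max} + a_{min})/S_{n} applied to the series a_{i} − a_{min} and to the series a_{max} − a_{i}, respectively, and they reproduce every MMS_{max} and MMS_{min} value in the table of this section. For points on a straight line both equal 2/n. The function names are hypothetical.

```python
def _stats(a):
    """Return (n, S_n, a_max, a_min); MMS is undefined (0/0) for a constant series."""
    n, s = len(a), sum(a)
    a_max, a_min = max(a), min(a)
    if a_max == a_min:
        raise ValueError("MMS ratios are 0/0 for a constant series")
    return n, s, a_max, a_min


def mms_max(a):
    # Assumed form: (a_max - a_min) / (S_n - n*a_min), i.e. the ratio for a_i - a_min
    n, s, a_max, a_min = _stats(a)
    return (a_max - a_min) / (s - n * a_min)


def mms_min(a):
    # Assumed form: (a_max - a_min) / (n*a_max - S_n), i.e. the ratio for a_max - a_i
    n, s, a_max, a_min = _stats(a)
    return (a_max - a_min) / (n * a_max - s)


ap = [2, 5, 8, 11, 14]              # points on a straight line, n = 5
print(mms_max(ap), mms_min(ap))     # 0.4 0.4, i.e. 2/n
print(round(mms_max([0, 100, 40]), 3), mms_min([0, 100, 40]))  # 0.714 0.625, table row (c)
```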

The complete expression for outlier detection is given by Equation (3). If a series is expected to follow the form y = c but contains data that do not agree with that form, then:

where w is the weight.

The MMS method expressed in Equation (3) can be applied to a window with any number of data points. However, a window with only three data points is a special case: the method reports an extremum when the points do not agree with a linear fit, so if the window contains an extremum, it is always the middle point. For three data points (n = 3) and w = 0, the following special treatment of Equation (3) is suggested:

Equation (4) is a simplified version of Equation (3) for handling three data points, where

and (h) of

To address the aforementioned drawback, the MMS method was modified to take the middle point of the window into account. For a data window to have an exact middle point, the number of data points (n) must be odd. When n = 3 and a_{mid} is the middle point of the window, replacing a_{max} in Equation (1) with a_{mid} gives:

$$\mathrm{MMS}_{max/mid} = \frac{a_{mid} - a_{min}}{S_n - n\,a_{min}}$$

Similarly, replacing a_{min} in Equation (2) with a_{mid} gives:

$$\mathrm{MMS}_{min/mid} = \frac{a_{max} - a_{mid}}{n\,a_{max} - S_n}$$

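Consider the situation

$$\mathrm{MMS}_{max} = \mathrm{MMS}_{max/mid} \qquad (7)$$

Since both ratios share the denominator S_{n} − n·a_{min}, this equality holds exactly when a_{mid} = a_{max}.
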
Therefore, Equation (7) describes a maximum at the middle point. Thus, Equation (7) is a condition, independent of the value of MMS itself, that can be used to identify a peak.

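Consider the situation

$$\mathrm{MMS}_{min} = \mathrm{MMS}_{min/mid} \qquad (9)$$

Since both ratios share the denominator n·a_{max} − S_{n}, this equality holds exactly when a_{mid} = a_{min}.
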
Equation (9) therefore describes a minimum at the middle point. Thus, Equation (9) is a condition, independent of the value of MMS itself, that can be used to identify a valley.

Therefore, when a window satisfies Equation (7), its middle point is a maximum, and when a window satisfies Equation (9), its middle point is a minimum. Advancing the three-point window by one data point at a time makes it possible to locate all the extrema in a signal (

Plot | Data set | MMS_{max} (> 0.67) | MMS_{max/mid} | MMS_{min} (> 0.67) | MMS_{min/mid} | Peak or valley | MMS_{max} = MMS_{max/mid} | MMS_{min} = MMS_{min/mid} |
---|---|---|---|---|---|---|---|---|
(a) | 0, 100, 0 | 1 (Y) | 1 | 0.5 (N) | 0 | Peak | Y | N |
(b) | 0, 1.001, 0 | 1 (Y) | 1 | 0.5 (N) | 0 | Peak | Y | N |
(c) | 0, 100, 40 | 0.714 (Y) | 0.714 | 0.625 (N) | 0 | Peak | Y | N |
(d) | 0, 100, 90 | 0.526 (N) | 0.526 | 0.909 (Y) | 0 | Peak | Y | N |
(e) | 100, −20, 100 | 0.5 (N) | 0 | 1 (Y) | 1 | Valley | N | Y |
(f) | −2, −2.2, −2 | 0.5 (N) | 0 | 1 (Y) | 1 | Valley | N | Y |
(g) | 100, 0, 70 | 0.588 (N) | 0 | 0.769 (Y) | 0.769 | Valley | N | Y |
(h) | 100, 0, 25 | 0.8 (Y) | 0 | 0.571 (N) | 0.571 | Valley | N | Y |
(i) | 0, 100, 50 | 0.667 (N) | 0.667 | 0.667 (N) | 0 | Peak | Y | N |
(k) | 0, −100, −50 | 0.667 (N) | 0 | 0.667 (N) | 0.667 | Valley | N | Y |

Plots (i) and (k) show very special situations in which MMS_{max}, MMS_{min}, and 2/n are equal. In such situations, Equation (4) is undefined. Even then, however, extrema can be identified with Equation (7) and Equation (9). Since the proposed extrema detection method is based on the maximum, minimum, and sum of the series, it was named the “MMS max-min finder”.
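A minimal sketch of the three-point MMS max-min finder follows. It assumes MMS_{max} = (a_{max} − a_{min})/(S_{n} − n·a_{min}) and MMS_{min} = (a_{max} − a_{min})/(n·a_{max} − S_{n}), with the mid-point variants replacing a_{max} (respectively a_{min}) in the numerator by a_{mid}; these forms reproduce the table values. The names `mms_ratios` and `mms_max_min_finder` are hypothetical. A constant window is skipped because both ratios become 0/0. The usage lines replay rows (i) and (k) of the table, where Equation (4) is undefined but Equations (7) and (9) still decide.

```python
def mms_ratios(window):
    """Return (MMS_max, MMS_max/mid, MMS_min, MMS_min/mid) for an odd-length
    window, using the assumed MMS forms described above."""
    n, s = len(window), sum(window)
    a_max, a_min, a_mid = max(window), min(window), window[n // 2]
    shifted_sum = s - n * a_min     # sum of (a_i - a_min), > 0 unless constant
    reflected_sum = n * a_max - s   # sum of (a_max - a_i), > 0 unless constant
    return ((a_max - a_min) / shifted_sum, (a_mid - a_min) / shifted_sum,
            (a_max - a_min) / reflected_sum, (a_max - a_mid) / reflected_sum)


def mms_max_min_finder(x):
    """Slide a three-point window one step at a time and label its middle
    point a peak if MMS_max == MMS_max/mid (Eq. 7), or a valley if
    MMS_min == MMS_min/mid (Eq. 9)."""
    peaks, valleys = [], []
    for i in range(1, len(x) - 1):
        window = x[i - 1:i + 2]
        if max(window) == min(window):
            continue  # constant window: ratios are 0/0, no extremum
        m_max, m_max_mid, m_min, m_min_mid = mms_ratios(window)
        if m_max == m_max_mid:        # equivalent to a_mid == a_max
            peaks.append(i)
        elif m_min == m_min_mid:      # equivalent to a_mid == a_min
            valleys.append(i)
    return peaks, valleys


print(mms_max_min_finder([0, 100, 50]))        # ([1], [])   table row (i)
print(mms_max_min_finder([0, -100, -50]))      # ([], [1])   table row (k)
print(mms_max_min_finder([1, 3, 2, 5, 0, 4]))  # ([1, 3], [2, 4])
```

Because each pair of ratios shares a denominator, the comparisons are mathematically the same as testing a_{mid} == a_{max} or a_{mid} == a_{min}; the sketch keeps the ratio form only to mirror Equations (7) and (9).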

As mentioned above, Equation (7) and Equation (9) are independent of the number of data points and are therefore also valid when n is greater than three (n > 3). However, to have an exact middle point, n must be odd. When a window holds more than three data points, it can contain several peaks and several valleys. Equation (7) or Equation (9) is then satisfied only when the highest peak (dominating peak) or the lowest valley (dominating valley) of the window coincides with the middle point of the advancing window.

The plot in […] shows point A as the middle point of window W_{n}, while peak B is the dominating peak. Because of that, point A is not recognized as a peak in window W_{n}. After W_{n} is advanced by two data points, window W_{n+2} appears. In W_{n+2}, point B is both the highest point and the middle point, so B is recognized as a peak. Advancing W_{n+2} by two data points gives W_{n+4}, where C becomes the middle point; because of the influence of point B, C is not recognized as a peak. This illustrates that a dominating extremum in a window remains undetected until the middle point of the window coincides with it, and that it meanwhile prevents smaller peaks and valleys from being identified.

Using windows with a larger odd number of data points (e.g., 5, 7, …) makes it possible to filter out minor peaks and valleys. In contrast, thresholds on height or width are domain dependent and relative. The window size (W) is an absolute parameter and can be applied under any conditions, especially when the domain conditions are unknown. However, this technique cannot filter extrema that are small in absolute terms, because the comparison is based only on the extrema present in the considered window; it is instead useful for removing variations that are small relative to their neighbourhood. Since the technique is based on the size of the window, it was named the “MMS-Window based filter” (MMS-WBF).
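The window-based filter can be sketched by generalizing the three-point finder to any odd window size W. The sketch assumes MMS_{max} = (a_{max} − a_{min})/(S_{n} − n·a_{min}) and MMS_{min} = (a_{max} − a_{min})/(n·a_{max} − S_{n}); the function name `mms_wbf` is hypothetical and the signal is toy data. With W = 3 every local peak and valley is reported; with W = 5 or W = 7 only the dominating peak survives, mirroring the behaviour of points A, B, and C described above.

```python
def mms_wbf(x, w):
    """MMS-Window based filter (sketch): slide an odd window of size w over x
    one point at a time and keep the middle point only when it is the window
    maximum (Eq. 7) or the window minimum (Eq. 9)."""
    if w < 3 or w % 2 == 0:
        raise ValueError("window size W must be an odd number >= 3")
    half = w // 2
    peaks, valleys = [], []
    for start in range(len(x) - w + 1):
        win = x[start:start + w]
        s = sum(win)
        a_max, a_min, a_mid = max(win), min(win), win[half]
        if a_max == a_min:
            continue                     # constant window: ratios are 0/0
        shifted_sum = s - w * a_min      # assumed denominator of MMS_max
        reflected_sum = w * a_max - s    # assumed denominator of MMS_min
        if (a_max - a_min) / shifted_sum == (a_mid - a_min) / shifted_sum:
            peaks.append(start + half)
        elif (a_max - a_min) / reflected_sum == (a_max - a_mid) / reflected_sum:
            valleys.append(start + half)
    return peaks, valleys


signal = [0, 3, 1, 9, 1, 3, 0]   # one dominating peak (index 3), two minor ones
for w in (3, 5, 7):
    print(w, mms_wbf(signal, w))
# 3 ([1, 3, 5], [2, 4])
# 5 ([3], [])
# 7 ([3], [])
```

Note that the first and last W//2 points can never be the middle of a full window, so a larger W also leaves more points at the signal edges unchecked.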

Extrema whose start and end points agree with y = c and whose middle point is the extremum can be considered symmetric extrema. Plots (a) and (b) of

Consider a perfect maximum, as shown in plot (c) of […], where all points are equal to a_{min} (a_{min} = c) except a_{max}. For any perfect maximum with n points, n − 1 points are equal to a_{min}, and a_{max} ≠ c. The sum of the terms of such a series can be expressed as:

$$S_n = (n-1)\,a_{min} + a_{max} \qquad (11)$$

Consider a perfect minimum, as shown in plot (d) of […], where all points are equal to a_{max} (a_{max} = c) except a_{min}. For any perfect minimum with n points, n − 1 points are equal to a_{max}, and a_{min} ≠ c. The sum of the terms of such a series can be expressed as:

$$S_n = (n-1)\,a_{max} + a_{min} \qquad (12)$$

If MMS_{max}/MMS_{min} = R_{Mm}, then, from Equations (1), (2), and (11), when the maximum is detected as the peak:

$$R_{Mm} = n - 1 \qquad (15)$$

In the same manner, if MMS_{min}/MMS_{max} = R_{mM}, then, from Equations (1), (2), and (12), when the minimum is detected as the valley:

$$R_{mM} = n - 1 \qquad (16)$$

The relations in Equation (15) and Equation (16) are key findings that can be used to identify perfect extrema. When an extremum is not perfect, the ratios R_{Mm} and R_{mM} are less than n − 1. Therefore, Equation (15) and Equation (16) can be used to distinguish perfect from non-perfect extrema. Perfect extrema are sudden (sharp) extrema, whereas non-perfect extrema can be considered gradual extrema. Thus, Equation (15) and Equation (16) make it possible to separate sharp from gradual extrema.
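Assuming that Equations (1) and (2) take the forms MMS_{max} = (a_{max} − a_{min})/(S_{n} − n·a_{min}) and MMS_{min} = (a_{max} − a_{min})/(n·a_{max} − S_{n}), Equations (15) and (16) follow from substituting the sums of Equations (11) and (12): the common numerator a_{max} − a_{min} cancels in each ratio.

$$
\text{Perfect peak: } S_n - n\,a_{min} = a_{max}-a_{min},\quad n\,a_{max} - S_n = (n-1)(a_{max}-a_{min})
\;\Rightarrow\; R_{Mm} = \frac{n\,a_{max}-S_n}{S_n-n\,a_{min}} = n-1
$$

$$
\text{Perfect valley: } S_n - n\,a_{min} = (n-1)(a_{max}-a_{min}),\quad n\,a_{max} - S_n = a_{max}-a_{min}
\;\Rightarrow\; R_{mM} = \frac{S_n-n\,a_{min}}{n\,a_{max}-S_n} = n-1
$$

For a window of n = 7 points, both ratios therefore equal 6 for a perfect extremum.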

After a peak has been identified, examining the ratio MMS_{max}/MMS_{min} makes it possible to determine the degree of confidence of the other points; the same applies to a valley. Assume that t_{Mm_mM} is the threshold value for distinguishing sharp from gradual extrema; t_{Mm_mM} can be expressed as a_{Mm_mM}, which is a function of n. By setting the same threshold value (t_{Mm_mM}) for MMS_{max}/MMS_{min} and MMS_{min}/MMS_{max}, sudden and gradual extrema can be determined. The determination criterion (t_{Mm_mM}) for the ratios MMS_{max}/MMS_{min} and MMS_{min}/MMS_{max} is non-parametric and depends only on the number of data points in the considered window. Since the method is also based on the maximum, the minimum, and the sum, it was named the MMS-SG filter.
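The MMS-SG decision can be sketched as follows. Assuming MMS_{max} = (a_{max} − a_{min})/(S_{n} − n·a_{min}) and MMS_{min} = (a_{max} − a_{min})/(n·a_{max} − S_{n}), the ratio R_{Mm} reduces to (n·a_{max} − S_{n})/(S_{n} − n·a_{min}) and R_{mM} is its reciprocal, so the sketch computes these directly instead of dividing two already-rounded ratios. The threshold `t` is a hypothetical caller-supplied parameter: the sketch does not assume any particular expression for t_{Mm_mM} as a function of n, only that sharper extrema give larger ratios, with n − 1 as the largest value (reached by a perfect extremum). Function names are hypothetical.

```python
def sg_ratio(window, is_peak):
    """R_Mm = MMS_max / MMS_min for a peak, R_mM = MMS_min / MMS_max for a valley.
    With the assumed MMS forms the common numerator (a_max - a_min) cancels."""
    n, s = len(window), sum(window)
    a_max, a_min = max(window), min(window)
    if a_max == a_min:
        raise ValueError("ratio is undefined for a constant window")
    above_min = s - n * a_min    # sum of (a_i - a_min)
    below_max = n * a_max - s    # sum of (a_max - a_i)
    return below_max / above_min if is_peak else above_min / below_max


def mms_sg(window, is_peak, t):
    """Label an already-detected extremum 'sharp' if its ratio reaches the
    hypothetical threshold t, otherwise 'gradual'."""
    return "sharp" if sg_ratio(window, is_peak) >= t else "gradual"


print(sg_ratio([0, 0, 100, 0, 0], is_peak=True))     # 4.0 (= n - 1, perfect peak)
print(sg_ratio([0, 50, 100, 50, 0], is_peak=True))   # 1.5 (gradual peak)
print(mms_sg([5, 5, -3, 5, 5], is_peak=False, t=4))  # sharp
```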

[…] MMS_{max}/MMS_{min} = 6, which is exactly n − 1 (here n = 7). This confirms Equation (15). Likewise, in plots (a) and (b) of […], MMS_{min}/MMS_{max} = 6, which confirms Equation (16). All these plots exhibit either a sudden peak or a sudden valley. The corresponding ratios in relation with the plot (c) of

MMS-WBF and MMS-SG, introduced in this work, can identify dominating, sharp, and gradual extrema. However, these techniques cannot distinguish extrema with very small amplitude, as shown in

The valley shown in […] has a_{min} ≈ a_{max}.

Then, Equation (13) can be expressed as:

(17)

If […] a_{min} is very close to the other points (low crater); consequently, […] a_{min} is apart from the other points (high crater).

In Equation (17), when a_{min} is zero, the ratio R_{LH_min} also becomes zero regardless of the magnitude of the valley. Moreover, because of negative values, S_{n} can be zero, which makes R_{LH_min} undefined. Both situations prevent the real condition of the valley from being determined. To overcome the effect of negative values, the minimum value was subtracted from every data point in the window, as expressed in Equation (18).

$$a'_i = a_i - a_{min} \qquad (18)$$

Even after this shift, the minimum of the window is zero (a'_{min} = 0). To overcome this situation, a constant k greater than zero was added to each value. This transformation is applied in the “Min-Max normalization” process [

$$a''_i = a'_i + k = a_i - a_{min} + k, \quad k > 0 \qquad (19)$$

From Equation (17) and Equation (19),

R_{LH_min}, as expressed in Equation (20), can then be considered a robust criterion for filtering valleys with low craters.
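The shift described for Equations (18) and (19) can be sketched as below. The function name and the default k = 1 are hypothetical choices. The sketch covers only the transformation that guarantees a strictly positive sum; it does not compute R_{LH_min} itself, whose exact form is given by Equations (17) and (20).

```python
def shift_window(window, k=1.0):
    """Subtract the window minimum (Eq. 18) and add a constant k > 0 (Eq. 19).
    Every transformed value is then >= k, so the transformed sum is > 0."""
    if k <= 0:
        raise ValueError("k must be greater than zero")
    a_min = min(window)
    return [a - a_min + k for a in window]


raw = [-3.0, 0.0, 3.0]          # raw sum is 0.0, so a ratio over S_n would be undefined
shifted = shift_window(raw)     # hypothetical default k = 1.0
print(sum(raw), shifted, sum(shifted))   # 0.0 [1.0, 4.0, 7.0] 12.0
```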

The peak shown in […] has a_{max} ≈ a_{min}. Then Equation (11) can be expressed as:

According to Equation (17), the ratio R_{LH_min} has a well-defined upper limit (ceiling) and lower limit (floor) because […], whereas R_{LH_max} has no upper limit and is subject only to a lower limit. Therefore, it is difficult to use R_{LH_max} as a global criterion in the way R_{LH_min} is used. The peak shown in

According to Equation (22),

If R_{LH_max} is the corresponding ratio for identifying high and low peaks, then from Equation (21) and Equation (22) one obtains:

Even after this transformation, negative values can still have an influence. This can be resolved by using Equation (19). Then, from Equation (19),

[…] a_{max} is very close to the other points (low prominence); consequently, […] a_{max} is apart from the other points (high prominence).

Finally, using Equation (17) and Equation (24), high and low extrema can be determined by defining a threshold value t_{LH} (a value between 0 and 1) for R_{LH_min} and R_{LH_max}. Because the method is based on the maximum, minimum, and sum, it was named MMS-LH.

The filters for sudden, gradual, low, and high extrema are derived from data sets that satisfy the y = c relation (perfect extrema). In reality, however, extrema are not always perfect. Therefore, by setting the threshold values appropriately, extrema can also be filtered under non-perfect conditions.

Extrema are identified by comparing two ratios built from the maximum, minimum, middle point, and sum. The threshold criteria for MMS-WBF and MMS-SG are based only on the number of data points (n), and the threshold criterion for MMS-LH is a value between 0 and 1. Thus, all the determination criteria are fully non-parametric. Moreover, combining these methods yields more robust and reliable output.

All the algorithms were implemented in C++ on the Net 2008 platform and tested with biogas data collected online from a biogas plant using NIR spectroscopy over a period of seven months at a frequency of twelve data points per day (i.e., every second hour). Among the measured parameters, the H_{2} content (in ppm) was selected because it varies considerably during the process. The data of each month were treated as a segment, each consisting of 350–400 data points. The proposed detection methods were applied to each segment with different criteria. Furthermore, a second data set of around 4800 data points, volatile fatty acid (VFA) concentrations, was selected to check the segmentation capabilities of the method.

Each plot (a) and (b) of

Plots (e) and (f) of

The same two data sets shown in

the candidate points have not been checked. This is a disadvantage of increasing the window size for filtering non-dominating extrema. In plot (d) of

The combination of the MMS max-min finder and MMS-WBF can be used for online data checking. First, the window size (W) is defined; the window then accumulates data, the desired detection technique is applied, and the extrema are located. The window is then advanced by one data point and waits for the next data point. Once the next point is captured, the extrema check is performed again. Repeating this cycle throughout the process locates extrema in an online environment.
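The online procedure described above can be sketched with a fixed-length buffer (`collections.deque` with `maxlen`). The class name `OnlineMMSDetector` is hypothetical, the stream is toy data, and the MMS forms are assumed to be MMS_{max} = (a_{max} − a_{min})/(S_{n} − n·a_{min}) and MMS_{min} = (a_{max} − a_{min})/(n·a_{max} − S_{n}). A point can only be reported W//2 samples after it arrives, because it must first become the middle of a full window; this latency is the price of using a larger W.

```python
from collections import deque


class OnlineMMSDetector:
    """Online sketch of the MMS max-min finder / MMS-WBF: keep the last W
    points, test the middle point once the window is full, slide by one."""

    def __init__(self, w=3):
        if w < 3 or w % 2 == 0:
            raise ValueError("window size W must be an odd number >= 3")
        self.w = w
        self.buf = deque(maxlen=w)  # the oldest point drops out automatically
        self.count = 0              # number of points received so far

    def push(self, value):
        """Add one point; return (index, "peak" | "valley") or None."""
        self.buf.append(value)
        self.count += 1
        if len(self.buf) < self.w:
            return None             # still accumulating the first window
        win = list(self.buf)
        n, s = self.w, sum(win)
        a_max, a_min, a_mid = max(win), min(win), win[self.w // 2]
        if a_max == a_min:
            return None             # constant window: MMS ratios are 0/0
        mid_index = self.count - 1 - self.w // 2
        shifted_sum = s - n * a_min     # assumed denominator of MMS_max
        reflected_sum = n * a_max - s   # assumed denominator of MMS_min
        if (a_max - a_min) / shifted_sum == (a_mid - a_min) / shifted_sum:
            return (mid_index, "peak")      # Eq. (7)
        if (a_max - a_min) / reflected_sum == (a_max - a_mid) / reflected_sum:
            return (mid_index, "valley")    # Eq. (9)
        return None


detector = OnlineMMSDetector(w=5)
for value in [0, 3, 1, 9, 1, 3, 0]:     # toy stream
    event = detector.push(value)
    if event is not None:
        print(event)                    # prints (3, 'peak')
```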

[…] R_{Mm} and R_{mM}, as defined in Equation (15) and Equation (16), respectively. The value of t_{Mm_mM} for R_{Mm} and R_{mM} was set to 1 ([…]_{1}, P_{1}, P_{2}, and P_{3} shown in plots (a) and (b) of

Plots (c) and (d) in […]_{1}, P_{1}, P_{2}, and P_{3} after increasing the window size to nine (W = 9). With the large window (W = 9), almost all the flat extrema were rejected. Even with the larger W, extrema such as P_{4} remain, because W is not large enough to reject such points (i.e., within the selected window size, the extremum lies significantly away from the other points). In general, plots (c) and (d) of […] MMS_{max}/MMS_{min} and MMS_{min}/MMS_{max} can be considered filtering criteria and a reliable technique for filtering sharp and gradual (flat) extrema.

According to the results shown in […] MMS_{max}/MMS_{min} and MMS_{min}/MMS_{max} cannot filter extrema based on the magnitude of their prominence or crater. The results shown in

Before applying MMS-LH, data points (plots (a) and (b) of […] V_{1} in […] V_{1} is not a perfect extremum. Therefore, the rejection is logical as well as mathematically correct. Nevertheless, in […] V_{1} is identified as a valley, because the large window size (W = 9) makes V_{1} a nearly perfect extremum. Therefore, using W > 3 with appropriate filter criteria, the method can be used to filter extrema with low and high prominence or crater.

In

Dominating peaks and valleys that are not outliers can usually be considered turning points of a certain property of a signal. Thus, dominating peaks and valleys are good points for segmenting a signal and for identifying general trends.

extrema than plot (a) of

The introduced extrema finding method, the “MMS Max-Min finder”, and the three extrema filtering methods, the MMS-Window Based Filter (MMS-WBF), the MMS sharp and gradual extrema filter (MMS-SG), and the MMS low and high extrema filter (MMS-LH), are non-parametric. Therefore, filtering can be done without considering domain-dependent parameters such as the height and width of an extremum. The results show that the detection method identified all the extrema with 0% error. With a window size of nine (W = 9), MMS-WBF reported 0.12% and 0.09% wrong detections. However, combining MMS-WBF and the MMS-LH filter with a window size of nine (W = 9) eliminated this error. Despite the dynamic nature of the data, the results were consistent and robust for the same detection criteria. Thus, with a proper window size, robust and consistent outcomes can be achieved with dynamic data such as biogas data. Furthermore, MMS-WBF shows promising results for signal segmentation and trend identification; hence, it could be developed further into a segmentation and trend identification technique.

This work was supported by the German Research Foundation (DFG) and the Technical University of Munich (TUM) in the framework of the Open Access Publishing Program. Also, we are grateful to the German Academic Exchange Service (Deutscher Akademischer Austauschdienst, DAAD) for providing a scholarship to KKLB Adikaram during the research period.

Adikaram, K.K.L.B., Hussein, M.A., Effenberger, M. and Becker, T. (2016) Non-Parametric Local Maxima and Minima Finder with Filtering Techniques for Bioprocess. Journal of Signal and Information Processing, 7, 192-213. http://dx.doi.org/10.4236/jsip.2016.74018