Step-by-Step Building of a Four Dimensional Fatigue Compatible Regression Model including Frequencies

The purpose of this research is to develop a model, with emphasis on compatibility conditions and model building, valid for high cycle fatigue design components such as wind turbines, automobiles, high speed railways and aeronautical material. In this work, we have added the frequency as one more variable to an existing fatigue model that already includes maximum stress, stress ratio and lifetime. As a result, a model and estimation method has been proposed and a random variable V has been identified, which, allows the ac-cumulated damage and the probability of failure to be assessed for any load history in terms of stress levels, stress ranges and frequencies. Finally, the model is validated using a large set of real experimental data.


Introduction
The current structural integrity concept in the fatigue design in aeronautics, elec-How to cite this paper: Castillo, E., Fernández-Canteli, A., Blasón, S., Khatibi, G., Czerny, B. and Zareghomsheh, M. (2021) Step-by-Step Building of a Four Dimensional Fatigue Compatible Regression Model including Frequencies. Open Journal of Statistics, 11, 1072-1096. to avoid incompatible formulas causing inconsistencies. Thus, in this paper functional equations are indirectly and strongly recommended for modeling, not only in fatigue but also in problems of general nature.

First Step: The S-N Bidimensional Models
Fatigue models started with the up-and-down method, that was designed for determining the endurance limit as its only variable. However, most fatigue models are bidimensional and include only one of the first two stress variables already mentioned and the lifetime, which are the models dealt with in this section. However, one must keep in mind that these models are valid only when the remaining two variables, the second stress and the frequency variables, are considered constant.
A significative advance in the development of fatigue models took place with the appearance of bidimensional S-N regression models, initially linear, later bilinear and finally of hyperbolic type. The selection of the mathematical structure of these models was initially based on convenience and simplicity of the models. This is the case of the linear models, which, when they were perceived as not adequate, a log transformation of one or the two axes was used to improve them. When this was not sufficient, the bilinear models were used as a simple and practical solution.
These regression curves, in the sense of being conditional means, were used extensively, but unfortunately its random character and the corresponding percentile curves were not fully introduced as a necessity to deal with the actual random nature of the problem.
Fortunately, curved trend models of random type were soon accepted and other models appeared. This was the case of the model in [5] where the mathematical structure of the regression curve and percentiles came from the solution of a functional equation expressing a compatibility condition that must be satisfied.
In this context, a special mention deserves the use of functional equations in the process of building models. These equations, as differential or integral equations, express properties that must be satisfied by the models being looked for.
However, the unknown functions neither appear as derivatives, nor inside integrals, what makes the interpretation and handling of the problem more convenient, but without losses in its power.
As a reminder, in Figure 1 the compatibility condition used to derive some bidimensional model is illustrated graphically. In it, a N σ ∆ − plot is presented, in which the virtual N σ ∆ − curves associated with the different specimens used in fatigue tests are shown as continuous lines, in blue color. Since these fatigue tests are normally destructive, only one use of each specimen is possible, and then, only one point on that curve can be observed, but the existence of this curve needs to be emphasized and its conceptual importance to be pointed out.
One first comment on it consists in observing that these blue curves do not intersect, as it occurs in the plot. This means that one specimen is more resistant than other no matter the value of σ ∆ , what is a simplifying, but a reasonable assumption.
The first compatibility functional equation, used in [5], is based on the following observation: Any curve in the left plot of Figure 1 entering the square on the left-down rectangular area through out the horizontal line i σ σ ∆ = ∆ exits this area through out the vertical line j N N = , so that the density functions of the variables lifetime, N, given σ ∆ , and stress range, σ ∆ , given N, defined all over the S-N field, permit plotting the density functions, as those shadowed in the figure, representing the fractions of curves, passing through the horizontal and vertical segments, which must be coincident. Note that all this implies that are the cdf of σ ∆ given N and the cdf of N given σ ∆ , respectively. The variability field of both variables in a correct model must fulfil this strong condition. Otherwise, the fatigue model would become not valid. Next, we assume that the random variables implied in the fatigue behavior definition belong to the same family of distributions, such as one of the extreme value distributions (Weibull, Gumbel, Fréchet) or any other location-scale family. This is a weak and natural assumption, because it would have no sense if for given units the distribution belongs to the family but it fails to belong when changing units of measure. Similarly, when asymptotes are involved it is natural to admit location changes so that changes in the asymptotes are included in the selected family of distributions.
If the cdfs involved in (1) are assumed strictly increasing, the functional equa- with six unknown functions: A very general solution of this equation was found in [5], but a rigorous solution of it was given in [6] and [7], in which it was shown that there is only one family of physically percentile compatible solutions, with , , B C D and E constant parameters, of the form: Next, assuming validity of the weakest link principle for the lifetimes, the fatigue lifetime distribution represents a minimum problem asymptotically defined by the generalized extreme value family, see [8] [9] [10] [11] or [12]. Because the existence of a lower bound in the lifetime (it cannot be less than zero) the Weibull distribution becomes the only candidate, see [13] [14] and [15]. The Weibull distribution together with the percentile curves of the form in (3) complete the bidimensional model.
Consequently, these models were selected as the only appropriate ones for fatigue analysis when M σ is kept constant and when R is kept constant during the test, see [12] and [16].
Since most bidimensional fatigue models used in practice are written in terms of the lifetimes supplied for the stress range, for a fix value of the stress level, selected from the set of secondary variables { } , , , M m mean R σ σ σ , to reduce the effect of this limitation, a hypothetical equivalence among the different S-N fields for different stress levels is assumed, even though this is only a strong simplification. Some examples of this approach are the models of [17]- [25]. Some comparative studies among these models were reported in [26]- [32]. However, only very few works address the probabilistic analysis of models focused on the stress level effect, [33] and [34], or multiple explanatory variables, [35].
Unfortunately, these empirical type proposals are supported only by their simplicity and the associated fitting quality of the associated experimental results, but with no theoretical justification. A revision of some of these methods used in fatigue analysis can be seen in [12], together with some new and original proposals for fatigue models dealing with crack growth and damage accumulation.
The final conclusion is that bidimensional models are not satisfactory for fatigue analysis, but the compatibility condition must hold for all bidimensional models coming from higher dimension models when the remaining variables are kept constant.
Since the same considerations used for the N σ ∆ − models for constant M σ are also valid for constant R, this implies that we have two possible com- Even though these two similar models can be used, if not carefully chosen they will be incompatible. In the following section the compatibility problem of both bidimensional models is stated and solved. This permits identifying the only compatible three-dimensional models of the form

Second Step: A Three-Dimensional Compatibility Model
In [36] and [37] the two bi-dimensional models in (4) and (5) were made consistent using a functional equation leading to the three-dimensional model

Third Step: Asymptotes Compatibility
In addition, some of the functions appearing in (6), which define model asymptotes, as indicated in [4], are known. In fact, we must have: The functional Equation (6) and the asymptotes in (7) lead to the following Weibull model: which depends on seven parameters: From (8), the following V Weibull random variable, which will play a key role in the cumulative damage evaluation, arises: The physical meaning of these parameters is as follows: a is the scale parameter of the V Weibull distribution; β is the increment of the Weibull location parameter of the V variable for with respect to 0 R = and K is the Weibull shape parameter of the V variable.
In addition, the associated regression models are: which allows us to plot σ ∆ versus The asymptotes of the curves in (10), (12) and (13) are given in Table 1. The compatibility of the inequality relations in the plots in Figure 2, which must be taken into account for the model to be valid, implies the two constraints 1 0 δ ≥ Next, a physical interpretation of the V variable defined in Expression (9) (9), a damage comparison can be made of two specimens subject to different load histories, when the frequency is constant. This is relevant, because later, based on this variable, the damage accumulation associated with any varying load history, will be calculated. Table 1. Horizontal and vertical asymptotes of the curves in (10), (12) and (13).

Equation
Horizontal asymptote Vertical asymptote

Fourth Step: Reproducing the Frequency Effect
In this section the basic ideas and formulas used to reproduce the effect of frequency on fatigue tests are discussed. It is assumed that the influence of frequency on fatigue is manifested in the variation of the elastic modulus tending to an effective elastic modulus, ef E . The observation is consistent with experimental experience that the speed in a static test influences the apparent modulus of elasticity.
The higher the test speed, the more rigid the material behaves. Therefore, the higher the frequency, the higher the effective elastic modulus, ef E . Asymptotically, when the frequency is practically zero, the behavior of the material, i.e. the effective modulus of elasticity, is the nominal one, so When the frequency is very high, the effective modulus of elasticity tends to an asymptotic value, E ∞ .
The effect of frequency on the S-N fields has been studied in [39], where a model based on a three-parameter Weibull cumulative distribution was proposed to analyze the influence of frequency and stress ratio on the fatigue life in concrete, both plain and reinforced with fibers, showing the reduction of effective stresses due to increased frequencies.
All this is in accordance with a viscoelastic simile of the material, in the sense that for high values of frequency, the behavior of the material tends to reach an asymptotic value according to a sigmoidal function, such as a biparametric Weibull cdf. If we observe a cycle in fatigue under the condition of test in load control, this assumes that the induced deformation in a load cycle decreases as the frequency increases, as it can be seen in Figure 3.
Since deformation is the determining mechanism in the fatigue process, the range of deformations deduced from the range of stresses, assuming that the static modulus of elasticity is the key factor. This implies that in reality, a lower deformation is being applied according to the relationship 0 ef E E and, accepting the linear relationship between strain and stress given by 0 E , for the representation in the S N − field, a shift to the right of the S N − curves will be observed as the frequency increases.
where 0 f is a reference frequency and τ is a dimensionless scale parameter.
Note that the Weibull distribution has been justified for the logarithm of .
From now on, we call which, from (15) gives where 0 1.
Accordingly, the value of M σ and the stress amplitude are affected by the factor ef a , while the mean stress is not affected. This is due to the way of carrying out the test, in which the load is placed in the medium stress and then it is expanded until it reaches the desired amplitude. With which, it turns out that:  (10), (12) and (13): This means that the influence of the frequency on fatigue is equivalent to a change in the M If the model turns out to be correct, the frequency characterization of a material, that is, the determination of the curve would be easily determinable. It would only be a matter of applying a relatively high stress at different frequencies and measuring the deformation due to that stress, which would allow us to observe the evolution of the modulus of elasticity as a function of frequency.

Proposed Model
The model proposed in this paper starts from the model developed in [4], in which a three-dimensional compatible M R N σ − − model for constant frequency was derived using functional equations. In this section this model is extended to four dimensions by including the frequency variable, which plays a very important role in several applications, e. g. high speed train bridge safety.
It is clear that this model and consequently, the following equations must be applied to the effective values of M σ and R, because, even though the material is apparently subject to given nominal values of M σ and R, the frequencies modify these actual values to the effective ones. Thus, the following equations refer to effective values.
In the previous model it was demonstrated that the variable in (9) plays a crucial role, because all the remaining variables can be considered as deterministically dependent on this one. In other words, the model can be written in a form such that its deterministic and its random parts are identified, that is, they be-come separable. In addition, since the variable V is unidimensional, only the random parameters associated with it rule the random behavior. Obviously, the remaining model parameters are deterministic.
Replacing , respectively, in (10), (12) and (13) 3)  It is important to realize that the common term ( ) , , , h a p K γ in (11) appearing in expressions (30) (31) and (32) refers to the Weibull distribution of the V in Expression (9), where γ , a and K are its location, scale and shape parameters, respectively. Alternatively, other family of distributions can be selected to reproduce the random behavior of V.
Note finally, that the role of the V damage variable remains valid, when frequencies are considered if the effective stresses are used instead of the nominal ones.

Parameter Estimation
To estimate the ten model parameters, least squares, which minimizes the sum of error terms, that is, the differences between the observed values, where n is the sample size.
In the second step, and keeping , that is, the mean of V, constant, the random parameters, { } , , a K γ are obtained by restricted maximum likelihood. This restriction guaranties that the previously estimated regression curves are not modified and also avoids overfitting. This means that the following optimization problem is solved: This optimization problem has been implemented in matlab, using the func- tion "fmincon". One example of application is given in Section 9.

Damage Accumulation
The damage interpretation of the V variable in (9) offers us the possibility of deriving from it a damage accumulation conversion law, which will allow us to calculate the probability of failure derived from any varying load history.
It has already been indicated that the V expression in (9) gives the cumulative damage associated with N cycles when the load ef M σ and ef R is applied assuming that the initial damage is zero. In addition, its cdf gives the probability of failure, associated with N cycles when that load is applied assuming a threshold value, 0 V , associated with a threshold number of load cycles, for which the probability of failure is zero. Thus, the damage level associated with N load cycles and zero initial damage, is: However, we look for a more general formula allowing us to consider how the damage level evolves after a given damage, 0 V , is reached. This is one of the aims of this section. We start by saying that the Expression in (40) cannot be given arbitrarily. Otherwise, using different threshold damages will lead to different final damage levels even though the loads and the number of cycles coincide (see, for example, [40]).
To avoid this problem, the formula to obtain V in (40) must be subjected to a compatibility condition, which states that the same damage level must be obtained if (a) we start from a 0 V threshold damage and apply 0 N N + cycles of the load or (b) we start from the damage level associated with 0 N cycles and afterwards we apply N cycles of the same load. This, compatibility condition can be written as:  [44] or [45]). From (41) and using (39) twice one obtains:     It is obvious that these two formulas become crucial when one needs to evaluate damage states caused by different load histories, as it happens in real practice.

Example of Application
In this example we have a set of 243 fatigue test data. The samples consisted of commercial high purity Al wires (type Al-H11 from Heraeus in as-received condition with a diameter of 400 μm and a gauge length of 10 mm. A TA Electro-Force DMA 3200 testing machine with a load capacity of 500 N, a frequency range of 0.00001 -300 Hz and a displacement range of ±6 mm was used for the experiments. The load controlled fatigue tests were carried out in tension-tension mode at a stress ratio of 0.1 R ≈ by using 6 -10 samples at each stress level. The S-N curves were obtained by collecting the data in a broad range of 10 2 up to about 10 8 loading cycles at three testing frequencies of 2, 20 and 200 Hz, and the model parameters have been estimated using the methods described in Section 7 and commented below. E ratio is assumed equal 3.
The resulting lifetimes are plotted in Figure 4, using three different colors, blue, for 2 Hz f = , green, for 20 Hz f = and red, for 200 Hz f = , respectively.
A simple look to the data in Figure 4 reveals the presence of a few outliers, which are indicated using squares instead of circles for the data points. We E. Castillo et al. advance that these five outliers, which are only 2% of the 243 sample data points, were selected using both, probability papers and analyzing to percentile curves of the fitted model including all data points (see . Table 3 shows the five outliers indicating their sample indices, their lifetimes and the corresponding values of M σ in MPa.
The two-step method described in detail in Section 6 has been used to estimate the parameters with all data points and the resulting fitted model plotted to analyze the possibility of some outliers, as revealed by our first look to the data points. The outliers were confirmed based on Figures 4-8, but this can also be done using well known methods, such as, for example, those given in [46] and [47].
Once this has been done and the outliers identified, they have been removed from the sample and the estimation and possible detection of more outliers repeated to select the final model for its use in practical applications, such as an analysis of safety against fatigue in high speed railway bridges.
For the sake of facilitating the comparison of both results, with and without outliers, and to analyze how them affect the final results, the plots corresponding to both estimation processes have been grouped, such that the left plots refer to the whole data and the right plots to the removed outliers case, respectively.
The first estimation step, in which the regression curves are obtained using least squares, provides the seven deterministic parameter estimates and the second step gives the three random ones, obtained by restricted maximum likelihood, assuming a Weibull distribution for the residuals. They are shown on the left and right parts of Table 4, respectively.     a close to linear general trend but showing some outliers at both ends. As already indicated, the five outliers have been selected using this plot and the following percentile plots. They clearly show that we are in front of outliers, some of them confirming our first impression after looking to the data in Figure 4. The right plot in Figure 6 corresponds to the data points after removing the outliers, which also shows a linear trend and suggests no more outliers present. The lower end, suggesting a vertical asymptote, indicates a Weibull distribution close to a Gumbel one ( 6.66 K = ) (see [48]). The plots in Figure 7 show the empirical cdf of all the data and all data with outliers removed, together with the 0.01, 0.05, 0.50, 0.95 and 0.99 percentiles of the order statistics of a Weibull model, confirming once more the Weibull family as a reasonable choice.
It is clear that the Weibull distribution of V is justified, theoretically and physically, by extreme value theory, because fatigue failure is a process that takes place using the weakest link or weakest region principle, stating that failures occur at the weakest locations.
Alternatively, in the second step, the errors or residuals, i V , are also calculated and plotted, in Figure 8, together with the "ksdensity" matlab kernel estimate, which is based on a normal kernel function and uses a window parameter (bandwidth) as explained in [49]. The plots in Figure 8 show the Weibull and kernel pdf estimates together with the data for the whole data and the removed outliers on the top and bottom plots, respectively, and the Weibull and kernel cases, on the left and right plots,  respectively. It is interesting to see how the top plots allow identifying the outliers on their left and right zones.
If the Weibull model is used, the resulting regression curves and the associated percentiles are shown in Figure 9 and Figure 10, for M σ and ef M σ , respectively. Figure 9 shows the three data sets, in different colors, and the resulting 0.01, 0.05, 0.50, 0.95 and 0.99 percentiles of the M σ versus lifetime N regression curves using a Weibull parametric estimate, for outliers included (left) and removed (right) cases, respectively.
Finally, Figure 10 shows the three data sets, in different colors, and the re- selected model. As it can be seen, this also reveals the presence of some outliers, corresponding to the two sides of the regression curve. Note also that the outliers also appear in Figure 9 and Figure 10 on both sides of the mean, as already indicated by the density estimate in Figure 8, where the five outliers correspond to the five points on the left and right extremes.
The regression curves M σ versus N and ef M σ versus N corresponding to the kernel estimates together with the associated percentiles, as those in Figure 9 and Figure 10, are omitted because they are practically identical to the Weibull model ones. This is due to the fact that the densities in Figure 8 are very similar for the Weibull and kernel cases. Finally, the resulting diagram of ef E * versus frequency is shown in Figure   11, showing its sigmoidal shape. In summary, it can be concluded that, at least in this example including a Open Journal of Statistics large size data set, the model behaves well and as expected. Future analysis will be required to confirm the use of this model. However, the possibility of a four dimensional analysis, that is, including frequencies as the fourth variable is now possible.

Conclusions
The most important conclusions from this paper are: 1) The necessity of replacing or improving existing models, broadly acknowledged and recommend by standards and guidelines, needs to be emphasized to avoid the consequences of application of invalid models as reported in the literature.
2) A new method for considering the joint effect of maximum stress, stress ratio and frequency on the lifetime is presented in Sections 0 to 1. The frequency is included as a new and relevant variable to be considered in the design of important high-risk components such as wind turbines and automobiles, high speed railways and aeronautical material.
3) It has been shown that compatibility conditions are important tools to build models without arbitrary assumptions. These conditions were easily stated as functional equations, in Sections 2 to 4, which when solved provide the unique models satisfying the corresponding conditions. To our knowledge, no other method exists satisfying these conditions. 4) The model shows, in Section 5, that the frequency effect is equivalent to a concurrent modification of the maximum stress-stress ratio pairs from each test, what has immediate consequences in how to design the tests conducting to a precise parameter estimation. 5) The model parameters are classified, in Section 7, into two distinct classes: those defining the deterministic model, which are estimated by least-squares regression, and those reproducing the random ones, estimated by a restricted maximum-likelihood method. An estimation method is proposed and applied independently to both classes to avoid overfitting, in Section 7, and tested later in Section 9. 6) A model damage measure variable, V, is identified in Section 4, which together with its cdf, permits the damage accumulation to be evaluated and the associated probability of failure to be determined subject to any varying load history. A general formula is given in Section 8, which allows the cumulative damage to be analytically calculated from the joint effects of the three changing variables, maximum stress, stress ratio and frequency. 7) Due to its integrated character, because it includes all variables, the use of the model, will certainly allow us to reduce the sample size, and also to optimize the test sequences when used in designing test studies. 8) A more general formula that provides the damage of any load when the initial damage is known is obtained in Section 8 as the result of a functional equation that guaranties the compatibility, that is, damage to become independent on the way the loads are considered. 9) An unusually, rarely available large sample data, from an experimental campaign in which 243 fatigue specimens were tested, was used to validate the model, in Section 9, resulting very good results, as shown in the figure examples.