
A software reliability model is a stochastic model for measuring software reliability quantitatively. The Hazard-Rate Model is a well-known, typical software reliability model. We propose Hazard-Rate Models for Open Source Software (OSS) based on the Hazard-Rate Model Considering Fault Severity Levels (CFSL). The purpose of this research is to adopt the Hazard-Rate Model considering CFSL as the baseline hazard function of the Cox proportional Hazard-Rate Model, and to use two kinds of fault data in the Bug Tracking System (BTS) as its covariate vector. We also show numerical examples that evaluate the performance of the proposed model, comparing it with the Hazard-Rate Model CFSL.

Open Source Software (OSS) is used by many organizations in various situations because of its low cost, standardization, and quick delivery. However, the quality of OSS is not ensured, because OSS is developed by many volunteers around the world in a unique development style that has no organized testing phase. The faults latent in OSS are usually fixed by using the database of the Bug Tracking System (BTS), which contains various information related to faults. Therefore, reliability assessment of OSS is necessary and important, both to address the current problems of OSS and to meet future demand.

A software reliability model is a mathematical model for measuring software reliability through statistical and stochastic approaches. To date, many models, not only for proprietary software but also for OSS, have been proposed by many researchers [

The purpose of our research is to propose a Hazard-Rate Model that includes various fault data in the BTS of OSS. Specifically, we adopt the Hazard-Rate Model CFSL as the baseline hazard function of the Cox proportional Hazard-Rate Model (Cox PHM), and we incorporate two kinds of fault data in BTS into the covariate vector of Cox PHM. Moreover, we show several numerical examples based on the proposed model to evaluate its performance.

BTS is a database through which OSS users can report information about faults in OSS. BTS contains various information, e.g., the time at which a fault was recorded, the time at which the fault was fixed, the nickname of the fault assignee, and so on. We show the list of fault data in BTS in

First, we show the stochastic quantities related to the number of software faults and the time of occurrence of software failures in the testing or operating phase, as shown in

The distribution function of $X_k$ ($k = 1, 2, \cdots$), which represents the time interval between the $(k-1)$st and $k$th successively detected faults, is defined as:

$$F_k(x) \equiv \Pr\{X_k \le x\} \quad (x \ge 0) \tag{1}$$

where $\Pr\{A\}$ represents the occurrence probability of event $A$. Therefore, the following derivative gives the probability density function of $X_k$:

$$f_k(x) \equiv \frac{dF_k(x)}{dx} \tag{2}$$

| The kind of fault data | Contents |
|---|---|
| Opened | The date and time the fault was recorded in the bug tracking system. |
| Changed | The date and time of modification. |
| Product | The name of the product included in OSS. |
| Component | The name of the component included in OSS. |
| Version | The version number of OSS. |
| Reporter | The nickname of the fault reporter. |
| Assignee | The nickname of the fault assignee. |
| Severity | The severity level of the fault. |
| Status | The fixing status of the fault. |
| Resolution | The resolution status of the fault. |
| Hardware | The name of the hardware on which the fault occurred. |
| OS | The name of the operating system on which the fault occurred. |
| Summary | A brief description of the fault. |

Also, the software reliability can be defined as the probability that no software failure occurs during the time interval $(0, x]$. The software reliability is given by:

$$R_k(x) \equiv \Pr\{X_k > x\} = 1 - F_k(x) \tag{3}$$

From Equations (1)-(3), the hazard-rate is given by the following equation:

$$z_k(x) \equiv \frac{f_k(x)}{1 - F_k(x)} = \frac{f_k(x)}{R_k(x)} \tag{4}$$

where the Hazard-Rate is the software failure rate at time $x$, given that no software failure has occurred during the time interval $(0, x]$. A Hazard-Rate Model is a software reliability model that represents the software failure-occurrence phenomenon by the Hazard-Rate.
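The chain from Equations (1)-(4) can be checked numerically. The sketch below is a minimal illustration, not the authors' code: it takes an arbitrary distribution function $F_k(x)$, approximates the density by a central difference, and forms the Hazard-Rate as $f_k(x)/R_k(x)$. The exponential and Weibull distribution functions are illustrative assumptions chosen because their Hazard-Rates are known in closed form.

```python
import math
from typing import Callable

CDF = Callable[[float], float]


def density(F: CDF, x: float, h: float = 1e-6) -> float:
    """f(x) = dF(x)/dx (Equation (2)) via a central difference; assumes x > h."""
    return (F(x + h) - F(x - h)) / (2.0 * h)


def reliability(F: CDF, x: float) -> float:
    """R(x) = Pr{X > x} = 1 - F(x) (Equation (3))."""
    return 1.0 - F(x)


def hazard_rate(F: CDF, x: float) -> float:
    """z(x) = f(x) / R(x) (Equation (4))."""
    return density(F, x) / reliability(F, x)


def exponential_cdf(x: float, rate: float = 0.5) -> float:
    """Illustrative CDF whose Hazard-Rate is the constant `rate`."""
    return 1.0 - math.exp(-rate * x) if x >= 0.0 else 0.0


def weibull_cdf(x: float, shape: float = 2.0, scale: float = 1.0) -> float:
    """Illustrative CDF whose Hazard-Rate (shape/scale)*(x/scale)**(shape-1) grows with x."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x >= 0.0 else 0.0


if __name__ == "__main__":
    print(hazard_rate(exponential_cdf, 1.0))  # close to 0.5
    print(hazard_rate(weibull_cdf, 1.5))      # close to 2 * 1.5 = 3.0
```

A constant Hazard-Rate corresponds to the exponential distribution, which is the case the three models below rely on within each failure interval.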

Next, we discuss three Hazard-Rate Models.

The Jelinski-Moranda (J-M) model is one of the Hazard-Rate Models. The J-M model has the following assumptions:

1) The software failure rate during a failure interval is constant and is proportional to the number of faults remaining in the software;

2) The number of remaining faults in the software decreases by one each time a software failure occurs;

3) Any fault that remains in the software has the same probability of causing a software failure at any time.

From the above assumptions, the software Hazard-Rate in Equation (4) for the $k$th failure interval can be derived as:

$$z_k(x) = \phi\,[N - (k - 1)] \quad (N > 0,\ \phi > 0;\ k = 1, 2, \cdots, N) \tag{5}$$

where each parameter is defined as follows:

$N$: the number of latent software faults before testing;

$\phi$: the Hazard-Rate per inherent fault.
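Equation (5) can be evaluated directly. In this minimal sketch the parameter values are illustrative assumptions, not estimates; because the J-M Hazard-Rate does not depend on $x$, each detected fault lowers it by exactly $\phi$.

```python
def jm_hazard_rate(k: int, N: int, phi: float) -> float:
    """Jelinski-Moranda Hazard-Rate for the k-th failure interval, Equation (5)."""
    if N <= 0 or phi <= 0:
        raise ValueError("Equation (5) requires N > 0 and phi > 0")
    if not 1 <= k <= N:
        raise ValueError("k must satisfy 1 <= k <= N")
    # N - (k - 1) faults remain when the k-th interval starts.
    return phi * (N - (k - 1))


if __name__ == "__main__":
    # Illustrative values: N = 100 latent faults, phi = 0.01 per fault.
    for k in (1, 2, 3, 100):
        print(k, jm_hazard_rate(k, N=100, phi=0.01))
```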

The Moranda model has the following assumption:

The software failure rate per software fault is constant and decreases geometrically each time a fault is discovered.

From the above assumptions, the software Hazard-Rate in Equation (4) for the $k$th failure interval can be derived as:

$$z_k(x) = D \cdot c^{k-1} \quad (D > 0,\ 0 < c < 1;\ k = 1, 2, \cdots) \tag{6}$$

where each parameter is defined as follows:

D: the initial Hazard-Rate for the software failure;

c: the decrease coefficient for Hazard-Rate.
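A minimal sketch of Equation (6), using illustrative parameter values (assumptions, not estimates). Unlike the J-M model, the Moranda model has no finite fault count $N$: the ratio between successive Hazard-Rates is always $c$, so the rate decreases geometrically but never reaches zero.

```python
def moranda_hazard_rate(k: int, D: float, c: float) -> float:
    """Moranda geometric Hazard-Rate for the k-th failure interval, Equation (6)."""
    if D <= 0 or not 0 < c < 1:
        raise ValueError("Equation (6) requires D > 0 and 0 < c < 1")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return D * c ** (k - 1)


if __name__ == "__main__":
    # Illustrative values (assumptions): D = 0.5, c = 0.9.
    rates = [moranda_hazard_rate(k, D=0.5, c=0.9) for k in range(1, 6)]
    ratios = [b / a for a, b in zip(rates, rates[1:])]
    print(rates)
    print(ratios)  # every ratio equals c, up to floating-point rounding
```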

The Xie model has the following assumption:

The software failure rate per software fault is constant, and the Hazard-Rate decreases as a power of the number of faults remaining in the software.

From the above assumptions, the software Hazard-Rate in Equation (4) for the $k$th failure interval can be derived as:

$$z_k(x) = \lambda_0 (N - k + 1)^{\alpha} \quad (N > 0,\ \lambda_0 > 0,\ \alpha \ge 1;\ k = 1, 2, \cdots, N) \tag{7}$$

where each parameter is defined as follows:

$N$: the number of latent software faults before testing;

$\lambda_0$: the Hazard-Rate per inherent fault;

$\alpha$: the constant parameter.
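A minimal sketch of Equation (7), with illustrative parameter values (assumptions). Setting $\alpha = 1$ reduces the Xie model to the J-M model of Equation (5) with $\phi = \lambda_0$; larger $\alpha$ makes the Hazard-Rate fall faster as faults are removed.

```python
def xie_hazard_rate(k: int, N: int, lam0: float, alpha: float) -> float:
    """Xie Hazard-Rate for the k-th failure interval, Equation (7)."""
    if N <= 0 or lam0 <= 0 or alpha < 1:
        raise ValueError("Equation (7) requires N > 0, lambda_0 > 0 and alpha >= 1")
    if not 1 <= k <= N:
        raise ValueError("k must satisfy 1 <= k <= N")
    remaining = N - k + 1  # faults remaining when the k-th interval starts
    return lam0 * remaining ** alpha


if __name__ == "__main__":
    # Illustrative values: N = 100, lambda_0 = 0.01; compare alpha = 1 and alpha = 2.
    for k in (1, 50, 100):
        print(k, xie_hazard_rate(k, 100, 0.01, 1.0), xie_hazard_rate(k, 100, 0.01, 2.0))
```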

The three Hazard-Rate Models above share the following assumption:

Any fault that remains in the software has the same probability of causing a software failure at any time.

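From this assumption, the three models above are called exponential Hazard-Rate Models: within each failure interval the Hazard-Rate does not depend on $x$, so $X_k$ is exponentially distributed. The MTBF (Mean Time Between Failures) for the three Hazard-Rate Models can be derived as: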

$$E[X_k] = \int_0^{\infty} x f_k(x)\,dx = \int_0^{\infty} R_k(x)\,dx = \frac{1}{z_k(x)} \tag{8}$$
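Equation (8) says that, for a Hazard-Rate that is constant in $x$, the MTBF is simply its reciprocal. As a minimal numerical check, assuming an illustrative constant Hazard-Rate, the sketch below integrates $R_k(x) = e^{-z_k x}$ with the trapezoidal rule and compares the result with $1/z_k$.

```python
import math


def mtbf_closed_form(z: float) -> float:
    """MTBF = 1 / z_k for a Hazard-Rate that is constant in x, Equation (8)."""
    return 1.0 / z


def mtbf_by_integration(z: float, n: int = 200_000, horizon: float = 50.0) -> float:
    """Integrate R(x) = exp(-z x) over [0, horizon / z] with the trapezoidal rule.

    The neglected tail beyond horizon / z equals exp(-horizon) / z, which is
    negligible for horizon = 50.
    """
    upper = horizon / z
    step = upper / n
    total = 0.5 * (1.0 + math.exp(-z * upper))  # endpoint weights: R(0) = 1 and R(upper)
    for i in range(1, n):
        total += math.exp(-z * i * step)
    return total * step


if __name__ == "__main__":
    # e.g. a Moranda rate D * c**(k - 1) with D = 0.5, c = 0.9, k = 3 (illustrative)
    z = 0.405
    print(mtbf_closed_form(z), mtbf_by_integration(z))
```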

The Hazard-Rate Model CFSL is a Hazard-Rate Model for OSS that considers the fault severity levels in BTS. It represents the Hazard-Rate of the OSS as a whole by combining a Hazard-Rate for normal faults with a Hazard-Rate for the other faults. In this section, we discuss the Hazard-Rate Model CFSL.

We assume that the fault data is divided into the following types in terms of the fault severity levels in BTS:

A1: normal faults;

A2: other faults.

In the assumption above, A1 denotes faults detected as normal ones, and A2 denotes faults detected as other ones. Also, the OSS manager cannot differentiate between A1 and A2 in terms of the software faults. The time interval between the $(k-1)$st and $k$th successive faults is represented by the random variable $X_k$ ($k = 1, 2, \cdots$). Therefore, the Hazard-Rate function $z_k(x)$ for $X_k$ is defined as follows:

$$z_k(x) = p \cdot z_{k1}(x) + (1 - p) \cdot z_{k2}(x) \quad (k = 1, 2, \cdots;\ 0 \le p \le 1) \tag{9}$$

$$z_{k1}(x) = D_1 \cdot c_1^{k-1} \quad (k = 1, 2, \cdots;\ D_1 \ge 0,\ 0 < c_1 < 1) \tag{10}$$

$$z_{k2}(x) = D_2 \cdot c_2^{k-1} \quad (k = 1, 2, \cdots;\ D_2 \ge 0,\ 0 < c_2 < 1) \tag{11}$$

where each parameter is defined as follows:

$z_{k1}(x)$: the Hazard-Rate for assumption A1;

$D_1$: the initial Hazard-Rate for the first software failure of A1;

$c_1$: the decrease coefficient of the Hazard-Rate for assumption A1;

$z_{k2}(x)$: the Hazard-Rate for assumption A2;

$D_2$: the initial Hazard-Rate for the first software failure of A2;

$c_2$: the decrease coefficient of the Hazard-Rate for assumption A2;

$p$: the weight parameter for $z_{k1}(x)$.

Equation (10) represents the Hazard-Rate for the software failure-occurrence phenomenon caused by normal faults. On the other hand, Equation (11) represents the Hazard-Rate for the software failure-occurrence phenomenon caused by other faults.
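Equations (9)-(11) combine two Moranda-type geometric Hazard-Rates with weight $p$. The following is a minimal sketch, with illustrative parameter values that are assumptions rather than the paper's estimates; it also prints the MTBF $1/z_k$ implied by Equation (8). With $p = 1$ (or $p = 0$) the model reduces to a single Moranda model.

```python
def cfsl_hazard_rate(k: int, p: float, D1: float, c1: float, D2: float, c2: float) -> float:
    """Hazard-Rate Model CFSL, Equations (9)-(11), for the k-th failure interval."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if D1 < 0 or D2 < 0 or not (0 < c1 < 1 and 0 < c2 < 1):
        raise ValueError("requires D1, D2 >= 0 and 0 < c1, c2 < 1")
    if k < 1:
        raise ValueError("k must be a positive integer")
    z1 = D1 * c1 ** (k - 1)  # Equation (10): normal faults (A1)
    z2 = D2 * c2 ** (k - 1)  # Equation (11): other faults (A2)
    return p * z1 + (1.0 - p) * z2  # Equation (9): weighted mixture


if __name__ == "__main__":
    # Illustrative parameter values (assumptions, not the paper's estimates).
    for k in (1, 10, 100):
        z = cfsl_hazard_rate(k, p=0.6, D1=0.5, c1=0.99, D2=0.3, c2=0.995)
        print(k, z, 1.0 / z)  # Hazard-Rate and the MTBF from Equation (8)
```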

Cox PHM is a model that represents the Hazard-Rate by using a baseline hazard function, which is a function of time, and a covariate vector. In this section, we discuss Cox PHM.

We define two kinds of vectors as follows:

$$\alpha_k = (\alpha_{k1}, \alpha_{k2}, \cdots, \alpha_{kj}, \cdots, \alpha_{kq}) \quad (k = 1, 2, \cdots), \tag{12}$$

$$\beta = (\beta_1, \beta_2, \cdots, \beta_j, \cdots, \beta_q), \tag{13}$$

where each vector is defined as follows:

$\alpha_k$: the covariate vector including $q$ kinds of data $\alpha_{kj}$ ($j = 1, \cdots, q$) for $X_k$;

$\beta$: the coefficient vector for $\alpha_k$.

Therefore, using the two vectors above, Cox PHM is defined as follows:

$$h_k(x, \alpha_k) = h_0(x_k) \exp(\alpha_k^{T} \beta) = h_0(x_k) \exp(\alpha_{k1}\beta_1 + \cdots + \alpha_{kq}\beta_q) \tag{14}$$

where $h_0(x_k)$ in Equation (14) is called the baseline hazard function and is a function of $x_k$.
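Equation (14) multiplies the baseline hazard by the exponential of a linear combination of covariates. The sketch below is a minimal illustration in which the baseline value and the covariates are assumed inputs. Because the covariate factor is multiplicative, two intervals with the same baseline hazard have a Hazard-Rate ratio that depends only on their covariates, which is the "proportional" property of the model.

```python
import math
from typing import Sequence


def cox_hazard_rate(baseline: float, alpha_k: Sequence[float], beta: Sequence[float]) -> float:
    """Cox PHM, Equation (14): h_k = h_0(x_k) * exp(alpha_k1*beta_1 + ... + alpha_kq*beta_q)."""
    if len(alpha_k) != len(beta):
        raise ValueError("alpha_k and beta must both have q elements")
    linear_predictor = sum(a * b for a, b in zip(alpha_k, beta))
    return baseline * math.exp(linear_predictor)


if __name__ == "__main__":
    # Illustrative values: a negative coefficient lowers the Hazard-Rate as the covariate grows.
    for covariate in (0.0, 1.0, 2.0):
        print(covariate, cox_hazard_rate(0.4, [covariate], [-0.5]))
```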

In the proposed model, we apply the Hazard-Rate Model CFSL to the baseline hazard function in Cox PHM. Moreover, we incorporate the assignee data in BTS and the Mean Time Between Correction (MTBC) into the covariate vector. Our proposed model is then derived as follows:

$$h_k(x, \alpha_k) = z_k(x) \exp(\alpha_k^{T} \beta) = \{p \cdot z_{k1}(x) + (1 - p) \cdot z_{k2}(x)\} \exp(\alpha_k^{T} \beta) \quad (k = 1, 2, \cdots;\ 0 \le p \le 1) \tag{15}$$

where each parameter is defined as follows:

$z_{k1}(x)$: the Hazard-Rate for assumption A1;

$z_{k2}(x)$: the Hazard-Rate for assumption A2;

$p$: the weight parameter for $z_{k1}(x)$;

$\alpha_k$: the assignee data and MTBC in OSS;

$\beta$: the coefficient vector for $\alpha_k$.
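Equation (15) can be sketched by plugging the CFSL Hazard-Rate into the Cox PHM form. This is a minimal illustration, not the authors' implementation. The numeric encoding of the assignee data is not specified in this section, so the covariate values below are hypothetical, as are all parameter values. With $\beta = 0$ the model reduces to the Hazard-Rate Model CFSL.

```python
import math
from typing import Sequence


def proposed_hazard_rate(
    k: int,
    p: float,
    D1: float,
    c1: float,
    D2: float,
    c2: float,
    alpha_k: Sequence[float],
    beta: Sequence[float],
) -> float:
    """Proposed model, Equation (15): CFSL baseline times exp(alpha_k^T beta)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if len(alpha_k) != len(beta):
        raise ValueError("alpha_k and beta must have the same length")
    baseline = p * D1 * c1 ** (k - 1) + (1.0 - p) * D2 * c2 ** (k - 1)  # Equations (9)-(11)
    return baseline * math.exp(sum(a * b for a, b in zip(alpha_k, beta)))


if __name__ == "__main__":
    # Hypothetical covariates for one interval (PHM3-style): [assignee value, MTBC].
    # How the assignee data are encoded numerically is an assumption of this sketch.
    alpha_k = [3.0, 120.0]
    beta = [0.009, -5.5e-05]  # illustrative only
    h = proposed_hazard_rate(10, p=0.6, D1=0.5, c1=0.99, D2=0.3, c2=0.995, alpha_k=alpha_k, beta=beta)
    print(h, 1.0 / h)
```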

In this paper, we apply the exponential Hazard-Rate Model to the baseline hazard function. Thus, the proposed model can be regarded as a parametric model. Moreover, the distribution function and the density function of $X_k$ are derived as Equations (16) and (17), respectively.

$$F_k(x) = 1 - \exp\left(-\int_0^x z_k(s) \exp(\alpha_k^{T} \beta)\,ds\right) \tag{16}$$

$$f_k(x) = z_k(x) \exp(\alpha_k^{T} \beta) \exp\left(-\int_0^x z_k(s) \exp(\alpha_k^{T} \beta)\,ds\right) \tag{17}$$
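Because the baseline is an exponential Hazard-Rate Model, $z_k(x)$ does not depend on $x$, so the integral in Equations (16) and (17) reduces to $\lambda_k x$ with $\lambda_k = z_k \exp(\alpha_k^{T}\beta)$. A minimal sketch under that assumption, with illustrative input values, checks that $f_k(x)/(1 - F_k(x))$ recovers $\lambda_k$ as Equation (4) requires.

```python
import math


def interval_rate(z_k: float, linear_predictor: float) -> float:
    """lambda_k = z_k * exp(alpha_k^T beta); constant in x for an exponential baseline."""
    return z_k * math.exp(linear_predictor)


def cdf(x: float, lam: float) -> float:
    """Equation (16) with the integral evaluated in closed form: 1 - exp(-lam * x)."""
    return 1.0 - math.exp(-lam * x) if x >= 0.0 else 0.0


def pdf(x: float, lam: float) -> float:
    """Equation (17) with the integral evaluated in closed form: lam * exp(-lam * x)."""
    return lam * math.exp(-lam * x) if x >= 0.0 else 0.0


if __name__ == "__main__":
    lam = interval_rate(z_k=0.4, linear_predictor=-0.2)  # illustrative values
    x = 2.0
    # f / (1 - F) recovers the Cox Hazard-Rate, as Equation (4) requires.
    print(lam, pdf(x, lam) / (1.0 - cdf(x, lam)))
```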

Since the distribution of $X_k$ is fully specified in this way, the parameters in the proposed model can be estimated by MLE (Maximum Likelihood Estimation).
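A minimal sketch of the log-likelihood that MLE would maximize, assuming every interval $x_k$ is fully observed (no censoring) and the intervals are independent given the parameters; these assumptions and the function layout are ours, not the paper's. Each term is $\log f_k(x_k) = \log \lambda_k - \lambda_k x_k$ from Equation (17) with a constant-in-$x$ baseline.

```python
import math
from typing import Sequence


def log_likelihood(
    intervals: Sequence[float],
    covariates: Sequence[Sequence[float]],
    p: float,
    D1: float,
    c1: float,
    D2: float,
    c2: float,
    beta: Sequence[float],
) -> float:
    """Log-likelihood of observed times between failures under Equation (17).

    Assumes no censoring and independent intervals; interval k uses covariates[k - 1].
    """
    if len(intervals) != len(covariates):
        raise ValueError("need one covariate vector per interval")
    total = 0.0
    for k, (x_k, alpha_k) in enumerate(zip(intervals, covariates), start=1):
        z_k = p * D1 * c1 ** (k - 1) + (1.0 - p) * D2 * c2 ** (k - 1)  # Equations (9)-(11)
        lam = z_k * math.exp(sum(a * b for a, b in zip(alpha_k, beta)))  # Equation (15)
        total += math.log(lam) - lam * x_k  # log f_k(x_k) from Equation (17)
    return total
```

In practice, the negative of this function would be minimized numerically over $(p, D_1, c_1, D_2, c_2, \beta)$ subject to the constraints in Equations (9)-(11), for example with `scipy.optimize.minimize`; which optimizer the authors used is not stated here.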

We use fault big data of the Apache HTTP Server to estimate MTBF and thereby evaluate the performance of our proposed model compared with the Hazard-Rate Model CFSL [

PHM1: only the assignee data are included in $\alpha_k$;

PHM2: only MTBC is included in $\alpha_k$;

PHM3: both the assignee data and MTBC are included in $\alpha_k$.

The parameters in the proposed models are estimated by MLE. The estimated values of the parameters in the three models are shown in

In

As a criterion for measuring the goodness-of-fit of our proposed models, we use AIC (Akaike's Information Criterion), which is based on the maximum likelihood estimates of the model parameters.
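AIC trades goodness-of-fit against the number of estimated parameters: $\mathrm{AIC} = -2 \ln L_{\max} + 2m$, where $m$ is the number of free parameters, and a lower value is preferred. A minimal sketch; the log-likelihood values in the usage lines are hypothetical.

```python
def aic(max_log_likelihood: float, n_params: int) -> float:
    """Akaike's Information Criterion: AIC = -2 ln L_max + 2 * (number of parameters)."""
    if n_params < 0:
        raise ValueError("n_params must be non-negative")
    return -2.0 * max_log_likelihood + 2.0 * n_params


if __name__ == "__main__":
    # Hypothetical values: the same log-likelihood with one extra parameter raises AIC by 2.
    print(aic(-6660.0, 5), aic(-6660.0, 6))
```

Each additional covariate coefficient in $\beta$ adds 2 to the penalty term, so it lowers AIC only if it raises $\ln L_{\max}$ by more than 1.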

Value of parameters:

| Models | $\hat{w}_1$ | $\hat{w}_2$ | $\hat{c}_1$ | $\hat{c}_2$ | $\hat{\beta}_1$ | $\hat{\beta}_2$ |
|---|---|---|---|---|---|---|
| PHM1 | 2.15302 | 1.94836 | 0.99944 | 0.99994 | −0.00088 | - |
| PHM2 | 2.11500 | 2.20817 | 0.99993 | 0.99933 | - | −8.18298e−06 |
| PHM3 | 2.65362 | 1.89733 | 0.99942 | 0.99995 | 0.00898 | −5.52184e−05 |

| Model | AIC |
|---|---|
| Hazard-Rate Model CFSL | 13,326.5 |
| PHM1 | 13,322.8 |
| PHM2 | 13,320.6 |
| PHM3 | 13,322.2 |

Figures 2-5 show the estimated MTBF for each model, and the AIC table above lists the value of AIC for each model. From Figures 2-5, the PHMs estimate a slightly shorter MTBF than the Hazard-Rate Model CFSL for the initial faults. In terms of AIC, all three PHMs have a lower AIC than the Hazard-Rate Model CFSL, i.e., they fit better.
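The AIC differences between the models can be computed directly from the four reported AIC values. This sketch adds no new results; it only measures each model against the one with the smallest AIC.

```python
from typing import Dict

# AIC values as reported for the Apache HTTP Server data.
AIC_VALUES: Dict[str, float] = {
    "Hazard-Rate Model CFSL": 13326.5,
    "PHM1": 13322.8,
    "PHM2": 13320.6,
    "PHM3": 13322.2,
}


def aic_differences(values: Dict[str, float]) -> Dict[str, float]:
    """Delta AIC of each model relative to the model with the smallest AIC."""
    best = min(values.values())
    return {name: round(v - best, 1) for name, v in values.items()}


if __name__ == "__main__":
    for name, delta in sorted(aic_differences(AIC_VALUES).items(), key=lambda item: item[1]):
        print(f"{name}: {delta}")
```

PHM2 (MTBC only) has the smallest AIC; the Hazard-Rate Model CFSL is 5.9 above it.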

In this paper, we have proposed Hazard-Rate Models for OSS that include various fault data based on Cox PHM. Specifically, we have adopted the Hazard-Rate Model CFSL as the baseline hazard function. Moreover, we have incorporated the assignee data and MTBC into the covariate vector of Cox PHM. We have also shown numerical examples to evaluate the performance of our model. As a result, we have shown that the proposed model can estimate MTBF and fits better than the Hazard-Rate Model CFSL in terms of AIC.

OSS is popular and in demand in many organizations in various situations. However, OSS is developed by many volunteers around the world without an explicit testing phase. Therefore, the reliability of OSS is not ensured, and it is necessary to measure software reliability quantitatively. The BTS of OSS contains various fault data, and these data sets are useful for finding the characteristics of OSS. Moreover, we can assess software reliability accurately by using not only the times of occurrence of software failures in the testing or operation phase but also the other fault data in BTS.

BTS contains many kinds of fault data besides those used in this paper. Therefore, as future research, we will propose other software reliability models that use other kinds of fault data in BTS. We would also like to propose new measures of OSS reliability that reflect the characteristics of OSS.

This work was supported in part by the JSPS KAKENHI Grant No. 20K11799 in Japan.

The authors confirm that there is no conflict of interest to declare for this publication.

Yanagisawa, T., Tamura, Y., Anand, A. and Yamada, S. (2022) A Software Reliability Model for OSS Including Various Fault Data Based on Proportional Hazard-Rate Model. American Journal of Operations Research, 12, 1-10. https://doi.org/10.4236/ajor.2022.121001