Comparison of Hazard-Rates Considering Fault Severity Levels and Imperfect Debugging for OSS

A software reliability model is a tool for measuring software reliability quantitatively, and the hazard-rate model is one of the most popular types. The purpose of our research is to propose a hazard-rate model that considers fault severity levels for open source software (OSS). Moreover, we aim to adapt the proposed model to a hazard-rate model that considers an imperfect debugging environment. We have analyzed the trend of each fault severity level by using fault data in a bug tracking system (BTS) and have proposed our model based on the result of the analysis. We have also shown a numerical example to evaluate the performance of the proposed model. Furthermore, we have extended the proposed model to a hazard-rate model considering the imperfect debugging environment and have shown a numerical example to evaluate the possibility of this application. As a result, we found that the proposed model performs better than typical hazard-rate models. We also verified that the proposed model can be applied to a hazard-rate model considering imperfect debugging.


Introduction
Open source software (OSS) is freely available for use, reuse, fixing, and redistribution by users and developers. OSS is used in various situations because it helps many users achieve cost reduction, standardization, and quick delivery. However, the quality of OSS is not good because of its unique development style. The quality of OSS is very important and depends on the demands of users in the future. The faults latent in OSS are fixed by using the database of a bug tracking system (BTS). A BTS records various information about faults, and a model based on deep learning has been proposed to analyze the fault big data in a BTS [1]. Software reliability is one of the software characteristics used to evaluate the quality of software. A software reliability model is a tool for measuring software reliability quantitatively, and many software reliability models have been proposed by many researchers [2]-[8]. A software effort model based on the software reliability model has also been proposed [9]. In particular, the hazard-rate model is one of the most popular types [10] [11].
Many hazard-rate models have been proposed so far. However, a hazard-rate model based on fault levels has not yet been proposed. The purpose of our research is to propose a hazard-rate model that considers fault levels for OSS. In this paper, we assume that each fault severity level shows a different trend in terms of the mean time between software failures (MTBF).
Based on this assumption, we analyze the fault big data in a BTS and find the difference in trend of each fault severity level in terms of MTBF. From the result of the analysis, we propose a hazard-rate model considering fault severity levels for OSS.
Moreover, we aim to adapt our proposed model to a hazard-rate model considering the imperfect debugging environment. Most software reliability models assume that all detected faults in the software are fixed and removed perfectly and that no new faults are introduced when a fault is fixed and removed. However, this assumption is not practical in actual situations.
In other words, we assume that the testing phase and operating phase of software development are in an imperfect debugging environment. There is some research on debugging, such as the effectiveness of statistical debugging [12]. Software reliability models considering the imperfect debugging environment have also been proposed in the past. In this paper, we adapt our proposed model to the hazard-rate model considering the imperfect debugging environment. Then, we show several numerical examples based on the proposed model.

Bug Tracking System
The faults in OSS are fixed by using a BTS. A BTS records much information about each fault in OSS, e.g., the time at which the fault was recorded and the severity of the fault. Table 1 shows the software fault severity levels [13]. The 7 levels in Table 1 are the fault severity levels used in the BTS.

Table 1. Software fault severity levels [13].
Blocker: The fault is the most serious of all fault levels.
Critical: The fault is comparatively serious but is less serious than a blocker.
Major: This level indicates that most of certain functions in the software malfunction.
Normal: This fault level is the general one.
Minor: The fault happens in minor functions of the software.
Trivial: The fault is a minor one that has little effect on the functions of the software.
Enhancement: This level is generally not a fault itself but a request for revision.

Software Reliability Model
A software reliability model is a tool for measuring software reliability quantitatively, and most of the models are based on probability and statistical theory. Software reliability models are categorized into analytical models and empirical models. Moreover, the empirical models are categorized into dynamic models and static models. In particular, the dynamic model represents the fault discovery events and software failure occurrence events in the test phase or operation phase as a process of software reliability growth, and it is described as a stochastic model, the so-called software reliability growth model (SRGM). In this paper, we use the hazard-rate model, which is one of the most popular SRGMs.

Hazard-Rate Model
In this section, we discuss the hazard-rate model. First, we can express the probability related to the number of software faults and the time of occurrence of software failures in the testing phase or operating phase as shown in Figure 1. Let $X_k$ $(k = 1, 2, \ldots)$ be the random variable representing the time interval between the $(k-1)$th and $k$th software failures. The distribution function of $X_k$ is defined as

$$F_k(x) \equiv \Pr\{X_k \le x\} \quad (x \ge 0), \tag{1}$$

where $\Pr\{A\}$ represents the occurrence probability of event $A$. Therefore, the following derivative is the probability density function of $X_k$:

$$f_k(x) \equiv \frac{dF_k(x)}{dx}. \tag{2}$$

Also, the software reliability can be defined as the probability that a software failure does not occur during the time interval $(0, x]$. The software reliability is given by

$$R_k(x) \equiv \Pr\{X_k > x\} = 1 - F_k(x). \tag{3}$$

From Equations (1)-(3), the hazard-rate is given by the following equation:

$$z_k(x) \equiv \frac{f_k(x)}{1 - F_k(x)} = \frac{f_k(x)}{R_k(x)}, \tag{4}$$

where the hazard-rate means the software failure rate at time $x$ given that no software failure has occurred during the time interval $(0, x]$. A hazard-rate model is an SRGM representing the software failure-occurrence phenomenon by the hazard-rate. Moreover, we discuss three hazard-rate models as follows:
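The hazard-rate definition above (density divided by reliability) can be checked numerically. The following is a minimal sketch, not part of the original study: the exponential and Weibull distributions and all parameter values (`lam`, `m`, `eta`) are assumptions chosen only for illustration.

```python
import math

def hazard_rate(pdf, cdf, x):
    """Hazard rate z(x) = f(x) / (1 - F(x)), i.e. density over reliability."""
    reliability = 1.0 - cdf(x)
    if reliability <= 0.0:
        raise ValueError("reliability must be positive at x")
    return pdf(x) / reliability

# Hypothetical example 1: exponential inter-failure times with rate lam.
lam = 0.5
exp_pdf = lambda x: lam * math.exp(-lam * x)
exp_cdf = lambda x: 1.0 - math.exp(-lam * x)

# Hypothetical example 2: Weibull inter-failure times with shape m and scale eta.
m, eta = 2.0, 3.0
wei_pdf = lambda x: (m / eta) * (x / eta) ** (m - 1) * math.exp(-((x / eta) ** m))
wei_cdf = lambda x: 1.0 - math.exp(-((x / eta) ** m))

for x in (0.5, 1.0, 2.0):
    # The exponential hazard stays at lam; the Weibull hazard grows with x.
    print(x, hazard_rate(exp_pdf, exp_cdf, x), hazard_rate(wei_pdf, wei_cdf, x))
```

The exponential case gives a hazard-rate that does not depend on x, which is the situation assumed within each failure interval by the three models discussed next.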

Jelinski-Moranda Model
The Jelinski-Moranda (J-M) model is one of the hazard-rate models. The J-M model has the following assumptions:
1) The software failure rate during a failure interval is constant and is proportional to the number of faults remaining in the software.
2) The number of remaining faults in the software decreases by one each time a software failure occurs.
3) Any fault that remains in the software has the same probability of causing a software failure at any time.
From the above assumptions, the software hazard-rate in Equation (4) for the $k$th software failure can be derived as

$$z_k(x) = \phi (N - k + 1) \quad (N > 0,\ \phi > 0;\ k = 1, 2, \ldots, N), \tag{5}$$

where each parameter is defined as follows:
$N$: the number of latent software faults before the testing,
$\phi$: the hazard-rate per inherent fault.
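Under the J-M assumptions, the hazard-rate for the k-th interval is the per-fault rate multiplied by the number of remaining faults, phi * (N - k + 1). A minimal sketch follows; the function name and the values of `N` and `PHI` are hypothetical and are not estimates from the paper.

```python
def jm_hazard_rate(k, n_faults, phi):
    """J-M hazard rate for the k-th failure interval: phi * (N - k + 1)."""
    if not 1 <= k <= n_faults:
        raise ValueError("k must satisfy 1 <= k <= N")
    if phi <= 0:
        raise ValueError("phi must be positive")
    return phi * (n_faults - k + 1)

# Hypothetical parameter values, chosen only for illustration.
N, PHI = 10, 0.02
jm_rates = [jm_hazard_rate(k, N, PHI) for k in range(1, N + 1)]

# Each removed fault lowers the hazard rate by the same amount, PHI.
print(jm_rates)
```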

Moranda Model
The Moranda model has the following assumption:
1) The software failure rate per software fault is constant and decreases geometrically each time a fault is discovered.
Journal of Software Engineering and Applications
From the above assumption, the software hazard-rate in Equation (4) for the $k$th software failure can be derived as

$$z_k(x) = D c^{k-1} \quad (D > 0,\ 0 < c < 1;\ k = 1, 2, \ldots), \tag{6}$$

where each parameter is defined as follows:
$D$: the initial hazard-rate for the software failure,
$c$: the decrease coefficient for the hazard-rate.
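The geometric decrease in the Moranda model means that the hazard-rate for the k-th interval is the initial rate D multiplied by c once for every fault already discovered, D * c**(k - 1). The sketch below uses hypothetical values of `D` and `C` that are not taken from the paper.

```python
def moranda_hazard_rate(k, d, c):
    """Moranda (geometric) hazard rate for the k-th interval: D * c**(k - 1)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if d <= 0 or not 0 < c < 1:
        raise ValueError("require D > 0 and 0 < c < 1")
    return d * c ** (k - 1)

# Hypothetical parameter values, chosen only for illustration.
D, C = 0.2, 0.9
mo_rates = [moranda_hazard_rate(k, D, C) for k in range(1, 6)]

# The ratio between successive hazard rates is always c.
ratios = [b / a for a, b in zip(mo_rates, mo_rates[1:])]
print(mo_rates, ratios)
```

Unlike the J-M model, this form has no finite number of faults N, so the hazard-rate approaches zero but never reaches it.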

Xie Model
The Xie model has the following assumption:
1) The software failure rate per software fault is constant and decreases exponentially with the number of faults remaining in the software.
From the above assumption, the software hazard-rate in Equation (4) for the $k$th software failure can be derived as

$$z_k(x) = \lambda_0 (N - k + 1)^{\alpha} \quad (N > 0,\ \lambda_0 > 0,\ \alpha \ge 1;\ k = 1, 2, \ldots, N), \tag{7}$$

where each parameter is defined as follows:
$N$: the number of latent software faults before the testing,
$\lambda_0$: the hazard-rate per inherent fault,
$\alpha$: the constant parameter.
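In its usual form, the Xie hazard-rate raises the number of remaining faults to the power alpha, lambda0 * (N - k + 1)**alpha, so alpha = 1 gives the same shape as the J-M model. The following sketch assumes that form; the function name and parameter values are hypothetical.

```python
def xie_hazard_rate(k, n_faults, lambda0, alpha):
    """Xie hazard rate for the k-th interval: lambda0 * (N - k + 1)**alpha."""
    if not 1 <= k <= n_faults:
        raise ValueError("k must satisfy 1 <= k <= N")
    if lambda0 <= 0 or alpha < 1:
        raise ValueError("require lambda0 > 0 and alpha >= 1")
    return lambda0 * (n_faults - k + 1) ** alpha

# Hypothetical values: compare alpha = 1 (J-M shape) with alpha = 2.
N_FAULTS, LAMBDA0 = 10, 0.02
linear = [xie_hazard_rate(k, N_FAULTS, LAMBDA0, 1.0) for k in range(1, N_FAULTS + 1)]
steeper = [xie_hazard_rate(k, N_FAULTS, LAMBDA0, 2.0) for k in range(1, N_FAULTS + 1)]

# With alpha > 1 the early hazard rates are much higher and fall faster.
print(linear[:3], steeper[:3])
```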

MTBF
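The three hazard-rate models above share the following assumption:
• Any fault that remains in the software has the same probability of causing a software failure at any time.
From the above assumption, the hazard-rate $z_k(x)$ of each model is constant in $x$, so $X_k$ follows an exponential distribution and the MTBF for the three hazard-rate models can be derived as

$$E[X_k] = \frac{1}{z_k(x)}. \tag{8}$$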
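When the hazard-rate is constant within a failure interval, the interval is exponentially distributed and its mean is the reciprocal of the hazard-rate. The sketch below checks this by integrating the reliability function numerically; the hazard-rate value 0.2 is a hypothetical example, and the function names are not from the paper.

```python
import math

def mtbf_from_hazard(z):
    """MTBF of an interval with constant hazard rate z: E[X] = 1 / z."""
    if z <= 0:
        raise ValueError("hazard rate must be positive")
    return 1.0 / z

def mtbf_by_integration(z, steps=200_000):
    """Approximate E[X] as the integral of R(x) = exp(-z x) (trapezoidal rule)."""
    upper = 50.0 / z  # beyond this point exp(-z x) is negligible
    h = upper / steps
    total = 0.5 * (1.0 + math.exp(-z * upper))
    total += sum(math.exp(-z * i * h) for i in range(1, steps))
    return total * h

z = 0.2  # hypothetical constant hazard rate for one failure interval
print(mtbf_from_hazard(z), mtbf_by_integration(z))
```

Both values agree closely, so the MTBF of any of the three models can be obtained by evaluating its hazard-rate for interval k and taking the reciprocal.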

Observation and Analysis of Trend on Each Fault Level
We analyze the fault big data from the perspective of MTBF for Apache HTTP Server, an OSS developed under The Apache Software Foundation [14]. In particular, we use the data on fault severity. In this paper, we use the following 7 fault severity levels: blocker, critical, major, normal, minor, trivial, and enhancement. Figure 2 shows the estimation results of MTBF for each fault severity level. Table 2 shows the estimated variance for each fault severity level. In terms of variance, we find that the value for the normal fault is the smallest among all fault severity levels. In other words, normal faults occur at a constant frequency, while faults of the other severity levels occur less frequently as time passes. From the results of the analysis, we assume that the fault data is divided into normal faults and others.
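One plausible way to carry out this kind of per-severity comparison is to group the fault records by severity, take the gaps between successive faults in each group, and compare their mean and variance. The sketch below is an assumption about the procedure, not the authors' actual method, and the records are invented for illustration rather than taken from the Apache HTTP Server data.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical records: (days since project start when recorded, severity).
records = [
    (1.0, "normal"), (2.0, "major"), (3.0, "normal"), (5.0, "normal"),
    (6.0, "major"), (7.0, "normal"), (9.0, "normal"), (14.0, "major"),
]

def intervals_by_severity(rows):
    """Group recording times by severity and return gaps between successive faults."""
    times = defaultdict(list)
    for t, severity in sorted(rows):
        times[severity].append(t)
    return {s: [b - a for a, b in zip(ts, ts[1:])] for s, ts in times.items()}

gaps = intervals_by_severity(records)

# Mean gap (an empirical MTBF) and population variance for each severity level.
summary = {s: (mean(g), pvariance(g)) for s, g in gaps.items() if g}
for severity, (avg, var) in summary.items():
    print(severity, avg, var)
```

In this toy data the normal faults arrive at evenly spaced times, so their variance is the smallest, which mirrors the pattern described above. `pvariance` is used here as a design choice; the sample variance (`statistics.variance`) would serve equally well for a comparison between levels.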

Application of Hazard-Rate Model to the Actual Data
In this section, we apply typical hazard-rate models to 2 data sets, the normal fault data and the other fault data, in order to find out which hazard-rate models fit each data set in terms of MTBF. We apply the following 3 models to the actual fault data: the J-M model, the Moranda model, and the Xie model. We use AIC (Akaike's Information Criterion), which is based on the maximum likelihood estimation of model parameters, to measure the goodness-of-fit of these models to the actual data. The result of AIC is shown in Table 3. Figures 3-8 show the estimated MTBF of each model for the normal fault data and the other fault data, respectively. From Table 3, we find that the value of AIC for the Moranda model is the smallest for both the normal fault data and the other fault data.
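The AIC-based comparison can be sketched for one model as follows. This example assumes that the k-th interval is exponentially distributed with the Moranda hazard-rate D * c**(k - 1); for a fixed c the maximizing D has a closed form, so c is searched on a grid. The likelihood form, the grid search, and the data `x` are assumptions made for illustration and are not the authors' estimation procedure or data.

```python
import math

def moranda_log_likelihood(x, d, c):
    """Log-likelihood of intervals x when X_k is exponential with rate D * c**(k-1)."""
    # enumerate() yields i = k - 1, which is exactly the exponent of c.
    return sum(math.log(d) + i * math.log(c) - d * c ** i * xi
               for i, xi in enumerate(x))

def fit_moranda(x, grid=999):
    """Profile-likelihood MLE: for each c on a grid, D = n / sum(c**(k-1) * x_k)."""
    best = None
    for j in range(1, grid + 1):
        c = j / (grid + 1)
        d = len(x) / sum(c ** i * xi for i, xi in enumerate(x))
        ll = moranda_log_likelihood(x, d, c)
        if best is None or ll > best[0]:
            best = (ll, d, c)
    return best

def aic(log_likelihood, n_params):
    """AIC = -2 * (maximum log-likelihood) + 2 * (number of free parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical inter-failure times (days) that tend to grow over time.
x = [1.0, 1.5, 1.2, 2.4, 2.0, 3.5, 3.1, 5.2, 4.8, 7.0]
ll, d_hat, c_hat = fit_moranda(x)
print(d_hat, c_hat, aic(ll, n_params=2))
```

Repeating the same steps with the likelihood of another model and comparing the resulting AIC values gives the kind of ranking reported in Table 3; the model with the smallest AIC is preferred.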

Proposed Model
From the results of Section 2, we assume that the fault data is divided into the following types:
A1. The normal fault
A2. The other fault
In the assumption above, A1 is a fault detected as a normal one, and A2 is a fault detected as one of the other levels. Also, the OSS manager cannot differentiate between assumptions A1 and A2 in terms of the software faults. The time interval between the $(k-1)$th and $k$th successive faults is represented as the random variable $X_k$ $(k = 1, 2, \ldots)$. Equation (10) represents the hazard-rate for the software failure-occurrence phenomenon for the normal fault. On the other hand, Equation (11) represents the hazard-rate for the software failure-occurrence phenomenon for the other fault. Also, Figure 9 shows a diagram describing the algorithm of the proposed method.

Numerical Example
In order to evaluate the performance of the proposed model, we estimate the MTBF for the fault big data of Apache HTTP Server. The parameters of the proposed integrated model have been estimated by maximum likelihood estimation (MLE). The estimated values of the parameters are shown as follows:
From Figures 10-13, the typical hazard-rate models estimate MTBF higher than the actual one; that is, these models estimate MTBF optimistically. On the other hand, the proposed integrated model estimates MTBF realistically. Table 4 shows the value of AIC for each model. From Table 4, the proposed integrated model fits better than the other models in terms of AIC. In other words, we can predict the MTBF of OSS more precisely with the proposed integrated model.

Imperfect Debugging Model
Most software reliability models assume that all faults found in software are fixed and removed perfectly and that no new faults are introduced when a fault is fixed and removed. However, this assumption is not practical in actual situations. In other words, the testing phase and operation phase of software development are in an imperfect debugging environment. Software reliability models considering imperfect debugging have been proposed [15]. In this paper, we adapt our proposed model to the hazard-rate model considering imperfect debugging and verify the possibility of applying the proposed model to it. We assume that the fault data is divided into the following types:
A3. The latent fault in the software before the release
A4. The fault caused by imperfect debugging
In the assumption above, A3 is a latent fault before the release of the software, and A4 is a fault introduced at the time when a latent fault is fixed and removed. Also, the OSS manager cannot differentiate between assumptions A3 and A4 in terms of the software faults. The time interval between the $(k-1)$th and $k$th successive faults is represented as the random variable $X_k$ $(k = 1, 2, \ldots)$. Therefore, the hazard-rate function $z_k(x)$ for $X_k$ is defined as follows:
where each parameter is defined as follows:
Equation (13) represents the integrated hazard-rate for the software failure-occurrence phenomenon for the latent fault in the software before the release, which is our proposed model. On the other hand, Equation (14) represents the hazard-rate for the software failure-occurrence phenomenon for the fault caused by imperfect debugging. We assume that faults caused by imperfect debugging occur randomly. For that reason, we apply the exponential distribution to A4.
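The choice of the exponential distribution for randomly occurring faults can be illustrated through its memoryless property: the probability of surviving a further time t does not depend on how long the software has already run without such a fault. The sketch below is only an illustration of that property; the rate name `THETA` and its value are hypothetical and are not parameters estimated in the paper.

```python
import math

def exp_reliability(x, theta):
    """Pr{X > x} for an exponential distribution with rate theta."""
    return math.exp(-theta * x)

def conditional_survival(s, t, theta):
    """Pr{X > s + t | X > s}: survival for a further t after surviving s."""
    return exp_reliability(s + t, theta) / exp_reliability(s, theta)

THETA = 0.05  # hypothetical rate of faults introduced by imperfect debugging

for s in (0.0, 10.0, 100.0):
    # The conditional value equals the unconditional Pr{X > 5} for every s.
    print(s, conditional_survival(s, 5.0, THETA), exp_reliability(5.0, THETA))
```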

Numerical Example
In order to verify the possibility of applying the proposed model to the hazard-rate model considering imperfect debugging, we again estimate the MTBF for the fault big data of Apache HTTP Server. The parameters of the proposed model considering imperfect debugging have been estimated by MLE. The estimated values of the parameters are shown as follows:
Figure 14 shows the estimated MTBF for the proposed model considering imperfect debugging. Table 5 shows the value of AIC for each model. From Table 5, the proposed model for imperfect debugging fits better than the proposed model in terms of AIC. In other words, it is possible to adapt the proposed model to the hazard-rate model considering imperfect debugging.

Concluding Remarks
In this paper, we have assumed that each fault severity level shows a different trend in terms of MTBF and have analyzed the fault big data in a BTS. We have found the difference in trend of each fault severity level in terms of MTBF, and from the result of the analysis we have proposed a hazard-rate model considering fault severity levels for OSS. The proposed integrated model fits better than the typical hazard-rate models in terms of AIC. Moreover, we adapted our proposed integrated model to the hazard-rate model considering imperfect debugging. The proposed model considering imperfect debugging fits better than the proposed integrated model in terms of AIC.
OSS is used by many organizations because of its low cost, standardization, and quick release. However, the quality of OSS is not good because of its unique development style. The quality required of OSS will depend on the demands of users in the future. At the same time, it is very important to propose software reliability models for OSS. In particular, software reliability models considering imperfect debugging are very practical for the actual situation in software development.
In the future, it is necessary to verify the applicability of the proposed model, because the data set of Apache HTTP Server is the only one with which we have evaluated the goodness-of-fit of the proposed model. In this paper, we compared our proposed models with the J-M model, the Moranda model, and the Xie model. However, these models are very old ones. Many software reliability models have been proposed so far; therefore, we have to compare our proposed models with other hazard-rate models to evaluate their performance. We will also consider proposing a software reliability model for OSS from other perspectives.