Lognormal Process Software Reliability Modeling with Testing-Effort

We propose a software reliability growth model with testing-effort based on a continuous-state space stochastic process, specifically a lognormal process, and evaluate its goodness of fit. We also discuss a parameter estimation method for our model. We then derive several software reliability assessment measures from the probability distribution of its solution process, and compare our model with existing continuous-state space software reliability growth models in terms of the mean square error and Akaike's information criterion by using actual fault count data.


Introduction
Quantitative software reliability assessment is one of the important activities in producing reliable software systems. A software reliability growth model (abbreviated as SRGM) [1,2] is known as a useful mathematical tool for assessing software reliability quantitatively. An SRGM can describe the software fault-detection phenomenon or the software failure-occurrence phenomenon in the testing or operational phase by applying stochastic and statistical theories. In particular, a nonhomogeneous Poisson process (abbreviated as NHPP) [3,4], which describes the fault-detection phenomenon on a discrete-state space, is often applied in software reliability growth modeling by supposing an appropriate mean value function of the NHPP. Accordingly, NHPP models are used for quantitative software reliability assessment in many software houses and computer manufacturers because of their high applicability and simple model structure.
In contrast with discrete-state space SRGMs, such as the well-known NHPP models [5], there are continuous-state space SRGMs for assessing the software reliability of large-scale software systems. Specifically, Yamada et al. [6] discussed a framework for continuous-state space software reliability growth modeling based on a stochastic differential equation of Itô type, and compared the continuous-state space SRGM with the NHPP models. More recently, Yamada et al. [7] and Lee et al. [8] proposed several types of continuous-state space SRGMs based on stochastic differential equations of Itô type, such as the exponential, delayed S-shaped, and inflection S-shaped stochastic differential equation models, by characterizing the fault-detection rate per unit time per fault.
However, these continuous-state space SRGMs do not take the effect of testing-effort into consideration in software reliability assessment. The testing-effort, such as the number of executed test cases, the testing coverage, and the CPU hours expended in the testing phase, is well known as one of the most important factors related to the software reliability growth process [9]. Against this background, it is necessary to discuss a testing-effort dependent SRGM on a continuous-state space in order to develop a more plausible continuous-state space SRGM.
This paper proposes a continuous-state space software reliability growth model with the testing-effort factor by applying the mathematical technique of stochastic differential equations of Itô type, and conducts goodness-of-fit comparisons of our model with existing continuous-state space SRGMs. Concretely, we extend a basic differential equation describing the behavior of the cumulative number of detected faults to a stochastic differential equation of Itô type by considering the testing-effort factor, and derive its solution process, which represents the fault-detection process. Then, we discuss estimation methods for the unknown parameters in our model. We then compare our model with existing continuous-state space SRGMs in terms of the mean square error (MSE) [5] and Akaike's information criterion (AIC) [10]. Finally, we derive several software reliability assessment measures from the probability distribution of the solution process, and show numerical examples of these measures by using actual fault count data.

Basic Modeling Framework
We discuss a framework of continuous-state space software reliability growth modeling [6]. Let $N(t)$ be a random variable which represents the number of faults detected up to time $t$. From the common assumption for software reliability growth modeling that the instantaneous number of detected faults at time $t$ depends on the residual number of undetected faults at time $t$ [1,2,5], we have the following linear differential equation:

$$\frac{dN(t)}{dt} = b(t)\{a - N(t)\} \tag{1}$$

where $b(t)$ indicates the fault-detection rate at testing-time $t$ and is assumed to be a non-negative function, and $a$ is the initial fault content in the software system. Equation (1) describes the behavior of the decrement of the fault content in the software system. In large-scale software development in particular, the fault-detection process in an actual testing phase is influenced by several uncertain testing factors, such as the testing skill and the debugging environment. Accordingly, we should take these factors into consideration in software reliability growth modeling. Therefore, we extend Equation (1) to the following equation:

$$\frac{dN(t)}{dt} = \{b(t) + \xi(t)\}\{a - N(t)\} \tag{2}$$

where $\xi(t)$ is a noise that exhibits an irregular fluctuation. For the purpose of making its solution a Markov process, we assume that $\xi(t) = \sigma\gamma(t)$, where $\sigma$ indicates a positive constant representing the magnitude of the irregular fluctuation and $\gamma(t)$ a standardized Gaussian white noise. We transform Equation (2) into the following stochastic differential equation of Itô type:

$$dN(t) = \left\{b(t) - \frac{1}{2}\sigma^2\right\}\{a - N(t)\}\,dt + \sigma\{a - N(t)\}\,dW(t) \tag{3}$$

where $W(t)$ is a one-dimensional Wiener process, which is formally defined as the integration of the white noise $\gamma(t)$ with respect to time $t$. The Wiener process $W(t)$ is a Gaussian process and has the following properties:

$$\text{(a)}\ \Pr[W(0) = 0] = 1, \qquad \text{(b)}\ E[W(t)] = 0, \qquad \text{(c)}\ E[W(t)W(t')] = \min[t, t'] \tag{4}$$

where $\Pr[A]$ and $E[\cdot]$ represent the probability of the event $A$ and the expectation, respectively. Next, we derive the solution process $N(t)$ by using Itô's formula.
The solution process $N(t)$ can be derived as

$$N(t) = a\left[1 - \exp\left\{-\int_0^t b(\tau)\,d\tau - \sigma W(t)\right\}\right] \tag{5}$$

Equation (5) implies that the solution process, through the residual fault content $a - N(t)$, obeys a geometric Brownian motion or a lognormal process [11]. Consequently, by the properties (a)-(c) and the fact that $W(t)$ is a Gaussian process, the transition probability distribution of the solution process $N(t)$ is derived as

$$\Pr[N(t) \le n \mid N(0) = 0] = \Phi\left(\frac{\log\dfrac{a}{a - n} - \displaystyle\int_0^t b(\tau)\,d\tau}{\sigma\sqrt{t}}\right) \tag{6}$$

where $\Phi(\cdot)$ in Equation (6) indicates the standard normal distribution function defined as

$$\Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} \exp\left(-\frac{y^2}{2}\right) dy \tag{7}$$

By giving an appropriate function $b(t)$ in Equation (5), we can derive several SRGMs. Yamada et al. [7] and Lee et al. [8] proposed several lognormal process models in which the fault-detection rates $b(t)$ follow the basic modeling assumptions of well-known NHPP models, such as the delayed S-shaped [12] and inflection S-shaped [13] SRGMs, based on the modeling framework [6] described in this section.
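As an illustration of the solution process in Equation (5), the following minimal Python sketch draws one sample path of $N(t) = a[1 - \exp\{-\int_0^t b(\tau)d\tau - \sigma W(t)\}]$ on a uniform time grid. The function name `simulate_path`, the midpoint-rule integration of $b(t)$, and all parameter values are assumptions made for this example; they are not taken from the data analyzed in this paper.

```python
import math
import random


def simulate_path(a, sigma, b, t_end, steps, seed=0):
    """Sample N(t) = a * (1 - exp(-B(t) - sigma * W(t))) on a uniform grid.

    B(t) is the integral of b over [0, t], approximated by the midpoint rule.
    Returns a list of (t, N(t)) pairs starting at (0, 0).
    """
    rng = random.Random(seed)
    dt = t_end / steps
    w = 0.0      # Wiener process value, W(0) = 0
    big_b = 0.0  # running integral of b
    path = [(0.0, 0.0)]
    for i in range(steps):
        big_b += b((i + 0.5) * dt) * dt
        w += rng.gauss(0.0, math.sqrt(dt))  # independent N(0, dt) increment
        n_t = a * (1.0 - math.exp(-big_b - sigma * w))
        path.append(((i + 1) * dt, n_t))
    return path


if __name__ == "__main__":
    # Hypothetical values: a = 100 faults, constant b(t) = 0.1, sigma = 0.05.
    for t, n in simulate_path(100.0, 0.05, lambda t: 0.1, 30.0, 6)[::2]:
        print(f"t = {t:5.1f}  N(t) = {n:7.2f}")
```

Because the noise enters inside the exponential, every sampled value stays below $a$, although a path need not be monotone in $t$.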

Lognormal Process SRGM with Testing-Effort
We develop a continuous-state space SRGM with the effect of the testing-effort factor based on stochastic differential equations which follow the lognormal process. The testing-effort, such as the number of executed test cases, the testing coverage, and the CPU hours, is one of the important factors influencing the software reliability growth process in an actual testing phase. Therefore, the testing-effort should be taken into consideration in continuous-state space software reliability growth modeling.

Modeling
To develop a continuous-state space SRGM with the effect of the testing-effort factor, we characterize $b(t)$ in Equation (5) as follows:

$$b(t) = r\,s(t) \tag{8}$$

where $r$ represents the fault-detection rate per expended testing-effort at testing time $t$ and $s(t)$ $(\equiv dS(t)/dt)$ is the amount of testing-effort expended at arbitrary testing time $t$. In Equation (8), we assume that the fault-detection rate at testing-time $t$ depends on the instantaneous testing-effort expenditures [9]. That is, the testing team can detect or remove more software faults when the software development manager decides to expend more testing-effort. Based on the framework of continuous-state space software reliability growth modeling [6], we can obtain the following solution process:

$$N_T(t) = a\left[1 - \exp\left\{-r\int_0^t s(\tau)\,d\tau - \sigma W(t)\right\}\right] = a\left[1 - \exp\{-rS(t) - \sigma W(t)\}\right] \tag{9}$$

The transition probability distribution function of the solution process in Equation (9) can be derived as

$$\Pr[N_T(t) \le n \mid N_T(0) = 0] = \Phi\left(\frac{\log\dfrac{a}{a - n} - rS(t)}{\sigma\sqrt{t}}\right) \tag{10}$$

We should specify the testing-effort function $s(t)$ in Equation (8) to utilize the solution process $N_T(t)$ in Equation (9) as an SRGM.
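The transition probability of the solution process can be evaluated directly once the cumulative testing-effort $S(t)$ is known. The sketch below assumes the form $\Phi\big((\log\frac{a}{a-n} - rS(t))/(\sigma\sqrt{t})\big)$, which follows from $\log\frac{a}{a - N_T(t)} = rS(t) + \sigma W(t)$ with $W(t) \sim N(0, t)$. The function names and the linear effort function used in the usage lines are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function, computed via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def transition_probability(n, t, a, r, sigma, S):
    """Pr[N_T(t) <= n | N_T(0) = 0] for 0 <= n < a and t > 0.

    S is a callable returning the cumulative testing-effort S(t).
    """
    z = (math.log(a / (a - n)) - r * S(t)) / (sigma * math.sqrt(t))
    return std_normal_cdf(z)


if __name__ == "__main__":
    # Hypothetical setting: S(t) = 10 t, a = 100, r = 0.01, sigma = 0.1.
    effort = lambda t: 10.0 * t
    for n in (20.0, 33.0, 50.0):
        p = transition_probability(n, 4.0, 100.0, 0.01, 0.1, effort)
        print(f"Pr[N_T(4) <= {n:4.1f}] = {p:.4f}")
```

The median of $N_T(t)$ under this form is $a[1 - \exp\{-rS(t)\}]$, which gives a quick sanity check for an implementation.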

Testing-Effort Function
We need to specify a suitable function for $s(t)$ in Equation (8). In this paper we describe the time-dependent behavior of testing-effort expenditures in the testing phase by using a Weibull curve function, that is,

$$s(t) = \alpha\beta m\,t^{m-1}\exp\{-\beta t^m\} \tag{11}$$

Then, we have

$$S(t) = \int_0^t s(\tau)\,d\tau = \alpha\left[1 - \exp\{-\beta t^m\}\right] \tag{12}$$

where $\alpha$ is the total amount of testing-effort expenditures, $\beta$ the scale parameter, and $m$ the shape parameter characterizing the shape of the testing-effort function.
The Weibull curve function is flexible enough to describe approximately the time-dependent behavior of the testing-effort expenditures in the testing phase. For example, we obtain exponential curves when $m = 1$ in Equations (11) and (12), and Rayleigh curves when $m = 2$. That is, the Weibull curve function is a general model which includes the exponential and Rayleigh curve functions.
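The Weibull testing-effort curves can be written as two small functions. This is a minimal sketch assuming the instantaneous form $s(t) = \alpha\beta m\,t^{m-1}\exp\{-\beta t^m\}$ and its integral $S(t) = \alpha[1 - \exp\{-\beta t^m\}]$; the function names and the parameter values in the usage lines are illustrative only.

```python
import math


def effort_rate(t, alpha, beta, m):
    """Instantaneous testing-effort s(t) of the Weibull curve, for t > 0."""
    return alpha * beta * m * t ** (m - 1.0) * math.exp(-beta * t ** m)


def cumulative_effort(t, alpha, beta, m):
    """Cumulative testing-effort S(t) expended during (0, t]."""
    return alpha * (1.0 - math.exp(-beta * t ** m))


if __name__ == "__main__":
    # Hypothetical parameters: alpha = 100, beta = 0.1.
    for m, name in ((1.0, "exponential"), (2.0, "Rayleigh")):
        s5 = effort_rate(5.0, 100.0, 0.1, m)
        S5 = cumulative_effort(5.0, 100.0, 0.1, m)
        print(f"{name:11s} m = {m}: s(5) = {s5:.3f}, S(5) = {S5:.3f}")
```

With $m = 1$ the rate decays monotonically, while with $m = 2$ it first rises and then falls, which is the qualitative difference between the exponential and Rayleigh special cases.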

Parameter Estimation
We discuss estimation methods for the unknown parameters of the testing-effort function in Equation (11) and of the solution process in Equation (9), respectively. We suppose that $K$ data sets $(t_j, n_j, y_j)$ $(j = 1, 2, \ldots, K)$ are observed, where $n_j$ is the total number of faults detected during the time interval $(0, t_j]$ and $y_j$ is the amount of testing-effort expended at $t_j$.

For the Testing-Effort Function
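For the testing-effort function in Equation (11), we use the method of least squares. First, we obtain the following equation by taking the natural logarithm of Equation (11):

$$\log s(t) = \log\alpha + \log\beta + \log m + (m - 1)\log t - \beta t^m \tag{13}$$

Then, the sum of the squares of the vertical distances from the data points to the estimated values is derived as

$$S(\alpha, \beta, m) = \sum_{j=1}^{K}\left\{\log y_j - \log\alpha - \log\beta - \log m - (m - 1)\log t_j + \beta t_j^m\right\}^2 \tag{14}$$

The parameter estimates $\hat{\alpha}$, $\hat{\beta}$, and $\hat{m}$ which minimize $S(\alpha, \beta, m)$ in Equation (14) can be obtained by solving the following simultaneous equations:

$$\frac{\partial S}{\partial\alpha} = \frac{\partial S}{\partial\beta} = \frac{\partial S}{\partial m} = 0 \tag{15}$$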
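One way to carry out this least-squares fit numerically is sketched below. It is not the procedure of the paper, which solves simultaneous equations: instead, it uses the observation that for a fixed shape parameter $m$ the log-model $\log s(t) = c + (m-1)\log t - \beta t^m$, with $c = \log(\alpha\beta m)$, is linear in $(c, \beta)$, so an ordinary linear regression gives the best $c$ and $\beta$, and $m$ is then chosen from a user-supplied grid. The function name, the grid, and the synthetic data are assumptions for illustration.

```python
import math


def fit_weibull_effort(ts, ys, m_grid):
    """Minimise sum_j (log y_j - log s(t_j))**2 over alpha, beta and m in m_grid.

    Returns (sse, alpha, beta, m) for the best grid value, or None if no
    grid value yields a positive beta.
    """
    k = len(ts)
    best = None
    for m in m_grid:
        xs = [t ** m for t in ts]
        us = [math.log(y) - (m - 1.0) * math.log(t) for t, y in zip(ts, ys)]
        x_bar = sum(xs) / k
        u_bar = sum(us) / k
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxu = sum((x - x_bar) * (u - u_bar) for x, u in zip(xs, us))
        slope = sxu / sxx              # regression slope equals -beta
        beta = -slope
        if beta <= 0.0:
            continue                   # not a valid Weibull curve for this m
        c = u_bar - slope * x_bar      # intercept c = log(alpha * beta * m)
        sse = sum((u - c - slope * x) ** 2 for x, u in zip(xs, us))
        if best is None or sse < best[0]:
            best = (sse, math.exp(c) / (beta * m), beta, m)
    return best


if __name__ == "__main__":
    # Synthetic, noise-free data from hypothetical alpha=2000, beta=0.01, m=2.
    ts = [float(j) for j in range(1, 21)]
    ys = [2000.0 * 0.01 * 2.0 * t * math.exp(-0.01 * t * t) for t in ts]
    sse, alpha, beta, m = fit_weibull_effort(ts, ys, [1.0, 1.5, 2.0, 2.5])
    print(f"alpha = {alpha:.2f}, beta = {beta:.4f}, m = {m}")
```

A finer grid, or a one-dimensional minimiser over $m$, would refine the estimate; the grid search is used here only to keep the sketch dependency-free.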

For the Solution Process
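Next we discuss a parameter estimation method for the solution process in Equation (9) by using the method of maximum likelihood. Let us denote the joint probability distribution function of the process $N_T(t)$ as

$$P(t_1, n_1; t_2, n_2; \cdots; t_K, n_K) \equiv \Pr[N_T(t_1) \le n_1, \ldots, N_T(t_K) \le n_K \mid N_T(t_0) = 0] \tag{16}$$

and also denote its density as

$$p(t_1, n_1; t_2, n_2; \cdots; t_K, n_K) \equiv \frac{\partial^K P(t_1, n_1; t_2, n_2; \cdots; t_K, n_K)}{\partial n_1\,\partial n_2 \cdots \partial n_K} \tag{17}$$

Since $N_T(t)$ takes on continuous values, the likelihood function $l$ for the observed data $(t_j, n_j)$ $(j = 1, 2, \ldots, K)$ is constructed as

$$l = p(t_1, n_1; t_2, n_2; \cdots; t_K, n_K) \tag{18}$$

For convenience in the mathematical manipulations, we use the following logarithmic likelihood function:

$$L = \log l \tag{19}$$

The likelihood function $l$ in Equation (18) can be written as the following equation by using Bayes' formula and the Markov property:

$$l = \prod_{j=1}^{K} p(t_j, n_j \mid t_{j-1}, n_{j-1}) \tag{20}$$

where $p(\cdot \mid t_{j-1}, n_{j-1})$ is the conditional probability density under the condition of $N_T(t_{j-1}) = n_{j-1}$, with $t_0 = 0$ and $n_0 = 0$.
The transition density $p(t_j, n_j \mid t_{j-1}, n_{j-1})$ in Equation (20) can be obtained by partially differentiating the following transition probability of $N_T(t_j)$ under the condition $N_T(t_{j-1}) = n_{j-1}$ with respect to $n_j$:

$$\Pr[N_T(t_j) \le n_j \mid N_T(t_{j-1}) = n_{j-1}] = \Phi\left(\frac{\log\dfrac{a - n_{j-1}}{a - n_j} - r\{S(t_j) - S(t_{j-1})\}}{\sigma\sqrt{t_j - t_{j-1}}}\right) \tag{21}$$

Consequently, the likelihood function $l$ in Equation (20) can be rewritten as

$$l = \prod_{j=1}^{K} \frac{1}{(a - n_j)\,\sigma\sqrt{2\pi(t_j - t_{j-1})}} \exp\left[-\frac{\left\{\log\dfrac{a - n_{j-1}}{a - n_j} - r\{S(t_j) - S(t_{j-1})\}\right\}^2}{2\sigma^2(t_j - t_{j-1})}\right] \tag{22}$$

and the logarithmic likelihood function is derived as

$$L = -K\log\sigma - \frac{K}{2}\log 2\pi - \sum_{j=1}^{K}\log(a - n_j) - \frac{1}{2}\sum_{j=1}^{K}\log(t_j - t_{j-1}) - \frac{1}{2\sigma^2}\sum_{j=1}^{K}\frac{\left\{\log\dfrac{a - n_{j-1}}{a - n_j} - r\{S(t_j) - S(t_{j-1})\}\right\}^2}{t_j - t_{j-1}} \tag{23}$$

Then, we can obtain the maximum likelihood estimates $\hat{a}$, $\hat{r}$, and $\hat{\sigma}$ for the parameters $a$, $r$, and $\sigma$ in Equation (9) by solving the following simultaneous likelihood equations numerically:

$$\frac{\partial L}{\partial a} = \frac{\partial L}{\partial r} = \frac{\partial L}{\partial\sigma} = 0 \tag{24}$$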
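The numerical solution of the likelihood equations can be organised in several ways. The sketch below shows one possibility that is not stated in the paper: for a fixed $a$, write $z_j = \log\frac{a - n_{j-1}}{a - n_j}$, $\Delta S_j = S(t_j) - S(t_{j-1})$, and $\Delta t_j = t_j - t_{j-1}$. Under the Gaussian-increment transition density, the log-likelihood is then maximised in closed form by $\hat{r} = \sum_j (z_j\Delta S_j/\Delta t_j) / \sum_j (\Delta S_j^2/\Delta t_j)$ and $\hat{\sigma}^2 = K^{-1}\sum_j (z_j - \hat{r}\Delta S_j)^2/\Delta t_j$, leaving a one-dimensional search over $a$. The function name and the toy data are hypothetical.

```python
import math


def profile_estimates(a, ts, ns, efforts):
    """Return (r_hat, sigma_hat, log_likelihood) for a fixed a > max(ns).

    ts, ns and efforts hold t_j, n_j and S(t_j) for j = 1..K, with the
    convention t_0 = 0, n_0 = 0 and S(t_0) = 0.
    """
    k = len(ts)
    t_prev, n_prev, s_prev = 0.0, 0.0, 0.0
    zs, d_s, d_t = [], [], []
    for t, n, s in zip(ts, ns, efforts):
        zs.append(math.log((a - n_prev) / (a - n)))
        d_s.append(s - s_prev)
        d_t.append(t - t_prev)
        t_prev, n_prev, s_prev = t, n, s
    # Weighted least squares for r (weights 1 / dt).
    r_hat = (sum(z * ds / dt for z, ds, dt in zip(zs, d_s, d_t))
             / sum(ds * ds / dt for ds, dt in zip(d_s, d_t)))
    # Mean of the scaled squared residuals; must be positive for the log below.
    var_hat = sum((z - r_hat * ds) ** 2 / dt
                  for z, ds, dt in zip(zs, d_s, d_t)) / k
    # Log-likelihood evaluated at (a, r_hat, sigma_hat).
    log_l = (-sum(math.log(a - n) for n in ns)
             - 0.5 * k * math.log(2.0 * math.pi * var_hat)
             - 0.5 * sum(math.log(dt) for dt in d_t)
             - 0.5 * k)
    return r_hat, math.sqrt(var_hat), log_l


if __name__ == "__main__":
    # Toy data (not from the paper): unit time steps and unit effort steps.
    ts = [1.0, 2.0, 3.0, 4.0]
    efforts = [1.0, 2.0, 3.0, 4.0]
    ns = [9.5, 25.9, 33.0, 45.1]
    for a in (60.0, 100.0, 150.0):
        r_hat, sigma_hat, log_l = profile_estimates(a, ts, ns, efforts)
        print(f"a = {a:6.1f}: r = {r_hat:.4f}, sigma = {sigma_hat:.4f}, L = {log_l:.3f}")
```

Scanning $a$ over a range above the largest observed $n_j$ and keeping the value with the largest returned log-likelihood would give a profile-likelihood estimate of all three parameters.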

Software Reliability Assessment Measures
We derive several software reliability assessment measures which are useful for the quantitative assessment of software reliability. In particular, we derive the instantaneous and cumulative MTBFs in this paper.

Instantaneous MTBF
We discuss an instantaneous MTBF (mean time between software failures or fault-detections), which has been used as one of the substitutes for the usual MTBF. The instantaneous MTBF is approximately derived as

$$\mathrm{MTBF}_I(t) = \frac{1}{E\left[\dfrac{dN_T(t)}{dt}\right]} \tag{25}$$

We need to derive $E[N_T(t)]$, which represents the expected number of faults detected up to arbitrary testing time $t$, to obtain $E[dN_T(t)]$ in Equation (25). Since $W(t) \sim N(0, t)$, the expected number of faults detected up to arbitrary testing time $t$ is obtained as

$$E[N_T(t)] = a\left[1 - \exp\left\{-\left(rS(t) - \frac{\sigma^2}{2}t\right)\right\}\right] \tag{26}$$

Since the Wiener process has the independent increment property, $W(t)$ and $dW(t)$ are statistically independent of each other, and $E[dW(t)] = 0$. Therefore,

$$E[dN_T(t)] = a\left\{r\,s(t) - \frac{\sigma^2}{2}\right\}\exp\left\{-\left(rS(t) - \frac{\sigma^2}{2}t\right)\right\}dt \tag{27}$$

The instantaneous MTBF in Equation (25) can be calculated by substituting Equation (27) into Equation (25).
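The step from $W(t) \sim N(0, t)$ to the expected number of detected faults can be made explicit with the moment generating function of a normal random variable. The short derivation below is a sketch under the solution-process form $N_T(t) = a[1 - \exp\{-rS(t) - \sigma W(t)\}]$; differentiating its last line with respect to $t$ reproduces the rate that enters the instantaneous MTBF.

```latex
% -\sigma W(t) is normal with mean 0 and variance \sigma^2 t, so
\begin{align*}
E\left[e^{-\sigma W(t)}\right] &= \exp\left(\tfrac{1}{2}\sigma^{2} t\right), \\
E[N_T(t)] &= a\left[1 - e^{-rS(t)}\,E\left[e^{-\sigma W(t)}\right]\right]
           = a\left[1 - \exp\left\{-\left(rS(t) - \tfrac{\sigma^{2}}{2}t\right)\right\}\right], \\
\frac{d}{dt}E[N_T(t)] &= a\left\{r\,s(t) - \tfrac{\sigma^{2}}{2}\right\}
           \exp\left\{-\left(rS(t) - \tfrac{\sigma^{2}}{2}t\right)\right\}.
\end{align*}
```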

Cumulative MTBF
A cumulative MTBF is also a substitute for the usual MTBF. The cumulative MTBF is approximately derived as

$$\mathrm{MTBF}_C(t) = \frac{t}{E[N_T(t)]} \tag{28}$$

If the instantaneous MTBF in Equation (25) and the cumulative MTBF in Equation (28) take on larger values, we can judge that the software system has become more reliable.
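The two MTBF measures can be computed as follows. This is a minimal sketch assuming $E[N_T(t)] = a[1 - \exp\{-(rS(t) - \sigma^2 t/2)\}]$, the instantaneous MTBF as the reciprocal of its time derivative, and the cumulative MTBF as $t / E[N_T(t)]$. The function names, the exponential effort curve, and the parameter values in the usage lines are assumptions for illustration, not estimates from the paper.

```python
import math


def expected_faults(t, a, r, sigma, S):
    """E[N_T(t)], where S is a callable giving the cumulative effort S(t)."""
    return a * (1.0 - math.exp(-(r * S(t) - 0.5 * sigma ** 2 * t)))


def instantaneous_mtbf(t, a, r, sigma, S, s):
    """Reciprocal of d/dt E[N_T(t)]; s is a callable giving s(t) = dS(t)/dt."""
    rate = (a * (r * s(t) - 0.5 * sigma ** 2)
            * math.exp(-(r * S(t) - 0.5 * sigma ** 2 * t)))
    if rate <= 0.0:
        # Occurs when r * s(t) <= sigma**2 / 2; the approximation breaks down.
        raise ValueError("expected fault-detection rate is not positive")
    return 1.0 / rate


def cumulative_mtbf(t, a, r, sigma, S):
    """Elapsed testing time divided by the expected number of detected faults."""
    return t / expected_faults(t, a, r, sigma, S)


if __name__ == "__main__":
    # Hypothetical exponential effort curve (Weibull with m = 1).
    S = lambda t: 100.0 * (1.0 - math.exp(-0.1 * t))
    s = lambda t: 10.0 * math.exp(-0.1 * t)
    for t in (1.0, 5.0, 10.0):
        mi = instantaneous_mtbf(t, 1000.0, 0.02, 0.05, S, s)
        mc = cumulative_mtbf(t, 1000.0, 0.02, 0.05, S)
        print(f"t = {t:4.1f}: MTBF_I = {mi:.4f}, MTBF_C = {mc:.4f}")
```

The guard in `instantaneous_mtbf` reflects a limitation of the approximation: once $r\,s(t)$ falls to $\sigma^2/2$ or below, the expected detection rate is no longer positive and the reciprocal has no meaningful interpretation.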

Model Comparisons
We show the results of goodness-of-fit comparisons of our model with other continuous-state space SRGMs [6][7][8], namely the exponential, delayed S-shaped, and inflection S-shaped stochastic differential equation models, in terms of the mean square error (MSE) [5] and Akaike's information criterion (AIC) [10]. For the goodness-of-fit comparisons, we use two actual data sets [14], named DS1 and DS2. DS1 and DS2 exhibit S-shaped and exponential reliability growth curves, respectively.
The MSE [5] is obtained by dividing the sum of squared errors between the observed and estimated cumulative numbers of detected faults, $y_k$ and $\hat{y}(t_k)$, by the number of data pairs $K$:

$$\mathrm{MSE} = \frac{1}{K}\sum_{k=1}^{K}\left\{y_k - \hat{y}(t_k)\right\}^2 \tag{29}$$

where $\hat{y}(t_k)$ denotes the estimated value of the expected cumulative number of faults by arbitrary testing time $t_k$ (here $y_k$ denotes the observed fault count, not the testing-effort expenditure used in the parameter estimation). Accordingly, the model with the smallest MSE fits the observed data set better than the other models.
The AIC [10] is known as a goodness-of-fit evaluation criterion which takes the number of model parameters into account. The AIC is given by

$$\mathrm{AIC} = -2 \times (\text{the logarithmic maximum-likelihood}) + 2 \times (\text{the number of free model-parameters}) \tag{30}$$

We should note that the AIC values themselves are not significant; only the absolute value of the difference between them is. We can judge that the model with the smallest AIC fits the actual data set best if the differences are greater than or equal to 1. If the differences are less than 1, there is no significant difference between the models.
Table 1 shows the results of the model comparisons based on the MSE and the AIC. The AIC-based comparison is not significant for DS2 only; nevertheless, our model improves both the MSE and the AIC compared with the other continuous-state space SRGMs discussed in this paper.
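The two comparison criteria are simple to compute once fitted values and maximised log-likelihoods are available. The following sketch assumes the MSE is the mean of squared differences between observed and estimated cumulative fault counts, and the AIC is minus twice the maximised log-likelihood plus twice the number of free parameters; the function names and the numbers in the usage lines are made up for illustration and are not the values of Table 1.

```python
def mean_square_error(observed, estimated):
    """Mean of squared differences between observed and estimated counts."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((y - y_hat) ** 2
               for y, y_hat in zip(observed, estimated)) / len(observed)


def aic(max_log_likelihood, n_free_params):
    """AIC = -2 * (maximised log-likelihood) + 2 * (number of free parameters)."""
    return -2.0 * max_log_likelihood + 2.0 * n_free_params


def aic_difference_is_significant(aic_a, aic_b, threshold=1.0):
    """Apply the rule that AIC differences of at least 1 are significant."""
    return abs(aic_a - aic_b) >= threshold


if __name__ == "__main__":
    # Made-up numbers for two hypothetical models with 3 and 4 parameters.
    print(mean_square_error([10, 25, 40], [12.0, 24.0, 41.0]))
    aic_a, aic_b = aic(-120.4, 3), aic(-118.9, 4)
    print(aic_a, aic_b, aic_difference_is_significant(aic_a, aic_b))
```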

Numerical Examples
We show numerical examples by using testing-effort data recorded along with detected fault count data collected from actual testing. In this testing, 1301 faults in total were detected and 1846.92 testing hours in total were expended as the testing-effort within 35 months [14].
Figure 1 shows the estimated testing-effort function. Figure 2 shows the estimated expected number of detected faults. Figure 3 shows the time-dependent behavior of the two estimated substitutes for the MTBF, namely the instantaneous and cumulative MTBFs in Equations (25) and (28), respectively. We should note that a larger instantaneous or cumulative MTBF indicates growth in software reliability. From Figure 3, we can see that the software reliability decreases in the early testing period because the estimated software fault-detection rate increases in that period due to the model structure. The software reliability then grows as the testing proceeds because both the fault-detection rate and the residual fault content decrease. We note that the time-dependent behaviors of the instantaneous and cumulative MTBFs differ from each other because of their approximation methods. The instantaneous MTBF responds sensitively to the number of software faults detected at testing-time t because it does not incorporate information on the past software reliability growth process, as shown in Equation (25). We can estimate the instantaneous MTBF at the termination time of the testing, $\mathrm{MTBF}_I(35)$, to be about 0.1297 (about 4.5 months), and the cumulative MTBF, $\mathrm{MTBF}_C(35)$, to be about 0.0269 (about 0.9 months).

Concluding Remarks
We have discussed a continuous-state space SRGM with the effect of testing-effort, based on the mathematical technique of stochastic differential equations, together with its parameter estimation methods. We have then compared the software reliability measurement performance of our model with that of existing continuous-state space SRGMs in terms of the MSE and the AIC by using actual data. Finally, we have shown numerical illustrations of the software reliability assessment measures, namely the instantaneous and cumulative MTBFs.
We believe that software development managers can obtain information on the relationship between the attained software reliability and the testing-effort expenditures by using our software reliability growth model. Our model also enables software development managers to decide how much testing-effort should be expended to attain a reliability objective. Further studies are needed to examine the validity of our model for practical applications by using more observed data.

Figure 2. Estimated expected number of detected faults.

Figure 3. Estimated instantaneous and cumulative MTBFs.