Two-Sample Bayesian Predictive Analyses for an Exponential Non-Homogeneous Poisson Process in Software Reliability

The Goel-Okumoto software reliability model is one of the earliest attempts to use a non-homogeneous Poisson process (NHPP) to model failure times observed during a software test interval. The model is known as the exponential NHPP model because it describes an exponential software failure curve. Parameter estimation, model fit and predictive analyses based on one sample have been conducted on the Goel-Okumoto software reliability model. However, predictive analyses based on two samples have not been conducted on the model. In two-sample prediction, the parameters and characteristics of the first sample are used to analyze and make predictions for the second sample. This helps save time and resources during the software development process. This paper presents some results on predictive analyses for the Goel-Okumoto software reliability model based on two samples. We address three issues in two-sample prediction that are closely associated with the software development testing process. Bayesian methods based on non-informative priors are adopted to develop solutions to these issues. The developed methodologies are illustrated with two sets of software failure data simulated from the Goel-Okumoto software reliability model.


Introduction
Over the last decade of the 20th century and the first few years of the 21st century, the demand for complex software systems has grown. Computers are now embedded in automotive mechanical and safety control systems, industrial and quality process control, real-time sensor networks, aircraft, nuclear reactors, hospital healthcare and air traffic control systems, among others, and computer systems have become an indispensable component of modern society. Consequently, the reliability of the software used in these systems has become a major concern and requirement. Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment [1]. A single software defect can cause system failure, and reliable software is required to avoid such failures. Software reliability is achieved through testing during the software development testing stage [2]. The usual way of removing bugs is to run test cases that exercise the software in a manner similar to the way users will operate it in their particular environment. However, emulating the end-user environment during the test interval is difficult and time-consuming, especially when there are multiple types of end-users. In addition, business pressure to release a software system within a tight market window constrains the amount of time that can be spent testing the software. Software reliability modeling helps to address this dilemma. As indicated by [3], software reliability modeling can provide the basis for planning reliability growth tests, monitoring progress, estimating current reliability, and forecasting and predicting future reliability improvements. Forecasting and prediction are achieved through predictive analyses. In particular, predictive analyses are useful in determining when to terminate the development process of software or hardware. Often, a prediction interval is constructed to provide the time frame within which a future failure observation will occur with a pre-determined confidence level [4].
Many software reliability models have been developed by various authors and researchers over the past three decades. Among them, the exponential non-homogeneous Poisson process (NHPP) with intensity function $\lambda(t) = \alpha\beta e^{-\beta t}$, $\alpha > 0$, $\beta > 0$, developed by Goel and Okumoto in 1979, is one of the earliest software reliability models. In the literature, this NHPP is called the Goel-Okumoto (1979) model. As noted by [5], the Goel-Okumoto (1979) model has been applied to a number of software testing environments, and its application to assessing and detecting software failures has been investigated by various authors. For instance, the Goel-Okumoto model has been used to develop a statistical control mechanism for detecting whether or not a software process is statistically under control. ML estimation of the parameters of the Goel-Okumoto (1979) model has been conducted and, in particular, it has been shown that the ML estimates of the parameters are not consistent as the testing period extends to infinity. [6] presented an empirical method for selecting software reliability growth models for release decision-making, iteratively applying various models, namely the Goel-Okumoto (1979), delayed S-shaped, Gompertz and Yamada exponential software reliability growth models, to weekly cumulative software failure data collected during system test in order to determine the number of failures expected to remain in the software after release. [7] also performed parameter estimation for the Goel-Okumoto, Yamada S-shaped and inflection S-shaped software reliability growth models, and established a necessary and sufficient condition on the software failure data which, if satisfied, ensures that the MLE method returns a unique, positive and finite estimate of the unknown parameters of the Goel-Okumoto and Yamada S-shaped models. [8] presented software failure data in which the failure rate, i.e. the number of failures per hour, appeared to decrease with time, an indication that an NHPP with mean value function $m(t) = \alpha(1 - e^{-\beta t})$, which is the mean value function of the Goel-Okumoto software reliability model, was a reasonable model for the failure process.
From the literature, it is evident that most of the work on the Goel-Okumoto software reliability model concerns parameter estimation, especially by the MLE method, and model fit. There is a conspicuous absence of literature on both classical and Bayesian predictive analyses for the model. This paper focuses on single-sample predictive inference for the Goel-Okumoto (1979) software reliability model using a Bayesian approach. We first identify four issues in single-sample prediction that are closely associated with the development testing process of software and derive the corresponding predictive distributions in Section 2. The main results for single-sample prediction are presented in Section 3. In Section 4, a real example based on secondary software failure data, given as execution times between successive software failures, is used to illustrate the proposed methodologies. A discussion is given in Section 5, and mathematical proofs are given in the Appendix.
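As a minimal sketch of the model just described, the mean value function $m(t) = \alpha(1 - e^{-\beta t})$ and the intensity $\lambda(t) = \alpha\beta e^{-\beta t}$ can be coded directly. The function names are illustrative only, and the parameter values are the MLEs quoted later in the Example section.

```python
import math

def mean_value(t, alpha, beta):
    """Expected cumulative number of failures by time t: alpha * (1 - exp(-beta * t))."""
    return alpha * (1.0 - math.exp(-beta * t))

def intensity(t, alpha, beta):
    """Failure intensity at time t: alpha * beta * exp(-beta * t)."""
    return alpha * beta * math.exp(-beta * t)

# MLEs reported in the Example section of this paper.
alpha_hat, beta_hat = 31.698171, 0.003962
for t in (0.0, 100.0, 500.0, 1000.0):
    print(t, round(mean_value(t, alpha_hat, beta_hat), 3),
          round(intensity(t, alpha_hat, beta_hat), 5))
```

The intensity decreases monotonically from $\alpha\beta$ at $t = 0$, and the expected number of failures saturates at $\alpha$, which is why $\alpha$ is usually read as the expected total number of failures.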

Predictive Issues and Bayesian Method
During the development testing stage of software, statisticians and engineers are interested in various predictive problems whose solutions are important in modifying and debugging the software and in determining when to terminate the development testing process. In this section, we present four issues closely associated with the software development testing process and derive the predictive distributions using a Bayesian approach. For the purposes of the four predictive issues, we assume that reliability growth testing is performed on a software system and that the cumulative number of failures of the software in the time interval $(0, t]$ is observed.
A prediction interval is a confidence interval for a future observation or a function of some future observations. Specifically, a double-sided (bilateral) prediction interval for $t_{n+k}$ with confidence level $\gamma$ is defined by $[t_L, t_U]$ such that $\Pr(t_L \le t_{n+k} \le t_U) = \gamma$. Similarly, a single-sided (unilateral) lower or upper prediction limit for $t_{n+k}$ with level $\gamma$ is defined by $\Pr(t_{n+k} \ge t_L) = \gamma$ or $\Pr(t_{n+k} \le t_U) = \gamma$, respectively. These limits depend only on a single sample (or a single software system) and are called single-sample prediction limits. Prediction limits involving two samples (or two software systems) can be defined similarly and are called two-sample prediction limits.

Issues in Single-Sample Software Reliability Prediction
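Here, we consider one software system and assume that its cumulative inter-failure times obey the Goel-Okumoto model. We are interested in the following problems:
Issue A: what is the probability that at most $k$ software failures will occur in the future time period $(T, \tau]$?
Issue B: suppose that the pre-determined target value $\lambda_{tv}$ for the failure rate of the software undergoing development testing is not achieved at time $T$; what is the probability that the target value $\lambda_{tv}$ will be achieved at time $\tau$, $\tau > T$?
Issue C: suppose that the target value $\lambda_{tv}$ for the software failure rate is not achieved at time $T$; how long will it take for the software failure rate to reach $\lambda_{tv}$?
Issue D: what is the upper prediction limit (UPL) of $\lambda_{\tau} = \alpha\beta e^{-\beta\tau}$ with level $\gamma$, $\tau$ being a predetermined value greater than $T$?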
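To make Issue C concrete, the sketch below computes a simple plug-in answer: if $\alpha$ and $\beta$ were known exactly, the intensity $\alpha\beta e^{-\beta\tau}$ would reach $\lambda_{tv}$ at $\tau = \ln(\alpha\beta/\lambda_{tv})/\beta$. This is not the Bayesian solution developed in this paper, because it ignores parameter uncertainty; it only fixes ideas. The function name is hypothetical, and the inputs are the MLEs and the target value 0.03 quoted in the Example section.

```python
import math

def plug_in_time_to_target(alpha, beta, lam_tv):
    """Time tau at which alpha*beta*exp(-beta*tau) equals lam_tv.

    Returns 0.0 when the intensity is already at or below the target at time 0.
    """
    lam0 = alpha * beta                      # intensity at time 0
    if lam_tv >= lam0:
        return 0.0
    return math.log(lam0 / lam_tv) / beta

tau = plug_in_time_to_target(31.698171, 0.003962, 0.03)
print(round(tau, 2))
```

A Bayesian treatment replaces this single number with a time that attains the target at a stated probability level $\gamma$.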

Posterior and Predictive Distributions
Case 1: When the shape parameter β is known, we adopt a non-informative prior distribution for α, Equation (3), and the posterior distribution of α follows from it by Bayes' theorem. Let $y$ be the random variable being predicted. The posterior predictive distribution of $y$ is given by Equation (5), and the Bayesian UPL of $y$ with level $\gamma$, denoted $y_U$, is defined by Equation (6).
Case 2: When the shape parameter β is unknown, we adopt a non-informative joint prior density for α and β, Equation (7), assuming that α and β are independent. The corresponding joint posterior density is given by Equation (8). As in Equation (5) and Equation (6), let $y_U$ denote the Bayesian UPL of $y$ with level $\gamma$; it is defined analogously from the joint posterior density.
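The Case 1 posterior can be sketched as follows. The sketch rests on two assumptions that the surviving text does not confirm: testing is time-truncated at $T$ with failure times $0 < t_1 < \cdots < t_n \le T$, and the non-informative prior is $\pi(\alpha) \propto 1/\alpha$. Both are consistent with the gamma distribution with parameter $n$ and the chi-square distribution with $2n$ degrees of freedom mentioned in the Appendix, but the authors' exact expressions may differ.

```latex
\begin{align*}
L(\alpha,\beta)
  &= \Big[\prod_{i=1}^{n} \alpha\beta e^{-\beta t_i}\Big]
     \exp\{-\alpha(1-e^{-\beta T})\}
   = \alpha^{n}\beta^{n}
     \exp\Big\{-\beta\sum_{i=1}^{n} t_i - \alpha\,(1-e^{-\beta T})\Big\},\\
\pi(\alpha) &\propto \frac{1}{\alpha}, \qquad \alpha > 0,\\
h(\alpha \mid \text{data})
  &\propto \alpha^{n-1}\exp\{-\alpha\,(1-e^{-\beta T})\}
  \;\Longrightarrow\;
  \alpha \mid \text{data} \sim \mathrm{Gamma}\big(n,\; 1-e^{-\beta T}\big),\\
2\alpha\,(1-e^{-\beta T}) \mid \text{data} &\sim \chi^{2}(2n).
\end{align*}
```

Here $\mathrm{Gamma}(n, r)$ denotes shape $n$ and rate $r$. Under these assumptions every predictive quantity for known β reduces to a gamma or chi-square calculation.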

Main Results for the Prediction Problems
In this section, we address the four single-sample prediction issues raised in Section 2.1 using Bayesian approach.The following propositions are considered as the main results with proofs being given in the Appendix.
In the subsequent results, we use $\chi^{2}(n; \gamma)$ to represent the $\gamma$ percentage point of the chi-square distribution with $n$ degrees of freedom, and we take the priors to be those in Equation (3) and Equation (7).
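The percentage point can be computed without external libraries when the degrees of freedom are even, which is the case for the $2n$ degrees of freedom that appear in the Appendix. The sketch below assumes that "the $\gamma$ percentage point" means the value $x$ with $\Pr(\chi^{2} \le x) = \gamma$; the function names are illustrative.

```python
import math

def chi2_cdf_even(x, df):
    """CDF of a chi-square variable with even df = 2m, via the Poisson-sum identity."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0.0:
        return 0.0
    m, half = df // 2, x / 2.0
    term, total = 1.0, 1.0            # j = 0 term of sum_{j<m} half**j / j!
    for j in range(1, m):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_ppf_even(gamma, df, tol=1e-10):
    """Percentage point: the x with CDF(x) = gamma, found by bisection."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    lo, hi = 0.0, float(df)
    while chi2_cdf_even(hi, df) < gamma:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, df) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(chi2_ppf_even(0.90, 60), 3))
```

With $n = 30$ failures, as in the Example section, the relevant distribution has 60 degrees of freedom.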

Proposition 1 (for issue A):
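The probability that at most $k$ software failures will occur in the future time period $(T, \tau]$ is given by Equation (12), whose first formula applies when β is known and whose second formula applies when β is unknown.
Proposition 2 (for issue B):
Suppose that the pre-determined target value $\lambda_{tv}$ for the failure rate of the software undergoing development testing is not achieved at time $T$. The probability that the target value $\lambda_{tv}$ will be achieved at time $\tau$, $\tau > T$, is given by Equation (13), whose first formula applies when β is known and whose second formula applies when β is unknown.
Remark 1: The second part of Equation (13) can be approximated via the MCMC method.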
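For Issue A with known β, a closed form can be sketched under assumptions that the surviving text does not confirm: a prior $\pi(\alpha) \propto 1/\alpha$ and testing truncated at time $T$ with $n$ observed failures, so that $\alpha$ given the data is gamma with shape $n$ and rate $1 - e^{-\beta T}$. Mixing the Poisson count on $(T, \tau]$ over that posterior gives a negative binomial distribution. The authors' Equation (12) may be written differently, and the inputs below ($n = 30$, the MLE of β, and the interval $(180, 240]$ from the Figure 1 caption) are illustrative.

```python
import math

def prob_at_most_k(k, n, beta, T, tau):
    """P(at most k failures in (T, tau] | data) for known beta (assumed prior 1/alpha)."""
    a = 1.0 - math.exp(-beta * T)                     # posterior rate of alpha
    b = math.exp(-beta * T) - math.exp(-beta * tau)   # mean count in (T, tau] per unit alpha
    p = a / (a + b)
    # Negative binomial mass: C(n+j-1, j) * p^n * (1-p)^j, summed over j = 0..k.
    return sum(math.comb(n + j - 1, j) * p**n * (1.0 - p)**j for j in range(k + 1))

probs = [prob_at_most_k(k, n=30, beta=0.003962, T=180.0, tau=240.0) for k in range(16)]
for k, g in enumerate(probs):
    print(k, round(g, 4))
```

The unknown-β case has no such closed form under these assumptions and would require numerical integration or simulation over the joint posterior.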

Proposition 3 (for issue C):
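For a given level $\gamma$, the time $\tau^{*}$ required to attain $\lambda_{tv}$ is given by Equation (14) when β is known. When β is unknown, $\tau^{*} = \tau$, where $\tau$ is the solution to Equation (15) and Equation (16).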

Example
In this section, a real example based on the time-between-failures data given by [9] is used to illustrate the developed methodologies for single-sample Bayesian predictive analysis. Table 1 gives the times between failures.
The study uses the cumulative times between failures as the failure times $0 < t_1 < t_2 < \cdots < t_{30}$, where $n = 30$. These data obey the Goel-Okumoto (1979) software reliability model [10]. The MLEs of the model parameters based on the data are $\hat{\alpha} = 31.698171$ and $\hat{\beta} = 0.003962$, and these MLEs are used in the illustration of the developed methodologies.
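The source does not say how the MLEs were computed. One common route, sketched here under the assumption of a time-truncated likelihood on $(0, T]$, is to substitute $\hat{\alpha} = n / (1 - e^{-\hat{\beta} T})$ into the score equation for β and solve $n/\beta - \sum t_i - nT/(e^{\beta T} - 1) = 0$ by bisection. A finite positive root requires $\sum t_i < nT/2$. The function name and the failure times in the usage line are hypothetical, since the Table 1 data are not reproduced here.

```python
import math

def go_mle(times, T):
    """MLEs (alpha, beta) of the Goel-Okumoto model for failure times observed on (0, T]."""
    n, s = len(times), sum(times)
    if s >= n * T / 2.0:
        raise ValueError("no finite positive MLE: mean failure time must be below T/2")

    def score(beta):
        # Profile score in beta after substituting alpha = n / (1 - exp(-beta*T)).
        return n / beta - s - n * T / math.expm1(beta * T)

    lo, hi = 1e-12 / T, 1.0 / T
    while score(hi) > 0.0:           # expand until the sign change is bracketed
        hi *= 2.0
    for _ in range(200):             # plain bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = n / (1.0 - math.exp(-beta * T))
    return alpha, beta

# Hypothetical cumulative failure times, for illustration only.
alpha_hat, beta_hat = go_mle([1.0, 2.0, 3.0, 4.0, 6.0], T=20.0)
print(round(alpha_hat, 4), round(beta_hat, 4))
```

Bisection is used instead of Newton's method because it needs no derivative and cannot leave the bracket.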
1) Suppose that we are interested in the probability $\gamma_k$ that at most $k$ failures will occur in the future time interval $(180, 240]$. Figure 1 shows the desired probabilities for the cases when β is known and when β is unknown.
2) Suppose that the target value is $\lambda_{tv} = 0.03$. Since this target has not been achieved, the development testing will continue. Suppose we want to predict the probability that the target value $\lambda_{tv}$ will be achieved at time $\tau = 277.83$. a) When β is known, say $\beta = 0.003962$, the first formula in Equation (13) gives $\gamma = 0$. Thus we can conclude that the target value (failure rate) will not be achieved. b) When β is unknown, the second formula in Equation (13) and Remark 1 give $\gamma = 0.0576$, where the Monte Carlo sample size is $L = 1000$.
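The known-β calculation in item 2a can be sketched as follows. The sketch assumes that $\alpha$ given the data is gamma with shape $n$ and rate $1 - e^{-\beta T}$ (prior $\pi(\alpha) \propto 1/\alpha$, testing truncated at $T$), which may not match the authors' Equation (13) exactly. It also takes $n = 30$ and $T = 182.21$ from elsewhere in this example. The event $\lambda_{\tau} \le \lambda_{tv}$ is equivalent to $\alpha \le \lambda_{tv} e^{\beta\tau}/\beta$, so the probability is a gamma CDF.

```python
import math

def gamma_cdf_int_shape(x, n, rate):
    """CDF at x of Gamma(shape=n, rate) for integer n, via the Poisson-sum identity."""
    if x <= 0.0:
        return 0.0
    z = rate * x
    term, total = 1.0, 1.0            # j = 0 term of sum_{j<n} z**j / j!
    for j in range(1, n):
        term *= z / j
        total += term
    return 1.0 - math.exp(-z) * total

def prob_target_achieved(lam_tv, n, beta, T, tau):
    """P(lambda_tau <= lam_tv | data) for known beta under the assumed gamma posterior."""
    alpha_max = lam_tv * math.exp(beta * tau) / beta   # lambda_tau <= lam_tv  <=>  alpha <= alpha_max
    return gamma_cdf_int_shape(alpha_max, n, 1.0 - math.exp(-beta * T))

gamma_known = prob_target_achieved(lam_tv=0.03, n=30, beta=0.003962, T=182.21, tau=277.83)
print(gamma_known)
```

Under these assumptions the probability comes out very close to zero, which agrees qualitatively with the value reported in item 2a.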

Discussion
Several prediction problems arise during the development of any software, especially when the Goel-Okumoto (1979) software reliability model is used to model the failure process. We have used a Bayesian approach with non-informative priors to address some of the prediction problems that may arise during the software development testing stage. We have obtained explicit solutions to these problems, which may prove useful for modifying and debugging the software and for deciding when to terminate the development testing process. One advantage of the Bayesian approach is that it remains applicable when sample sizes are small [11] [12]. Another advantage is that it allows the input of prior information about the reliability growth process and provides full posterior and predictive distributions.
In this paper, we have used non-informative priors to derive the methodologies that address these prediction problems. However, informative priors can be used in their place in a similar way. The same procedures can also be applied to other NHPPs such as the delayed S-shaped process and the Cox-Lewis process.

Appendix (Proofs of Propositions 1-4)
In order to prove the propositions, we first state an identity without proof. The identity holds for any positive integer $m$ and any two real numbers $a$ and $b$ with $a < b$.

Proof of Proposition 1:
The probability that at most $k$ failures will occur in the interval $(T, \tau]$ is computed with respect to the posterior density of $\alpha$. When β is unknown, Equation (2) leads to Equation (A.7), which implies the second formula of Equation (12). The posterior density of $\lambda_{\tau}$ is given by Equation (A.9).

Proof of
We note from Equation (A.9) that $\lambda_{\tau}$ follows a gamma distribution with parameters $n$ and $(1 - e^{-\beta T})/(\beta e^{-\beta\tau})$. When β is unknown, the joint posterior density of $(\alpha, \beta)$ from Equation (8) is used, where $\tau$ satisfies Equation (A.8). When β is known, the result can easily be seen from Equation (A.9). The formula in Equation (18) can be obtained by similar arguments.
We are interested in the following problems:
Issue A: what is the probability that at most $k$ software failures will occur in the future time period $(T, \tau]$?
Issue B: suppose that the pre-determined target value $\lambda_{tv}$ for the failure rate of the software undergoing development testing is not achieved at time $T$; what is the probability that the target value $\lambda_{tv}$ will be achieved at time $\tau$, $\tau > T$?
When β is unknown, the probabilities are obtained from the second formula in Equation (12).

Figure 1. Comparison of the probabilities $\gamma_k$ that at most $k$ failures will occur in the time interval $(180, 240]$ for the cases of known and unknown β.

3) Since the target value $\lambda_{tv}$ was not achieved at time $T = 182.21$, we want to know how long it will take to attain $\lambda_{tv}$. a) When β is known, it will take another 268.6116 h to achieve the desired failure rate. b) When β is unknown, from Equation (15) and Equation (16) we obtain $\tau^{*} = 770.79$ h. In other words, it will take another 770.79 h to achieve the desired failure rate when β is unknown.
4) Given $\tau = 900$ h: a) when β is known, from Equation (17), the Bayesian upper prediction limit (UPL) of $\lambda_{\tau} = \alpha\beta e^{-\beta\tau}$ with level 0.90 is 0.0051; b) when β is unknown, from Equation (18) and Equation (19), the Bayesian UPL of $\lambda_{\tau} = \alpha\beta e^{-\beta\tau}$ with level 0.90 is 0.131952.

From Equation (A.8), Equation (A.10) and Equation (A.11) we obtain Equation (A.12), which implies the second formula of Equation (13).
Proof of Proposition 3:
For a given level $\gamma$, the time required to attain the target value $\lambda_{tv}$ is $\tau^{*}$.

Proof of Proposition 4:
When β is known, the corresponding scaled quantity follows a chi-square distribution with $2n$ degrees of freedom. Therefore, Equation (14) follows immediately. We can obtain (ii) by following arguments similar to those given in the proof of the second part of Proposition 2. For a pre-determined $\tau$ ($\tau > T$), the Bayesian upper prediction limit (UPL) for $\lambda_{\tau}$ with level $\gamma$ is given by Equation (A.15), which is exactly the formula in Equation (17).

Table 1. Time between failures data.