This paper discusses the Bayesian approach to estimating and predicting the reliability of software systems during the testing process. A Non-Homogeneous Poisson Process (NHPP) arising from the Musa-Okumoto (1984) software reliability model is proposed for the software failures. The Musa-Okumoto NHPP reliability model consists of two components, the execution time component and the calendar time component, and is a popular model in software reliability analysis. Predictive analyses of software reliability models are of great importance for modifying and debugging software and for determining when to terminate the development testing process. However, Bayesian and classical predictive analyses of the Musa-Okumoto (1984) NHPP model are missing from the literature. This paper addresses four software reliability issues in single-sample prediction that are closely associated with the development testing program. A Bayesian approach based on non-informative priors was adopted to develop explicit solutions to these problems. Examples based on both real and simulated data are presented to illustrate the theoretical prediction results.

Software has become a driver for everything in the 21st century, from elementary education to genetic engineering. Because of this dependency, the size and complexity of computer systems have grown, which poses a great problem for their reliability, since failures are prone to happen during operation. To avoid failures and faults, the reliability of software needs to be studied during development so as to produce reliable software. Software reliability is therefore of great concern to developers.

Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment [

Over the past decades many software reliability models that can be used for predictive analyses have been proposed by different authors [

$$
\lambda(t) = \frac{\alpha\beta}{1+\beta t} \tag{1}
$$

The model is based on the following assumptions: failures observed during execution time are caused by faults remaining in the software; whenever a failure is observed, an instantaneous effort is made to find its cause, and the faults are removed before future tests; and, unlike in other models, each repair reduces the number of future faults. The model must remain stable during the entire testing period for any particular testing environment, and it must provide a reasonably accurate prediction of reliability. These are the two main aspects of a good reliability model [

The Musa-Okumoto software reliability growth model has been widely applied, as it is one of the best predictive models; it belongs to the selected models in the AIAA recommended practice standard on software reliability [

The Bayesian method owes its name to the fundamental role of Bayes' theorem. In Bayesian reasoning, uncertainty is attributed not only to the data but also to the parameters. Therefore, all parameters are modelled by distributions. Before any data are obtained, the knowledge about the parameters of a problem is expressed in the prior distribution of the parameters. Given actual data, the prior distribution and the data are combined into the posterior distribution of the parameters. The posterior distribution summarizes our knowledge about the parameters after observing the data.

In this paper we assume that reliability growth testing is performed on a computer software system and that the number of failures in the time interval $(0, t]$, denoted by $N(t)$, is observed. We also assume that $\{N(t), t > 0\}$ follows the NHPP with intensity given in Equation (1). Let $0 < t_1 < t_2 < \cdots$ be the successive failure times. When testing stops after a pre-determined number $n$ of failures is observed, the failure data are said to be failure-truncated, and we denote the $n$ failure times by $Y_{obs}^{f} = [t_i]_{i=1}^{n}$, where $0 < t_1 < t_2 < \cdots < t_n$. When testing is observed for a fixed time $t$, the data are said to be time-truncated, and we denote the corresponding observed data by $Y_{obs}^{t} = \{n, t_1, \cdots, t_n; t\}$, where $0 < t_1 < \cdots < t_n \le t$.
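Both kinds of data can be simulated by time-transforming a unit-rate Poisson process. The sketch below is a minimal illustration, not the authors' procedure: it assumes the mean value function $m(t) = \alpha \ln(1+\beta t)$ obtained by integrating Equation (1), and all function names and parameter values are hypothetical.

```python
import math
import random

def mean_function(t, alpha, beta):
    # m(t) = integral of alpha*beta/(1+beta*s) over (0, t] = alpha*ln(1+beta*t)
    return alpha * math.log1p(beta * t)

def inverse_mean_function(s, alpha, beta):
    # Solves m(t) = s for t
    return math.expm1(s / alpha) / beta

def simulate_failure_truncated(n, alpha, beta, rng):
    """First n failure times: testing stops at the n-th failure."""
    s, times = 0.0, []
    for _ in range(n):
        s += rng.expovariate(1.0)  # arrival of a unit-rate Poisson process
        times.append(inverse_mean_function(s, alpha, beta))
    return times

def simulate_time_truncated(t_end, alpha, beta, rng):
    """All failure times in (0, t_end]: testing stops at the fixed time t_end."""
    limit = mean_function(t_end, alpha, beta)
    s, times = rng.expovariate(1.0), []
    while s <= limit:
        times.append(inverse_mean_function(s, alpha, beta))
        s += rng.expovariate(1.0)
    return times

# Hypothetical parameter values, for illustration only
y_obs_f = simulate_failure_truncated(30, 32.0, 0.008, random.Random(1))
y_obs_t = simulate_time_truncated(180.0, 32.0, 0.008, random.Random(2))
```

In the failure-truncated case the number of failures is fixed and the stopping time is random; in the time-truncated case it is the other way round.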

In this paper we present four issues in single-sample prediction, listed as 1) to 4) below, which are closely associated with the development testing program of a software system. Here, we consider one software system and assume that its cumulative failure times obey the Musa-Okumoto software reliability growth model, with observed data either $Y_{obs}^{f}$ or $Y_{obs}^{t}$. Based on $Y_{obs}^{f}$ or $Y_{obs}^{t}$, we are interested in the following problems:

1) What is the probability that at most $k$ software failures will occur in the future time period $(T, \tau]$ with $\tau > T$?

2) Given that the pre-determined target value $\lambda_{tv}$ for the failure rate of the software undergoing development testing is not achieved at time $T$, what is the probability that the target value $\lambda_{tv}$ will be achieved at time $\tau$, $\tau > T$?

3) Suppose that the target value $\lambda_{tv}$ for the software failure rate is not achieved at time $T$. How long will it take for the software failure rate to reach $\lambda_{tv}$?

4) What is the upper prediction limit (UPL) of $\lambda_\tau = \alpha\beta/(1+\beta\tau)$ with level $\gamma$, where $\tau$ is a pre-determined value greater than $T$?

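Let $Y_{obs}$ represent $Y_{obs}^{f}$ or $Y_{obs}^{t}$, and let $T$ denote the end of the observation period ($T = t_n$ for failure-truncated data and $T = t$ for time-truncated data). The joint density of $Y_{obs}$ is therefore:

$$
f(Y_{obs} \mid \alpha, \beta) = (\alpha\beta)^{n} \prod_{i=1}^{n} (1+\beta t_i)^{-1}\, e^{-\alpha \ln(1+\beta T)}, \quad \alpha > 0,\ \beta > 0 \tag{2}
$$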
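As an illustration, the logarithm of Equation (2) can be evaluated directly. The sketch below is a minimal example with hypothetical function names; it assumes that `T` is the end of the observation period, and the helper `alpha_mle_given_beta` uses the fact that, for fixed $\beta$, setting the derivative of the log-likelihood with respect to $\alpha$ to zero gives $\alpha = n/\ln(1+\beta T)$.

```python
import math

def log_likelihood(alpha, beta, failure_times, T):
    """Logarithm of Equation (2) for failure times t_1 < ... < t_n observed up to T."""
    n = len(failure_times)
    return (n * math.log(alpha * beta)
            - sum(math.log1p(beta * t) for t in failure_times)
            - alpha * math.log1p(beta * T))

def alpha_mle_given_beta(beta, n, T):
    """Maximizer of Equation (2) in alpha when beta is held fixed."""
    return n / math.log1p(beta * T)

# Toy data, for illustration only
times = [1.0, 2.0]
a_hat = alpha_mle_given_beta(1.0, len(times), 2.0)
ll_hat = log_likelihood(a_hat, 1.0, times, 2.0)
```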

Case 1: The shape parameter $\beta$ is known. We adopt the following non-informative prior distribution for $\alpha$:

$$
\pi(\alpha) \propto \frac{1}{\alpha}, \quad \alpha > 0 \tag{3}
$$

The posterior distribution of $\alpha$ is thus given by

$$
h(\alpha \mid Y_{obs}) = [\Gamma(n)]^{-1}\, \alpha^{n-1}\, [\ln(1+\beta T)]^{n}\, e^{-\alpha \ln(1+\beta T)} \tag{4}
$$
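The step from Equations (2) and (3) to Equation (4) can be written out as follows. This is a short derivation sketch using only quantities already defined; the shorthand $L = \ln(1+\beta T)$ is introduced here for brevity.

```latex
\begin{aligned}
h(\alpha \mid Y_{obs}) &\propto f(Y_{obs} \mid \alpha, \beta)\,\pi(\alpha)
  \propto \alpha^{n} e^{-\alpha L} \cdot \alpha^{-1}
  = \alpha^{n-1} e^{-\alpha L}, \qquad L = \ln(1+\beta T), \\
\int_0^\infty \alpha^{n-1} e^{-\alpha L}\, d\alpha &= \frac{\Gamma(n)}{L^{n}}
  \quad\Longrightarrow\quad
  h(\alpha \mid Y_{obs}) = \frac{L^{n}}{\Gamma(n)}\, \alpha^{n-1} e^{-\alpha L}.
\end{aligned}
```

In other words, the posterior of $\alpha$ is a gamma distribution with shape $n$ and rate $L$, so its mean is $n/L$.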

Let $\tilde{y}$ be the random variable being predicted. The predictive density of $\tilde{y}$ is

$$
f(\tilde{y} \mid Y_{obs}) = \int_0^\infty f(\tilde{y} \mid Y_{obs}, \alpha)\, h(\alpha \mid Y_{obs})\, d\alpha \tag{5}
$$

Hence, the Bayesian UPL of $\tilde{y}$ with level $\gamma$, denoted by $y_U(\beta)$, must satisfy

$$
\gamma = \int_{-\infty}^{y_U(\beta)} f(y \mid Y_{obs})\, dy \tag{6}
$$

Case 2: The shape parameter $\beta$ is unknown. We consider the following joint prior distribution of $\alpha$ and $\beta$, where the two parameters are assumed to be independent:

$$
\pi(\alpha, \beta) \propto \frac{1}{\alpha\beta}, \quad \alpha, \beta > 0 \tag{7}
$$

Thus the corresponding joint posterior distribution of $\alpha$ and $\beta$ is given by

$$
h(\alpha, \beta \mid Y_{obs}) = [k\,\Gamma(n)]^{-1}\, \alpha^{n-1} \beta^{n-1} \prod_{i=1}^{n} (1+\beta t_i)^{-1}\, e^{-\alpha \ln(1+\beta T)} \tag{8}
$$

where $k$ is the normalizing constant.

Equation (8) plays the same role as Equation (4). Let $\tilde{y}$ be the random variable being predicted. The predictive density of $\tilde{y}$ is

$$
f(\tilde{y} \mid Y_{obs}) = \int_0^\infty \int_0^\infty f(\tilde{y} \mid Y_{obs}, \alpha, \beta)\, h(\alpha, \beta \mid Y_{obs})\, d\alpha\, d\beta \tag{9}
$$

and the Bayesian UPL of $\tilde{y}$ with level $\gamma$, denoted by $y_U$, satisfies, as in Equation (6),

$$
\gamma = \int_{-\infty}^{y_U} f(y \mid Y_{obs})\, dy \tag{10}
$$

In this section we address the four issues stated in Section 2.1 using the Bayesian approach. The main results are presented as propositions, and their proofs are given in the Appendix. Below, we use $\chi^2(n;\gamma)$ to represent the $\gamma$ percentage point of the chi-square distribution with $n$ degrees of freedom, such that $\Pr\{\chi^2(n) \le \chi^2(n;\gamma)\} = \gamma$, and define $\mathrm{Poisson}(h \mid \theta) = \theta^{h} e^{-\theta}/h!$ and $\mathrm{gamma}(x \mid n, \lambda) = \lambda^{n} x^{n-1} e^{-\lambda x}/\Gamma(n)$. In all subsequent propositions the prior is assumed to be Equation (3) when $\beta$ is known and Equation (7) when $\beta$ is unknown.

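Proposition 1 (issue 1)

The probability that at most $k$ failures will occur in the time interval $(T, \tau]$ with $\tau > T$ is

$$
\gamma_k =
\begin{cases}
\dfrac{[\ln(1+\beta T)]^{n}}{\left[\ln\left(\frac{1+\beta\tau}{1+\beta T}\right)\right]^{n}} \displaystyle\sum_{j=n}^{n+k} \binom{j-1}{n-1} \frac{\left[\ln\left(\frac{1+\beta\tau}{1+\beta T}\right)\right]^{j}}{[\ln(1+\beta\tau)]^{j}}, & \text{if } \beta \text{ is known} \\[3ex]
\displaystyle\sum_{j=n}^{n+k} \frac{\Gamma(j)}{d\,(j-n)!\,\Gamma(n)} \int_0^\infty \frac{\beta^{n-1} \prod_{i=1}^{n} (1+\beta t_i)^{-1}}{[\ln(1+\beta\tau)]^{j}} \left[\ln\left(\frac{1+\beta\tau}{1+\beta T}\right)\right]^{j-n} d\beta, & \text{if } \beta \text{ is unknown}
\end{cases}
\tag{11}
$$

Here $\binom{j-1}{n-1} = \Gamma(j)/[(j-n)!\,\Gamma(n)]$, and $d$ denotes the normalizing constant $k$ of Equation (8), renamed to avoid a clash with the failure count $k$.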
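The known-$\beta$ case can be evaluated with a few lines of code. The sketch below is an illustration rather than the authors' implementation: it uses the coefficient $\Gamma(j)/[(j-n)!\,\Gamma(n)]$ that arises in Equation (A.5), written as a binomial coefficient, and the function name is hypothetical. The sample size $n = 30$ in the usage lines is an assumption; it is not stated in the text and was chosen because, with $\beta = 0.008282448$ and $(T, \tau] = (180, 250]$, it reproduces the values $\gamma_0$ and $\gamma_1$ reported in the numerical example.

```python
import math

def prob_at_most_k_failures(k, n, beta, T, tau):
    """First formula of Equation (11): Pr{at most k failures in (T, tau]}, beta known."""
    log_T = math.log1p(beta * T)
    log_tau = math.log1p(beta * tau)
    p = log_T / log_tau   # ln(1+beta*T) / ln(1+beta*tau)
    q = 1.0 - p           # ln((1+beta*tau)/(1+beta*T)) / ln(1+beta*tau)
    # Substituting i = j - n turns the sum over j = n..n+k into a sum over i = 0..k
    return sum(math.comb(n + i - 1, n - 1) * p**n * q**i for i in range(k + 1))

# n = 30 is an assumed sample size (see the paragraph above)
gammas = [prob_at_most_k_failures(k, 30, 0.008282448, 180.0, 250.0) for k in range(16)]
```

Written this way, the number of future failures follows a negative binomial distribution, so $\gamma_k$ increases with $k$ and approaches 1.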

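Proposition 2 (issue 2)

The probability that the target value $\lambda_{tv}$ will be achieved at time $\tau$ ($\tau > T$) is

$$
\gamma =
\begin{cases}
1 - \displaystyle\sum_{h=0}^{n-1} \frac{\left[\left(\frac{1+\beta\tau}{\beta}\right) \lambda_{tv} \ln(1+\beta T)\right]^{h}}{h!}\, e^{-\lambda_{tv} \left(\frac{1+\beta\tau}{\beta}\right) \ln(1+\beta T)}, & \text{if } \beta \text{ is known} \\[3ex]
1 - \dfrac{1}{k} \displaystyle\sum_{h=0}^{n-1} \int_0^\infty \frac{\left[\left(\frac{1+\beta\tau}{\beta}\right) \lambda_{tv} \ln(1+\beta T)\right]^{h}}{h!}\, \frac{\beta^{n-1} \prod_{i=1}^{n} (1+\beta t_i)^{-1}}{[\ln(1+\beta T)]^{n}}\, e^{-\lambda_{tv} \left(\frac{1+\beta\tau}{\beta}\right) \ln(1+\beta T)}\, d\beta, & \text{if } \beta \text{ is unknown}
\end{cases}
\tag{12}
$$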
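For known $\beta$ this probability is one minus a Poisson sum, which is easy to evaluate. The sketch below is illustrative and the function name is hypothetical. The sample size $n = 30$ is an assumption not stated in the text; with $T = 182.21$, $\tau = 277.83$, $\lambda_{tv} = 0.03$ and $\beta = 0.008282448$ it reproduces the value $1.687506 \times 10^{-6}$ reported in the numerical example.

```python
import math

def prob_target_achieved(lam_tv, n, beta, T, tau):
    """First formula of Equation (12): Pr{lambda_tau <= lam_tv | data}, beta known."""
    rate = (1.0 + beta * tau) / beta * math.log1p(beta * T)
    theta = rate * lam_tv
    term = math.exp(-theta)       # Poisson(0 | theta)
    poisson_cdf = 0.0
    for h in range(n):            # accumulates Poisson(h | theta) for h = 0..n-1
        poisson_cdf += term
        term *= theta / (h + 1)
    return 1.0 - poisson_cdf

# n = 30 is an assumed sample size (see the paragraph above)
g = prob_target_achieved(0.03, 30, 0.008282448, 182.21, 277.83)
```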

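Proposition 3 (issue 3)

For a given level $\gamma$, the time $\tau^{*}$ required to attain $\lambda_{tv}$ is

$$
\tau^{*} =
\begin{cases}
\left[\dfrac{\chi^2(2n;\gamma)}{2 \lambda_{tv} \ln(1+\beta T)} - \dfrac{1}{\beta}\right] - T, & \text{if } \beta \text{ is known} \\[3ex]
\tau - T, & \text{if } \beta \text{ is unknown}
\end{cases}
\tag{13}
$$

Remark 1: For the second part of Equation (13), $\tau$ is the solution to the equation

$$
\gamma = 1 - \frac{1}{k} \sum_{h=0}^{n-1} \int_0^\infty \frac{\left[\lambda_{tv} \frac{1+\beta\tau}{\beta} \ln(1+\beta T)\right]^{h}}{h!}\, e^{-\lambda_{tv} \frac{1+\beta\tau}{\beta} \ln(1+\beta T)} \cdot \frac{\beta^{n-1} \prod_{i=1}^{n} (1+\beta t_i)^{-1}}{[\ln(1+\beta T)]^{n}}\, d\beta. \tag{14}
$$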

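Proposition 4 (issue 4)

The Bayesian UPL of $\lambda_\tau = \dfrac{\alpha\beta}{1+\beta\tau}$ with level $\gamma$ is

$$
\lambda_U(\beta)(\tau) =
\begin{cases}
\dfrac{\beta\, \chi^2(2n;\gamma)}{2 (1+\beta\tau) \ln(1+\beta T)}, & \text{if } \beta \text{ is known} \\[3ex]
\lambda_{tv}, & \text{if } \beta \text{ is unknown}
\end{cases}
\tag{15}
$$

Remark 2: For the second part of Equation (15), $\lambda_{tv}$ is the solution to

$$
\gamma = 1 - \frac{1}{k} \sum_{h=0}^{n-1} \int_0^\infty \frac{\left[\lambda_{tv} \frac{1+\beta\tau}{\beta} \ln(1+\beta T)\right]^{h}}{h!}\, e^{-\lambda_{tv} \frac{1+\beta\tau}{\beta} \ln(1+\beta T)} \cdot \frac{\beta^{n-1} \prod_{i=1}^{n} (1+\beta t_i)^{-1}}{[\ln(1+\beta T)]^{n}}\, d\beta \tag{16}
$$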

We have used the time between failures data described in [

1) Suppose we are interested in the probability $\gamma_k$ that at most $k$ failures will occur in a future time period $(T, \tau] = (180, 250]$.

a) For the case of known $\beta$, we take its maximum likelihood estimate as its true value, i.e. $\beta = 0.008282448$. Using the first formula in Equation (11), we have $\gamma_0 = 0.00204337$, $\gamma_1 = 0.01347748$, $\gamma_2 = 0.04653484$, $\gamma_3 = 0.11230530$, $\gamma_4 = 0.21351423$, $\gamma_5 = 0.34188371$, $\gamma_6 = 0.48155675$, $\gamma_7 = 0.61554018$, $\gamma_8 = 0.73112395$, $\gamma_9 = 0.82215131$, $\gamma_{10} = 0.88836847$, $\gamma_{11} = 0.93328146$, $\gamma_{12} = 0.96190403$, $\gamma_{13} = 0.97915241$, $\gamma_{14} = 0.98903392$, $\gamma_{15} = 0.99444044$.

b) When $\beta$ is unknown, from the second formula of Equation (11), we obtain $\gamma_0 = 0.002423218$, $\gamma_1 = 0.015348190$, $\gamma_2 = 0.052195351$, $\gamma_3 = 0.122789747$, $\gamma_4 = 0.229662151$, $\gamma_5 = 0.362653362$, $\gamma_7 = 0.639743372$, $\gamma_8 = 0.750133795$, $\gamma_9 = 0.836483617$, $\gamma_{10} = 0.897655409$, $\gamma_{11} = 0.941055333$, $\gamma_{12} = 0.963420189$, $\gamma_{13} = 0.981411270$, $\gamma_{15} = 0.991582449$.

From the graph it can be seen that, for most values of $k$, the probability that at most $k$ failures will occur during that time interval is higher when $\beta$ is unknown than when it is known.

2) Suppose the target value is $\lambda_{tv} = 0.03$, chosen arbitrarily. At the time $T = 182.21$, the MLE of the achieved failure rate for this software is $\hat{\lambda}(182.21) = \hat{\alpha}\hat{\beta}/(1 + 182.21\hat{\beta}) = 0.05045615$, which is greater than $\lambda_{tv}$; thus the target has not been achieved at time $T = 182.21$, and development testing will continue. Suppose we want to find the probability that the target value $\lambda_{tv}$ will be achieved at the time $\tau = 277.83$ h.

a) When $\beta$ is known (say, $\beta = 0.008282448$), from the first formula in Equation (12), we obtain $\gamma = 1.687506 \times 10^{-6}$, which is very small; hence the target value is very unlikely to be achieved by then.

b) When $\beta$ is unknown, from the second formula in Equation (12), we have $\gamma = 0.193896$, computed by the Monte Carlo method of integration based on a sample of size $L = 1000$. This shows that, when $\beta$ is unknown, there is a possibility of achieving the target value at time $\tau = 277.83$ h.

3) Since the target value $\lambda_{tv}$ was not achieved at $T = 182.21$, we want to know how long it will take for the target value to be achieved.

a) When $\beta$ is known (say, $\beta = 0.008282448$), let $\gamma = 0.90$; from the first formula in Equation (13) we obtain $\tau^{*} = 538.7523$ h. This means that it will take another 538.7523 hours to achieve the desired failure rate.

b) When $\beta$ is unknown, from the second formula in Equation (13) and Remark 1, we obtain $\tau^{*} = 414$ h. Thus it takes another 414 hours to achieve the desired failure rate when $\beta$ is unknown, a substantial reduction in time compared with the case of known $\beta$.

4) Given $\tau = 900$ h, from the first formula in Equation (15), the Bayesian UPL of $\lambda_\tau = \dfrac{\alpha\beta}{1+\beta\tau}$ with level $\gamma = 0.90$ is given by

$$
\lambda_U(\beta)(\tau) = 0.02473799.
$$
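The known-$\beta$ formulas in Equations (13) and (15) need only a chi-square percentage point, which can be obtained without a statistics library from the gamma-Poisson relation in Equation (A.10). The sketch below is illustrative, with hypothetical function names, and assumes $n = 30$, which is not stated in the text. With $T = 182.21$ and $\beta = 0.008282448$, the reported $\tau^{*} = 538.7523$ h and $\lambda_U(\beta)(\tau) = 0.02473799$ are reproduced when the chi-square point is taken at lower-tail probability 0.10 (upper-tail probability 0.90); this tail convention is inferred from the numbers and is not stated in the text.

```python
import math

def gamma_cdf_integer_shape(x, n, rate):
    """P(X <= x) for X ~ gamma(shape n, rate), n a positive integer (Equation (A.10))."""
    theta = rate * x
    term, total = math.exp(-theta), 0.0
    for h in range(n):
        total += term
        term *= theta / (h + 1)
    return 1.0 - total

def chi2_point(half_df, prob, tol=1e-12):
    """Point c with P(chi2(2*half_df) <= c) = prob; chi2(2n) is gamma(n, rate 1/2)."""
    lo, hi = 0.0, 1.0
    while gamma_cdf_integer_shape(hi, half_df, 0.5) < prob:
        hi *= 2.0                      # widen the bracket until it contains the point
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)          # bisection on the monotone CDF
        if gamma_cdf_integer_shape(mid, half_df, 0.5) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def time_to_target(lam_tv, n, beta, T, prob):
    """First formula of Equation (13)."""
    c = chi2_point(n, prob)
    return c / (2.0 * lam_tv * math.log1p(beta * T)) - 1.0 / beta - T

def upper_limit_failure_rate(tau, n, beta, T, prob):
    """First formula of Equation (15)."""
    c = chi2_point(n, prob)
    return beta * c / (2.0 * (1.0 + beta * tau) * math.log1p(beta * T))

# n = 30 and the 0.10 lower-tail probability are assumptions (see the paragraph above)
tau_star = time_to_target(0.03, 30, 0.008282448, 182.21, 0.10)
lam_upper = upper_limit_failure_rate(900.0, 30, 0.008282448, 182.21, 0.10)
```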

In software development, predictive analysis is very important, as it helps the software developer make trade-off decisions at the right time. In this paper, explicit solutions to predictive issues that may arise during the development process were derived using the Bayesian approach. These solutions help software developers in many instances, such as allocating resources, deciding when to terminate the testing process, and identifying modifications needed in the software before termination.

The study used the Bayesian approach with non-informative priors to derive explicit solutions for predictive issues that may arise during the software development process. When the shape parameter was known, the posterior and predictive distributions had closed forms in all cases; when it was unknown, they had no closed forms and the study used Markov Chain Monte Carlo (MCMC). The Bayesian approach was used because it has advantages over the classical approach: it is applicable to small sample sizes, allows the input of prior information about the reliability growth process, and provides full posterior and predictive distributions [

However, it will be interesting to look at two-sample prediction for Musa-Okumoto (1984) model considering procedures that [

The authors declare no conflicts of interest regarding the publication of this paper.

Cheruiyot, N., Orawo, L.A. and Islam, A.S. (2018) Bayesian Predictive Analyses for Logarithmic Non-Homogeneous Poisson Process in Software Reliability. Open Access Library Journal, 5: e4767. https://doi.org/10.4236/oalib.1104767

We first state the following identity without proof:

$$
\int_{D(m;a,b)} dF(t_1) \cdots dF(t_m) = \frac{[F(b) - F(a)]^{m}}{m!} \tag{A.1}
$$

where $m$ is any positive integer, $a$ and $b$ are two real numbers such that $a < b$, $F(t)$ is an increasing and differentiable function, and

$$
D(m;a,b) \triangleq \{(t_1, \cdots, t_m) : a < t_1 < \cdots < t_m < b\}.
$$
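For $m = 2$ the identity can be checked by direct integration, which also makes clear why the difference appears as $F(b) - F(a)$. The short calculation below is an illustration only.

```latex
\begin{aligned}
\int_{a}^{b} \int_{t_1}^{b} dF(t_2)\, dF(t_1)
  &= \int_{a}^{b} \left[F(b) - F(t_1)\right] dF(t_1) \\
  &= \left[ -\tfrac{1}{2} \left(F(b) - F(t_1)\right)^{2} \right]_{t_1 = a}^{t_1 = b}
   = \frac{[F(b) - F(a)]^{2}}{2!}.
\end{aligned}
```

Taking $F(t) = \ln(1+\beta t)$, whose differential is $\beta\, dt/(1+\beta t)$, with $a = T$ and $b = \tau$ gives $F(b) - F(a) = \ln\left(\frac{1+\beta\tau}{1+\beta T}\right)$, the factor that appears in Equation (A.4).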

Proof of Proposition 1

The probability that at most $k$ failures will occur in the interval $(T, \tau]$ is $\gamma_k = \Pr\{N(\tau) \le n + k \mid Y_{obs}\}$. When $\beta$ is known, we have

$$
\gamma_k = \int_0^\infty \Pr\{N(\tau) \le n + k \mid Y_{obs}, \alpha\} \cdot h(\alpha \mid Y_{obs})\, d\alpha, \tag{A.2}
$$

where $h(\alpha \mid Y_{obs})$ is given by Equation (4) and

$$
\Pr[N(\tau) \le n + k \mid Y_{obs}, \alpha] = \sum_{j=n}^{n+k} \frac{f(Y_{obs}, N(\tau) = j \mid \alpha)}{f(Y_{obs} \mid \alpha)} \tag{A.3}
$$

From Equation (2), we have $f(Y_{obs} \mid \alpha) = \beta^{n} \alpha^{n} \prod_{i=1}^{n} (1+\beta t_i)^{-1} e^{-\alpha \ln(1+\beta T)}$, and, since the process is now observed up to time $\tau$,

$$
\begin{aligned}
f(Y_{obs}, N(\tau) = j \mid \alpha)
&= \int_{D(j-n;T,\tau)} f(Y_{obs}, t_{n+1}, \cdots, t_j, N(\tau) = j \mid \alpha) \prod_{l=n+1}^{j} dt_l \\
&= \int_{D(j-n;T,\tau)} \alpha^{j} \beta^{j} e^{-\alpha \ln(1+\beta\tau)} \prod_{i=1}^{j} (1+\beta t_i)^{-1} \prod_{l=n+1}^{j} dt_l \\
&= \alpha^{j} \beta^{j} \prod_{i=1}^{n} (1+\beta t_i)^{-1} e^{-\alpha \ln(1+\beta\tau)} \int_{D(j-n;T,\tau)} \prod_{l=n+1}^{j} (1+\beta t_l)^{-1} \prod_{l=n+1}^{j} dt_l
\end{aligned}
$$

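Applying identity (A.1) with $F(t) = \ln(1+\beta t)$, Equation (A.3) becomes

$$
\Pr[N(\tau) \le n + k \mid Y_{obs}, \alpha] = \sum_{j=n}^{n+k} \frac{\alpha^{j-n}\, e^{-\alpha \ln\left(\frac{1+\beta\tau}{1+\beta T}\right)} \left[\ln\left(\frac{1+\beta\tau}{1+\beta T}\right)\right]^{j-n}}{(j-n)!} \tag{A.4}
$$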

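Hence Equation (A.2) becomes

$$
\begin{aligned}
\gamma_k &= \int_0^\infty \sum_{j=n}^{n+k} \frac{\alpha^{j-n}\, e^{-\alpha \ln\left(\frac{1+\beta\tau}{1+\beta T}\right)} \left[\ln\left(\frac{1+\beta\tau}{1+\beta T}\right)\right]^{j-n}}{(j-n)!} \cdot \frac{\alpha^{n-1} [\ln(1+\beta T)]^{n}\, e^{-\alpha \ln(1+\beta T)}}{\Gamma(n)}\, d\alpha \\
&= \sum_{j=n}^{n+k} \frac{\left[\ln\left(\frac{1+\beta\tau}{1+\beta T}\right)\right]^{j-n} [\ln(1+\beta T)]^{n}\, \Gamma(j)}{(j-n)!\, \Gamma(n)\, [\ln(1+\beta\tau)]^{j}} \times \int_0^\infty \frac{[\ln(1+\beta\tau)]^{j}}{\Gamma(j)}\, \alpha^{j-1} e^{-\alpha \ln(1+\beta\tau)}\, d\alpha
\end{aligned}
\tag{A.5}
$$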

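The integral in Equation (A.5) equals 1, since its integrand is the density of a gamma distribution with parameters $j$ and $\ln(1+\beta\tau)$. Noting that $\Gamma(j)/[(j-n)!\,\Gamma(n)] = \binom{j-1}{n-1}$, Equation (A.5) reduces to

$$
\gamma_k = \frac{[\ln(1+\beta T)]^{n}}{\left[\ln\left(\frac{1+\beta\tau}{1+\beta T}\right)\right]^{n}} \sum_{j=n}^{n+k} \binom{j-1}{n-1} \frac{\left[\ln\left(\frac{1+\beta\tau}{1+\beta T}\right)\right]^{j}}{[\ln(1+\beta\tau)]^{j}}. \tag{A.6}
$$

This is the first formula of Equation (11).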

When $\beta$ is unknown, noting that $\Pr\{N(\tau) \le n + k \mid Y_{obs}, \alpha, \beta\}$ and $h(\alpha, \beta \mid Y_{obs})$ are given by Equation (A.4) and Equation (8) respectively, we obtain

$$
\begin{aligned}
\gamma_k &= \int_0^\infty \int_0^\infty \Pr[N(\tau) \le n + k \mid Y_{obs}, \alpha, \beta] \cdot h(\alpha, \beta \mid Y_{obs})\, d\alpha\, d\beta \\
&= \sum_{j=n}^{n+k} \frac{1}{d\,(j-n)!\, \Gamma(n)} \int_0^\infty \int_0^\infty \alpha^{j-1} \beta^{n-1} \prod_{i=1}^{n} (1+\beta t_i)^{-1} \left[\ln\left(\frac{1+\beta\tau}{1+\beta T}\right)\right]^{j-n} e^{-\alpha \ln(1+\beta\tau)}\, d\alpha\, d\beta \\
&= \sum_{j=n}^{n+k} \frac{\Gamma(j)}{d\,(j-n)!\, \Gamma(n)} \int_0^\infty \frac{\beta^{n-1} \prod_{i=1}^{n} (1+\beta t_i)^{-1}}{[\ln(1+\beta\tau)]^{j}} \left[\ln\left(\frac{1+\beta\tau}{1+\beta T}\right)\right]^{j-n} d\beta
\end{aligned}
\tag{A.7}
$$

Because $k$ already denotes the number of future failures in the summation limits, the normalizing constant $k$ of Equation (8) is written as $d$ in Equation (A.7). Equation (A.7) is the second formula in Equation (11). □

Proof of Proposition 2

Let $f(\lambda_\tau \mid Y_{obs})$ denote the posterior density of $\lambda_\tau = \alpha\beta/(1+\beta\tau)$. The probability that the target value $\lambda_{tv}$ will be achieved at time $\tau$ is then given by

$$
\gamma = \Pr\{\lambda_\tau \le \lambda_{tv} \mid Y_{obs}\} = \int_0^{\lambda_{tv}} f(\lambda_\tau \mid Y_{obs})\, d\lambda_\tau \tag{A.8}
$$

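When $\beta$ is known, making the transformation $\lambda_\tau = \alpha\beta/(1+\beta\tau)$, we have $\alpha = \lambda_\tau \dfrac{1+\beta\tau}{\beta}$ and $\dfrac{d\alpha}{d\lambda_\tau} = \dfrac{1+\beta\tau}{\beta}$. Consequently, the posterior density of $\lambda_\tau$ is

$$
f(\lambda_\tau \mid Y_{obs}) = h(\alpha \mid Y_{obs}) \left| \frac{d\alpha}{d\lambda_\tau} \right|
$$

$$
\begin{aligned}
f(\lambda_\tau \mid Y_{obs}) &= \frac{1}{\Gamma(n)} \left[\frac{\lambda_\tau (1+\beta\tau)}{\beta}\right]^{n-1} [\ln(1+\beta T)]^{n}\, e^{-\lambda_\tau \left(\frac{1+\beta\tau}{\beta}\right) \ln(1+\beta T)} \cdot \frac{1+\beta\tau}{\beta} \\
&= \frac{1}{\Gamma(n)} \left[\left(\frac{1+\beta\tau}{\beta}\right) \ln(1+\beta T)\right]^{n} \lambda_\tau^{n-1}\, e^{-\lambda_\tau \left[\left(\frac{1+\beta\tau}{\beta}\right) \ln(1+\beta T)\right]}
\end{aligned}
\tag{A.9}
$$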

From Equation (A.9), it can easily be seen that $\lambda_\tau$ has a gamma distribution with parameters $n$ and $\dfrac{1+\beta\tau}{\beta} \ln(1+\beta T)$. For a positive integer $a$, the gamma and Poisson distributions are related by

$$
\frac{b^{a}}{\Gamma(a)} \int_0^{\lambda} x^{a-1} e^{-bx}\, dx = 1 - \sum_{h=0}^{a-1} \frac{(b\lambda)^{h}}{h!}\, e^{-b\lambda}. \tag{A.10}
$$
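Identity (A.10) can be checked numerically. The sketch below compares a composite Simpson approximation of the left-hand side with the Poisson sum on the right-hand side; the function names and the values $a = 5$, $b = 2$, $\lambda = 3$ are arbitrary choices made for illustration.

```python
import math

def gamma_cdf_quadrature(lam, a, b, steps=2000):
    """Left side of (A.10) by composite Simpson's rule; steps must be even."""
    def density(x):
        return b**a / math.gamma(a) * x**(a - 1) * math.exp(-b * x)
    h = lam / steps
    total = density(0.0) + density(lam)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)  # Simpson weights 4, 2, 4, ...
    return total * h / 3.0

def gamma_cdf_poisson_sum(lam, a, b):
    """Right side of (A.10); a must be a positive integer."""
    theta = b * lam
    return 1.0 - sum(theta**h / math.factorial(h) for h in range(a)) * math.exp(-theta)

left = gamma_cdf_quadrature(3.0, 5, 2.0)
right = gamma_cdf_poisson_sum(3.0, 5, 2.0)
```

The two values agree to within the quadrature error, which is what allows the gamma probabilities in Equation (A.8) to be written as Poisson sums.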

By substituting Equation (A.9) and Equation (A.10) into Equation (A.8), we obtain the first formula of Equation (12).

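When $\beta$ is unknown, making the transformation $\lambda_\tau = \alpha\beta/(1+\beta\tau)$ and $\beta = \beta$, we obtain $\alpha = \lambda_\tau \dfrac{1+\beta\tau}{\beta}$ and $\beta = \beta$. Note that the Jacobian is $\dfrac{\partial(\alpha, \beta)}{\partial(\lambda_\tau, \beta)} = \dfrac{1+\beta\tau}{\beta}$. From Equation (8), the joint posterior density of $(\lambda_\tau, \beta)$ is

$$
f(\lambda_\tau, \beta \mid Y_{obs}) = h(\alpha, \beta \mid Y_{obs}) \left| \frac{\partial(\alpha, \beta)}{\partial(\lambda_\tau, \beta)} \right|.
$$

$$
\begin{aligned}
f(\lambda_\tau, \beta \mid Y_{obs}) &= \left[\frac{\lambda_\tau (1+\beta\tau)}{\beta}\right]^{n-1} \frac{\beta^{n-1} \prod_{i=1}^{n} (1+\beta t_i)^{-1}}{k\, \Gamma(n)}\, e^{-\lambda_\tau \frac{1+\beta\tau}{\beta} \ln(1+\beta T)} \cdot \frac{1+\beta\tau}{\beta} \\
&= \frac{\beta^{n-1} \prod_{i=1}^{n} (1+\beta t_i)^{-1}}{k\, \Gamma(n)\, [\ln(1+\beta T)]^{n}} \left[\frac{1+\beta\tau}{\beta} \ln(1+\beta T)\right]^{n} \lambda_\tau^{n-1}\, e^{-\lambda_\tau \frac{1+\beta\tau}{\beta} \ln(1+\beta T)}
\end{aligned}
\tag{A.11}
$$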

By substituting Equation (A.10) and Equation (A.11) into Equation (A.8), we obtain the second formula of Equation (12). □

Proof of Proposition 3

For a given level $\gamma$, the time required to attain the target value $\lambda_{tv}$ is $\tau^{*} = \tau - T$, where $\tau$ satisfies Equation (A.8). When $\beta$ is known, from Equation (A.9), it can easily be seen that

$$
2 \left[\frac{1+\beta\tau}{\beta} \ln(1+\beta T)\right] \lambda_\tau
$$

follows a chi-square distribution with $2n$ degrees of freedom. Thus we have

$$
2 \left[\frac{1+\beta\tau}{\beta} \ln(1+\beta T)\right] \lambda_{tv} = \chi^2(2n;\gamma), \tag{A.12}
$$

and Equation (13) follows immediately.

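The time required to attain the target $\lambda_{tv}$ with level $\gamma$ when $\beta$ is unknown is $\tau^{*} = \tau - T$, where $\tau$ is the solution to

$$
\gamma = 1 - \frac{1}{k} \sum_{h=0}^{n-1} \int_0^\infty \frac{\left[\lambda_{tv} \frac{1+\beta\tau}{\beta} \ln(1+\beta T)\right]^{h}}{h!}\, e^{-\lambda_{tv} \frac{1+\beta\tau}{\beta} \ln(1+\beta T)} \cdot \frac{\beta^{n-1} \prod_{i=1}^{n} (1+\beta t_i)^{-1}}{[\ln(1+\beta T)]^{n}}\, d\beta. \tag{A.13}
$$

□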

Proof of Proposition 4

For a pre-determined $\tau$ ($\tau > T$), the Bayesian upper prediction limit for $\lambda_\tau = \dfrac{\alpha\beta}{1+\beta\tau}$ with level $\gamma$ is the value $\lambda_U(\beta)(\tau)$ satisfying $\gamma = \Pr\{\lambda_\tau \le \lambda_U(\beta)(\tau) \mid Y_{obs}\}$. From Equation (A.8) and Equation (A.12), we have $\lambda_U(\beta)(\tau) = \dfrac{\beta\, \chi^2(2n;\gamma)}{2 (1+\beta\tau) \ln(1+\beta T)}$, which is the first formula of Equation (15). The second part follows similarly. □