Component-Oriented Reliability Analysis Based on Hierarchical Bayesian Model for an Open Source Software

Successful examples of adopting distributed development models in open source projects include the GNU/Linux operating system, the Apache HTTP server, Android, BusyBox, and so on. An open source project has a distinctive feature, so-called software composition, in which components are developed by geographically dispersed developers all over the world. We propose a method of component-oriented reliability assessment based on a hierarchical Bayesian model and Markov chain Monte Carlo (MCMC) methods. In particular, we focus on the fault-detection rate for each component reported to the bug tracking system. We can assess the reliability of the whole open source software (OSS) system by using the confidence interval for each component. Also, we analyze actual software fault-count data to show numerical examples of reliability assessment for OSS.


Introduction
Software development environments have been changing into new development paradigms, such as concurrent distributed development and the so-called open source project, through the use of network computing technologies [1]. In particular, OSS (Open Source Software) systems, which serve as key components of critical infrastructures in society, are still expanding [2]. The methodology of object-oriented design and analysis is a feature of the distributed development environment and has been greatly successful in the fields of programming languages, simulation, GUI (graphical user interface), and database construction in software development. The general idea of object-oriented design and analysis was developed as a technique for easily constructing and maintaining complex systems. Successful examples of adopting the distributed development model in open source projects include the GNU/Linux operating system [2]. However, poor handling of quality and customer support prohibits the progress of OSS. We focus on the problems of software quality that prohibit the progress of OSS.
Many software reliability growth models (SRGM's) [3] have been applied to assess reliability for quality management and testing-progress control in software development. On the other hand, only a few effective methods of dynamic testing management have been presented for new distributed development paradigms as typified by the open source project [4][5][6][7][8]. When considering the effect of the debugging process on the entire system in developing a method of reliability assessment for OSS, it is necessary to grasp the situation of registration in the bug tracking system, the connection status of each component, the degree of maturation of the OSS, and so on [9,10].
In particular, OSS is composed of several software components, which is a feature of the distributed development environment. In such cases, it is appropriate to apply a method of component-based reliability assessment rather than one based on SRGM's. Many SRGM's are assumed to be suitable for the system testing phase of software development. However, it is difficult to apply SRGM's to OSS, because the OSS development style differs from the typical software development environment, i.e., the OSS development cycle has no testing phase. Moreover, OSS is developed as a combination of many software components. Therefore, it is important for software developers to confirm the static state of each component in the OSS development phase from the standpoint of reliability assessment [6][7][8]. The characteristics of OSS in terms of reliability assessment are described in [11,12]. It is important to understand the static state of OSS, i.e., the connection status of each component. We consider a method of reliability assessment for the whole OSS system by using the proportion data of the fault-detection rate in terms of the software components. In particular, we estimate the predicted distributions by using MCMC, with the proportion data of the fault-detection rate for the software components on the bug tracking system as the sample data. Also, we analyze actual software fault-count data to show numerical examples of software reliability assessment for OSS, and we derive the confidence interval for each component. Then, we show that the proposed method can assist in improving the quality of OSS. Our method may be useful for the software testing manager to assess the static state of the whole OSS system automatically.

Estimation of Predicted Distribution Based on Bayesian Theory
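We apply Bayesian theory to the data on the fault-detection rate of each OSS component. Let $y_t$ be the proportion data of the fault-detection rate in the OSS by operational time $t$, and let $\theta_t$ be the parameter of the specific distribution at operational time $t$. We estimate $\theta_t$ by using $y_t$. In this case, we use the prior distribution up to time $t-1$. As an example, when we have knowledge of the prior information $I$, the updated data is given by the following equation based on Bayesian theory:

$$p(\theta \mid y, I) = \frac{p(y \mid \theta, I)\, p(\theta \mid I)}{p(y \mid I)}. \tag{1}$$

In this paper, we estimate $\theta_t$ for the sequential data $y_t$ by using the proportion data of the past fault-detection rate. Then, we can derive the following equation from Equation (1):

$$p(\theta_t \mid y_t, \theta_{t-1}, I) = \frac{p(y_t \mid \theta_t, \theta_{t-1}, I)\, p(\theta_t \mid \theta_{t-1}, I)}{p(y_t \mid \theta_{t-1}, I)}. \tag{2}$$

According to Equation (2), the denominator $p(y_t \mid \theta_{t-1}, I)$ does not depend on $\theta_t$. Therefore, we define as follows:

$$p(\theta_t \mid y_t, \theta_{t-1}, I) \propto p(y_t \mid \theta_t, \theta_{t-1}, I)\, p(\theta_t \mid \theta_{t-1}, I). \tag{3}$$

Equation (3) means the probability at operational time $t$ obtained from operational time $t-1$. Then, we assume the simple case as follows:

$$\theta_t = \theta_{t-1} + \epsilon_t, \tag{4}$$

where $\epsilon_t$ is the independent Gaussian noise at operational time $t$ [13].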
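As an illustration of the sequential update under the simple case with independent Gaussian noise, the following minimal sketch assumes in addition that each observation equals the parameter plus Gaussian observation noise, so that the posterior stays Gaussian and can be updated in closed form. The observation model, the function name `sequential_update`, the variance values, and the sample proportions are assumptions made only for this illustration; they are not taken from the paper.

```python
def sequential_update(ys, prior_mean, prior_var, state_var, obs_var):
    """Closed-form Gaussian update of theta_t for sequential data y_t.

    Assumed model (illustrative only):
      theta_t = theta_{t-1} + eps_t,  eps_t ~ N(0, state_var)
      y_t     = theta_t + e_t,        e_t   ~ N(0, obs_var)
    Returns a list of (posterior mean, posterior variance), one per y_t.
    """
    mean, var = prior_mean, prior_var
    history = []
    for y in ys:
        # Prior for theta_t obtained from the posterior at time t-1.
        pred_var = var + state_var
        # Weight given to the new observation y_t.
        gain = pred_var / (pred_var + obs_var)
        # Posterior at time t: likelihood times prior, renormalized.
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * pred_var
        history.append((mean, var))
    return history


# Hypothetical proportion data for one component (not from Table 1).
example = sequential_update([0.42, 0.47, 0.45, 0.50], prior_mean=0.4,
                            prior_var=0.05, state_var=0.001, obs_var=0.01)
for t, (m, v) in enumerate(example, start=1):
    print(f"t={t}: mean={m:.4f}, variance={v:.5f}")
```

Each pass through the loop uses the posterior at time t-1 as the prior at time t, which is the role played by the prior distribution up to the previous operational time in the text.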

Hierarchical Bayesian Model
In this paper, for simplicity, we assume that the trend of the proportion data for the fault-detection rate follows the probability density function of the normal distribution:

$$f(y \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(y-\mu)^2}{2\sigma^2}\right\}, \tag{5}$$

where $\mu$ is the mean value and $\sigma$ the standard deviation. We consider the hierarchical Bayesian model based on the prior distribution and the hyper prior distribution composed of $\mu$ and $\sigma$.
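The normal density assumed above can be evaluated directly. The short sketch below shows the density and the corresponding log-likelihood of a data set for given values of the mean and standard deviation. The function names and the sample proportions are hypothetical and serve only as an illustration.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def log_likelihood(data, mu, sigma):
    """Sum of log densities, treating the observations as independent."""
    return sum(math.log(normal_pdf(y, mu, sigma)) for y in data)


# Hypothetical proportion data (not from Table 1).
data = [0.30, 0.35, 0.32, 0.28, 0.33, 0.31]
print(log_likelihood(data, mu=0.315, sigma=0.03))
```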
Then, Equation (6) is obtained from Equation (1), and Equation (7) is derived from it. According to Equation (7), we obtain Equation (8), which means the probability at operational time $t$ obtained from operational time $t-1$. We can estimate $\mu$ and $\sigma$ at operational time $t$ from Equation (8).
Also, we can derive the confidence interval from the estimated mean value $\hat{\mu}$ and standard deviation $\hat{\sigma}$. In the case of the upper-side probability $100\alpha\%$ and the degree of freedom $k$, the upper and lower confidence limits of the estimated confidence interval are obtained by using $t_k(\alpha)$, the upper $100\alpha\%$ point of the $t$-distribution at the degree of freedom $k$. Also, $n$ means the total number of data.

MCMC
MCMC is one of the sampling methods for probability distributions based on Markov chains with random number generation. Basically, it is difficult to draw samples of random variables from a multivariate distribution. However, we can easily draw samples from the target probability distribution by using MCMC [14,15]. Several MCMC algorithms have been proposed in order to solve these problems, e.g., the Metropolis-Hastings (MH) algorithm and the Gibbs sampler. The Gibbs sampler can be regarded as a special case of the MH algorithm. We apply the MH algorithm because it has a simple structure and is widely used in many research fields.
The flow of the MH algorithm is shown in Figure 1. The procedure of the MH algorithm is as follows:
• Generate a candidate $\theta'$ by using the applied (proposal) density $q(\theta, \theta')$.
• Accept the candidate with the MH acceptance probability; otherwise, retain the current value.
• Continue the above-mentioned process until the dependence on the initial value disappears.
In this paper, we assume a specific form for the applied density.

Numerical Examples
There are many open source projects around the world. In particular, we focus on a large-scale open source solution based on the Apache HTTP Server [16]. The fault-count data used in this paper were collected from the bug tracking system on the website of the open source project.
The proportion data of the actual fault-detection rate for each component of the Apache HTTP Server are shown in Table 1 for each month. We use the data from January 2008 to September 2010. We apply "Core", "Documentation", "mod_ssl", "mod_proxy", and "Build" as the major components. We focus on the data for all platforms of Apache HTTP Server version 2. We assume that the unit of time is a week, because these results and computational times show little change if the unit of time is day in terms of the software fault data sets.
We show the estimation results based on MCMC for each component in Figures 2-6, respectively. Moreover, the comparison of the estimates with the actual data is shown in Table 2. From the above-mentioned results, we find that the level of the fault-detection rate for the "Core" component is the largest. On the other hand, the level of the fault-detection rate for the "Build" component is the smallest. Therefore, we can confirm that the "Core" component is the one that most affects the whole OSS system. Moreover, we can confirm that the standard deviation of the fault importance level for "Documentation" is large, i.e., there is variation in the proportion data of the actual fault-detection rate. Also, the estimation results in Table 2 are shown to be optimistic in terms of the standard deviations.
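The proportion data of the fault-detection rate for each component could be prepared from raw fault counts roughly as follows. This is a minimal sketch under the assumption that the proportion of a component is its number of reported faults divided by the total over all components in the same period. The function name `fault_proportions` and the counts below are hypothetical placeholders, not the values of Table 1.

```python
def fault_proportions(counts):
    """Convert per-component fault counts for one period into proportions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no faults reported in this period")
    return {name: count / total for name, count in counts.items()}


# Hypothetical fault counts for one period (placeholders, not real data).
hypothetical_counts = {
    "Core": 12,
    "Documentation": 3,
    "mod_ssl": 2,
    "mod_proxy": 2,
    "Build": 1,
}
proportions = fault_proportions(hypothetical_counts)
for name, p in proportions.items():
    print(f"{name}: {p:.2f}")
```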

The Estimation Results Based on MCMC with Time Variation Considering Confidence Interval
In this section, we consider the case of $n = 24$ and a confidence level of 0.95. Then, the 95% confidence interval is obtained from the estimated mean value and standard deviation of each component. From Figures 7-11, we can confirm that the "Core" component keeps a constantly small width of the confidence interval. Also, the "Core" component continuously remains at a large value. These results mean that the open source project keeps a highly active state. Therefore, we consider that the focused OSS system is stable in terms of the occurrence rate of the "Core" component. On the other hand, the "mod_ssl" and "mod_proxy" components decrease in the width of the confidence interval, because the open source project is proceeding without problems as the faults of small components are removed.

Concluding Remarks
In this paper, we have focused on the reliability of OSS. Moreover, we have proposed a method of component-oriented software reliability assessment based on the hierarchical Bayesian model and MCMC in order to estimate the predicted distributions for each component of OSS. In particular, we have assumed that the proportion data of the fault-detection rate follow the probability density function of the normal distribution. Also, we have analyzed actual software fault-count data to show numerical examples of component-oriented software reliability assessment for OSS. Finally, we have focused on the fault-detection rate for the fault importance level of OSS. By using our method, the software testing manager can assess the static state of OSS. Our method may be useful as a method of component-oriented reliability assessment for OSS.

Figure 1. The flow diagram of the MH algorithm.
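The flow of the MH algorithm can be sketched in a few lines. The example below is a random-walk variant with a symmetric Gaussian proposal, a normal likelihood, and flat priors on the mean and standard deviation. All of these choices, together with the function names, step size, iteration counts, and sample data, are assumptions made for illustration and are not necessarily the applied density or priors used in this paper.

```python
import math
import random


def log_posterior(mu, sigma, data):
    """Unnormalized log posterior: normal likelihood with flat priors (assumed)."""
    if sigma <= 0.0:
        return -math.inf
    squared = sum((y - mu) ** 2 for y in data)
    return -len(data) * math.log(sigma) - squared / (2.0 * sigma ** 2)


def metropolis_hastings(data, n_iter=20000, burn_in=5000, step=0.02, seed=0):
    """Random-walk MH sampler for (mu, sigma); returns the draws after burn-in."""
    rng = random.Random(seed)
    mu, sigma = sum(data) / len(data), 0.1   # initial values
    current = log_posterior(mu, sigma, data)
    samples = []
    for i in range(n_iter):
        # Generate a candidate from the symmetric Gaussian proposal.
        cand_mu = mu + rng.gauss(0.0, step)
        cand_sigma = sigma + rng.gauss(0.0, step)
        candidate = log_posterior(cand_mu, cand_sigma, data)
        # Accept with probability min(1, posterior ratio); the symmetric
        # proposal density cancels in the ratio.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, sigma, current = cand_mu, cand_sigma, candidate
        # Discard early draws to reduce dependence on the initial values.
        if i >= burn_in:
            samples.append((mu, sigma))
    return samples


# Hypothetical proportion data for one component (not from Table 1).
data = [0.30, 0.35, 0.32, 0.28, 0.33, 0.31]
draws = metropolis_hastings(data)
mu_hat = sum(m for m, _ in draws) / len(draws)
sigma_hat = sum(s for _, s in draws) / len(draws)
print(f"estimated mean: {mu_hat:.3f}, estimated standard deviation: {sigma_hat:.3f}")
```

A fixed seed is used so that repeated runs give the same chain; in practice the step size would be tuned to the scale of the data.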

Figure 2. The estimation results for Core component.
The estimation results of the confidence interval based on MCMC with time variation of the proportion data for each component are shown in Figures 7-11, respectively.
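Confidence limits of the kind plotted in Figures 7-11 could be computed along the following lines. The sketch assumes the classical two-sided Student-t interval for a mean, which may differ in detail from the equation used in this paper. The window of 24 observations and the 95% level follow the case considered in the text, the critical value 2.069 is the tabulated upper 2.5% point of the t-distribution with 23 degrees of freedom, and the data values and the function name are hypothetical.

```python
import math
import statistics

# Tabulated upper 2.5% point of the t-distribution with 23 degrees of freedom.
T_CRITICAL_23_DF = 2.069


def confidence_interval(values, t_critical):
    """Two-sided interval mean +/- t * sd / sqrt(n) (classical form, assumed)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample standard deviation
    half_width = t_critical * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical window of n = 24 proportion values (not from Table 1).
window = [0.3] * 12 + [0.5] * 12
lower, upper = confidence_interval(window, T_CRITICAL_23_DF)
print(f"95% confidence interval: [{lower:.4f}, {upper:.4f}]")
```

A narrow interval corresponds to a component whose proportion of detected faults varies little over the window.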

Figure 7. The estimation results of confidence interval for Core component.

Figure 8. The estimation results of confidence interval for Documentation component.

Figure 9. The estimation results of confidence interval for mod_ssl component.

Figure 10. The estimation results of confidence interval for mod_proxy component.

Figure 11. The estimation results of confidence interval for Build component.
The characteristics of OSS in terms of reliability assessment are as follows [11,12]:
• OSS development cycle has no testing phase;
• The cumulative number of detected faults cannot converge to a finite value;
• It is difficult to apply SRGM's to the development