A Study of Quantitative Progress Evaluation Models for Open Source Projects
1. Introduction
Open source software (OSS) is code designed to be accessible to everyone. Anyone can view, modify, and distribute OSS as desired. It is often cheaper, more flexible, and longer-lasting than proprietary software, because OSS is developed by an open source community rather than by a single author or company. However, because anyone can join an open source community, project progress is highly uncertain owing to differences in project members’ skills, development environments, and periods of activity. It is therefore difficult to predict progress in open source projects, while many users and companies need to understand the development and operation status of open source projects when deciding whether to install or upgrade OSS.
Such issues have led to research on progress forecasting in open source projects [1] [2] [3] [4]. Many studies of project progress forecasting estimate the effort required to resolve the reported faults [1] [2] [3]. There are also several studies that estimate the maintenance effort required by individual developers [4].
On the other hand, few studies estimate the amount of maintenance effort for the entire project and evaluate the stability of the project. Tamura et al. [5] have researched time-series prediction of maintenance effort for an entire open source project, but their evaluation of the project’s progress is limited because it is confined to simple effort prediction.
In this paper, we examine a method for evaluating project stability based on maintenance effort in open source projects. In particular, we use software reliability growth models [6] [7] [8] [9] to predict the amount of maintenance effort for open source projects while considering uncertainty. Stochastic approaches based on stochastic differential equations have also been used in other research areas [10]. Then, we quantitatively evaluate the project stability by using earned value management (EVM) [11]. Finally, we discuss the appropriateness of the models used for predicting maintenance effort, and identify the appropriate model for this method.
2. Evaluation Approach for Open Source Project Stability
2.1. EVM: Overview
In this paper, we use the EVM methodology to evaluate the stability of open source projects. EVM is a project management methodology for measuring project performance and progress. Progress evaluation using EVM is used not only in software development but also for open source projects in various fields.
EVM lets us grasp the current cost and schedule conditions of a project. It measures project progress and performance by using three basic indices: Earned Value (EV), Planned Value (PV), and Actual Cost (AC), as shown in Figure 1. We can also grasp the current status of the project quantitatively by comparing these three indices, as shown in Table 1.
Sone et al. [12] verified the applicability of EVM to open source projects. However, PV could not be derived because of a difficulty with the method used to derive the effort. In this paper, we try to derive the EVM indices properly.
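As an illustration of how the three indices can be compared, the following minimal Python sketch computes the cost variance, schedule variance, CPI, and SPI as they are commonly defined in EVM. These four derived measures are an assumption of the sketch and are not necessarily the ones listed in Table 1; the class name `EvmSnapshot` and the numerical values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvmSnapshot:
    """EVM inputs at one point in time, all in the same effort or cost unit."""
    ev: float  # Earned Value
    pv: float  # Planned Value
    ac: float  # Actual Cost

    def cost_variance(self) -> float:
        # Negative when more has been spent than has been earned.
        return self.ev - self.ac

    def schedule_variance(self) -> float:
        # Negative when less has been earned than was planned.
        return self.ev - self.pv

    def cpi(self) -> float:
        # Cost Performance Index.
        return self.ev / self.ac

    def spi(self) -> float:
        # Schedule Performance Index.
        return self.ev / self.pv

    def status(self) -> str:
        cost = "over budget" if self.cost_variance() < 0 else "within budget"
        sched = ("behind schedule" if self.schedule_variance() < 0
                 else "on or ahead of schedule")
        return f"{cost}, {sched}"


# Hypothetical values, chosen only to illustrate the comparison.
snapshot = EvmSnapshot(ev=80.0, pv=100.0, ac=90.0)
print(snapshot.cost_variance(), snapshot.schedule_variance())
print(round(snapshot.cpi(), 3), round(snapshot.spi(), 3))
print(snapshot.status())
```

In this sketch, an EV below PV indicates a delayed project and an EV below AC indicates a cost overrun.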
2.2. SRGM: Overview
In the testing process of software development, the number of potential faults in the software decreases as testing progresses, because a lot of resources are spent on fault detection and correction. Therefore, the probability of software fault occurrence decreases with the testing time, and both the software reliability and the time interval between software fault occurrences increase. A model that describes this fault-occurrence phenomenon is called a software reliability growth model (SRGM).
Table 1. Several examples of the indices used in EVM.
In this paper, we use three models: the exponential model, the delayed S-shaped model, and the inflection S-shaped model. These are well-known SRGMs. We apply these models to evaluate open source projects by using EVM.
2.3. Effort Prediction Modeling for Open Source Projects
Considering the characteristics of the operation phase in open source projects, the time-dependent expenditure of maintenance effort behaves irregularly in the operation phase, because the skill levels of project members vary. As a result, the time-dependent effort expenditure behavior in the operation phase becomes unstable.
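The operation phases of many open source projects are influenced by external factors, triggered for example by differences in skill and by time lags between development and maintenance activities. Considering these points, we apply stochastic differential equation modeling to the management of open source projects. Let $\Omega(t)$ be the cumulative maintenance effort expenditures, such as those for finding software faults and improving functionality, up to operational time $t$ $(t \ge 0)$ in the open source project. Suppose that $\Omega(t)$ takes on continuous real values. $\Omega(t)$ gradually increases as the operational procedures go on. Based on the SRGM approach [6] [7], the following linear differential equation in terms of the maintenance effort can be formulated:
$$\frac{d\Omega(t)}{dt} = \beta(t)\,\{\alpha - \Omega(t)\}, \tag{1}$$
where $\beta(t)$ is the increase rate of maintenance effort at operational time $t$ and a non-negative function, and $\alpha$ means the estimated maintenance effort expenditures required until the end of operation.
Therefore, we extend Equation (1) to the following stochastic differential equation with Brownian motion [13]:
$$\frac{d\Omega(t)}{dt} = \{\beta(t) + \sigma\nu(t)\}\,\{\alpha - \Omega(t)\}, \tag{2}$$
where $\sigma$ is a positive constant representing the magnitude of the irregular fluctuation, and $\nu(t)$ is a standardized Gaussian white noise. By using Itô’s formula [14], we can obtain the solution of Equation (2) under the initial condition $\Omega(0) = 0$ as follows:
$$\Omega(t) = \alpha\left[1 - \exp\left\{-\int_0^t \beta(s)\,ds - \sigma\omega(t)\right\}\right], \tag{3}$$
where $\omega(t)$ is the Wiener process, which is formally defined as the integration of the white noise $\nu(t)$ with respect to time $t$. Moreover, we define the increase rate of maintenance effort $\beta(t)$ as [15]:
$$\beta(t) = \frac{dF_*(t)/dt}{\alpha - F_*(t)}. \tag{4}$$
In this paper, we assume the following equations based on software reliability models $F_*(t)$ as the cumulative maintenance effort expenditures function of the proposed model:
$$F_e(t) = \alpha\left(1 - e^{-\beta t}\right), \tag{5}$$
$$F_s(t) = \alpha\left\{1 - (1 + \beta t)\,e^{-\beta t}\right\}, \tag{6}$$
$$F_i(t) = \frac{\alpha\left(1 - e^{-\beta t}\right)}{1 + c\,e^{-\beta t}}, \tag{7}$$
where $F_e(t)$ means the cumulative maintenance effort expenditures for the exponential software reliability growth model with $F_*(t) = F_e(t)$. Similarly, $F_s(t)$ is the cumulative maintenance effort expenditures for the delayed S-shaped software reliability growth model with $F_*(t) = F_s(t)$. Also, $F_i(t)$ means the cumulative maintenance effort expenditures for the inflection S-shaped software reliability growth model with $F_*(t) = F_i(t)$. In Equations (5)-(7), the constant $\beta > 0$ is the rate parameter and $c > 0$ is the inflection parameter.
Therefore, the cumulative maintenance effort expenditures $\Omega_e(t)$, $\Omega_s(t)$, and $\Omega_i(t)$ up to time $t$ are obtained as follows:
$$\Omega_e(t) = \alpha\left[1 - \exp\{-\beta t - \sigma\omega(t)\}\right], \tag{8}$$
$$\Omega_s(t) = \alpha\left[1 - (1 + \beta t)\exp\{-\beta t - \sigma\omega(t)\}\right], \tag{9}$$
$$\Omega_i(t) = \alpha\left[1 - \frac{1 + c}{1 + c\,e^{-\beta t}}\exp\{-\beta t - \sigma\omega(t)\}\right]. \tag{10}$$
In these models, we assume that the parameter $\sigma$ depends on several noises caused by external factors from several triggers in open source projects. Since $\omega(t)$ is normally distributed with mean $0$ and variance $t$, we have $\mathrm{E}[\exp\{-\sigma\omega(t)\}] = \exp(\sigma^2 t/2)$, and the expected cumulative maintenance effort expenditures spent up to time $t$ are respectively obtained as follows:
$$\mathrm{E}[\Omega_e(t)] = \alpha\left[1 - \exp\left\{-\beta t + \frac{\sigma^2}{2}t\right\}\right], \tag{11}$$
$$\mathrm{E}[\Omega_s(t)] = \alpha\left[1 - (1 + \beta t)\exp\left\{-\beta t + \frac{\sigma^2}{2}t\right\}\right], \tag{12}$$
$$\mathrm{E}[\Omega_i(t)] = \alpha\left[1 - \frac{1 + c}{1 + c\,e^{-\beta t}}\exp\left\{-\beta t + \frac{\sigma^2}{2}t\right\}\right]. \tag{13}$$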
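The following minimal Python sketch illustrates the sample-path forms of Equations (8)-(10) and the expected values of Equations (11)-(13). It assumes the standard exponential, delayed S-shaped, and inflection S-shaped curves. The names `alpha`, `beta`, `sigma`, and `c` are the sketch's own labels for the total effort, the rate, the noise magnitude, and the inflection parameter. The parameter values are illustrative and are not estimates from any project data.

```python
import math
import random
from typing import List


def remaining_fraction(model: str, t: float, beta: float, c: float = 0.0) -> float:
    """Deterministic factor 1 - F(t)/alpha for the three SRGM curves."""
    decay = math.exp(-beta * t)
    if model == "exponential":
        return decay
    if model == "delayed_s":
        return (1.0 + beta * t) * decay
    if model == "inflection_s":
        return (1.0 + c) * decay / (1.0 + c * decay)
    raise ValueError(f"unknown model: {model}")


def wiener_path(n_steps: int, dt: float, rng: random.Random) -> List[float]:
    """Standard Wiener process sampled at 0, dt, ..., n_steps * dt."""
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def effort_path(model, alpha, beta, sigma, wiener, dt, c=0.0) -> List[float]:
    """One sample path of the cumulative effort (form of Equations (8)-(10))."""
    return [
        alpha * (1.0 - remaining_fraction(model, k * dt, beta, c)
                 * math.exp(-sigma * w))
        for k, w in enumerate(wiener)
    ]


def expected_effort(model, t, alpha, beta, sigma, c=0.0) -> float:
    """Expected cumulative effort (form of Equations (11)-(13))."""
    return alpha * (1.0 - remaining_fraction(model, t, beta, c)
                    * math.exp(0.5 * sigma ** 2 * t))


# Illustrative parameter values only.
rng = random.Random(0)
alpha, beta, sigma, dt, n_steps = 100.0, 0.1, 0.05, 1.0, 20
paths = [
    effort_path("delayed_s", alpha, beta, sigma, wiener_path(n_steps, dt, rng), dt)
    for _ in range(2000)
]
monte_carlo_mean = sum(p[-1] for p in paths) / len(paths)
closed_form_mean = expected_effort("delayed_s", n_steps * dt, alpha, beta, sigma)
print(round(monte_carlo_mean, 2), round(closed_form_mean, 2))
```

The Monte Carlo average of the simulated end points should be close to the closed-form expectation, which gives a quick consistency check between the sample-path and expected-value forms.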
2.4. Derivation of EVM for Open Source Project
In EVM for open source projects, the data used for Planned Value (PV) and Actual Cost (AC) cover different periods. Both PV and AC use the data obtained from the bug tracking system, i.e., the effort required by the fault reporters and the fault correctors. We assume that the project period runs from the OSS release to EOL (End of Life). To derive PV, we use the maintenance effort data up to the OSS release, based on Equations (8)-(13). In particular, the parameter $\alpha$ (the estimated total maintenance effort) in Equations (8)-(13) then represents the maintenance effort estimated at the time $t$ when the OSS is released. Therefore, the parameter $\alpha$ can be rephrased as Budget at Completion (BAC) in EVM. AC uses the maintenance effort data including the period after the OSS release. The start time of the data used to derive PV and AC is therefore the same.
Earned Value (EV) is the cumulative maintenance effort expenditures viewed on the same scale as the project budget (BAC). Therefore, if the OSS development effort increases but faults are not resolved, the value of EV becomes small, and the project is regarded as inefficient. In the derivation of EV, we use the number of potential faults predicted from the fault data reported up to the time of the OSS release. We use Equations (8)-(13) to predict the number of potential faults. We derive the fault resolving cost, i.e., the value obtained by dividing the BAC by the number of potential faults, as follows:
$$C_f = \frac{\mathrm{BAC}}{p}. \tag{14}$$
Here, $C_f$ means the fault resolving cost, and $p$ means the number of potential faults at OSS release. We can derive the EV for the three models, $EV_e(t)$, $EV_s(t)$, and $EV_i(t)$, by using the fault resolving cost $C_f$ and the cumulative number of resolved faults up to the operating time $t$:
$$EV_e(t) = C_f\,\alpha_r\left[1 - \exp\{-\beta_r t - \sigma_r\omega(t)\}\right], \tag{15}$$
$$EV_s(t) = C_f\,\alpha_r\left[1 - (1 + \beta_r t)\exp\{-\beta_r t - \sigma_r\omega(t)\}\right], \tag{16}$$
$$EV_i(t) = C_f\,\alpha_r\left[1 - \frac{1 + c_r}{1 + c_r\,e^{-\beta_r t}}\exp\{-\beta_r t - \sigma_r\omega(t)\}\right]. \tag{17}$$
Here, $\alpha_r$, $\beta_r$, $c_r$, and $\sigma_r$ are the parameters used to predict the cumulative number of resolved faults at time $t$; they play the same roles as the total amount, rate, inflection, and noise-magnitude parameters in Equations (8)-(10), and $\omega(t)$ is the Wiener process. Therefore, the expected EV required for OSS maintenance up to the operation time $t$ are respectively obtained as follows:
$$\mathrm{E}[EV_e(t)] = C_f\,\alpha_r\left[1 - \exp\left\{-\beta_r t + \frac{\sigma_r^2}{2}t\right\}\right], \tag{18}$$
$$\mathrm{E}[EV_s(t)] = C_f\,\alpha_r\left[1 - (1 + \beta_r t)\exp\left\{-\beta_r t + \frac{\sigma_r^2}{2}t\right\}\right], \tag{19}$$
$$\mathrm{E}[EV_i(t)] = C_f\,\alpha_r\left[1 - \frac{1 + c_r}{1 + c_r\,e^{-\beta_r t}}\exp\left\{-\beta_r t + \frac{\sigma_r^2}{2}t\right\}\right]. \tag{20}$$
Here, a fault is counted as resolved when its status is Closed in the bug tracking system.
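The EV derivation described above can be sketched in Python as follows, assuming the fault resolving cost is the BAC divided by the number of potential faults and that EV is this cost multiplied by the cumulative number of faults whose status is Closed. The function names and all numerical values are hypothetical and serve only as illustration.

```python
import itertools
from typing import List


def fault_resolving_cost(bac: float, potential_faults: float) -> float:
    """Effort budgeted per fault: BAC divided by the number of potential faults."""
    if potential_faults <= 0:
        raise ValueError("the number of potential faults must be positive")
    return bac / potential_faults


def earned_value(cost_per_fault: float, cumulative_resolved: List[int]) -> List[float]:
    """EV at each time step: cost per fault times cumulative resolved faults."""
    return [cost_per_fault * n for n in cumulative_resolved]


# Hypothetical inputs (not taken from any project data).
bac = 1200.0                            # estimated total maintenance effort
potential_faults = 300.0                # estimated potential faults at release
closed_per_week = [5, 12, 20, 18, 10]   # faults set to Closed in each week

cumulative_closed = list(itertools.accumulate(closed_per_week))
cost = fault_resolving_cost(bac, potential_faults)
ev = earned_value(cost, cumulative_closed)
print(cost, cumulative_closed, ev)
```

Under these assumptions, EV exceeds the BAC once the cumulative number of resolved faults exceeds the estimated number of potential faults.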
In this paper, EVM uses the data set obtained from the bug tracking system to derive PV, AC, and EV. Considering the derivation of these EVM indices, we define the EVM terms for open source projects as listed in Table 2.
Table 2. Explanation for EVM used in this research.
3. Numerical Examples
3.1. Data Set
In this paper, we use the data set of an open source project for deriving the EVM indices. To apply the proposed model to actual project data, we use the data of LibreOffice [16] obtained from Bugzilla. LibreOffice is an office suite OSS provided by The Document Foundation. In particular, the effort and fault data obtained from Bugzilla are those of version 7.2, and are used for estimating PV and AC. The cumulative numbers of reported faults in the PV and AC data are 298 and 878, respectively. For estimating PV, we use the project data for about 39 weeks before LibreOffice was released. For estimating AC, we use the project data for about 112 weeks. The data are aggregated weekly.
3.2. Estimation of EVM Indices
In this section, we estimate the parameters of the three SRGMs for the maintenance effort and the number of faults in the LibreOffice version 7.2 project. We also compare the goodness of fit of the three models.
Table 3 shows the results of parameter estimation for the maintenance effort, and the AIC (Akaike’s Information Criterion) used to compare the model equations. The parameter $\alpha$ (the estimated total maintenance effort) estimated from the PV data can be rephrased as BAC. In terms of AIC, the delayed S-shaped model is the best one for PV estimation. Figure 2 shows the results of applying the delayed S-shaped model to the open source project data.
Next, Table 4 shows the results of parameter estimation for AC, and the AIC. Here, the parameter $\alpha$ (the estimated total maintenance effort) can be rephrased as the project’s estimated AC. In terms of AIC, the delayed S-shaped model is the best one for AC estimation. Figure 3 shows the results of applying the delayed S-shaped model to the open source project data.
Table 3. Parameter estimation of maintenance effort in terms of PV.
Figure 2. The cumulative maintenance effort expenditures as PV in LibreOffice Ver. 7.2 project by using Equations (9) and (12).
Table 4. Parameter estimation of maintenance effort in terms of AC.
Figure 3. The cumulative maintenance effort expenditures as AC in LibreOffice Ver. 7.2 project by using Equations (9) and (12).
Also, Table 5 shows the results of parameter estimation for the number of potential faults at OSS release, and the AIC. We use the estimated parameter $\alpha$ (here, the estimated number of potential faults) for deriving the fault resolving cost. There is no significant difference in AIC values among the model equations used in this research. Therefore, it is difficult to identify a suitable model for these data. For convenience, we assume that the exponential model, which has the smallest AIC, is the appropriate model. Figure 4 shows the results of applying the exponential model to the open source project data.
Finally, Table 6 shows the results of parameter estimation for the number of resolved faults at present, and the AIC. In terms of AIC, the inflection S-shaped model is the best one for estimating the number of resolved faults. Figure 5 shows the results of applying the inflection S-shaped model to the open source project data.
A comparison of the AIC values obtained during parameter estimation for the three model equations showed that the delayed S-shaped model and the inflection S-shaped model are appropriate. In the LibreOffice version 7.2 project data, the increase rates of the maintenance effort and of the number of faults are small at the start of the maintenance phase. We find that the delayed S-shaped model and the inflection S-shaped model are appropriate for such project data.
In open source projects, the number of fault reports increases as the number of OSS users increases after the release of a particular version. As a result, the effort required for fault maintenance increases. Therefore, the appropriate model equations for many other open source project data sets are likely to be the same as in this research.
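To illustrate AIC-based model comparison, the following Python sketch fits the three deterministic curves to synthetic data and selects the model with the smallest AIC. It is a simplified stand-in, not the estimation procedure used in this paper: it assumes least-squares fitting by grid search and the Gaussian-error form $\mathrm{AIC} = n\ln(\mathrm{RSS}/n) + 2k$. The data are synthetic (a delayed S-shaped curve with a small deterministic perturbation), and the grid ranges are arbitrary choices.

```python
import math


def shape(model: str, t: float, beta: float, c: float = 0.0) -> float:
    """Normalized curve F(t)/alpha for the three SRGMs."""
    decay = math.exp(-beta * t)
    if model == "exponential":
        return 1.0 - decay
    if model == "delayed_s":
        return 1.0 - (1.0 + beta * t) * decay
    if model == "inflection_s":
        return (1.0 - decay) / (1.0 + c * decay)
    raise ValueError(f"unknown model: {model}")


def fit(model, ts, ys):
    """Grid-search beta (and c); alpha follows in closed form by least squares.

    Returns (rss, alpha, beta, c) for the best grid point.
    """
    betas = [0.01 * k for k in range(1, 101)]
    cs = [0.5 * k for k in range(41)] if model == "inflection_s" else [0.0]
    best = None
    for beta in betas:
        for c in cs:
            g = [shape(model, t, beta, c) for t in ts]
            alpha = sum(y * gi for y, gi in zip(ys, g)) / sum(gi * gi for gi in g)
            rss = sum((y - alpha * gi) ** 2 for y, gi in zip(ys, g))
            if best is None or rss < best[0]:
                best = (rss, alpha, beta, c)
    return best


def aic(rss: float, n: int, k: int) -> float:
    """AIC under independent Gaussian errors, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k


# Synthetic weekly data, for illustration only.
ts = [float(week) for week in range(1, 31)]
ys = [100.0 * shape("delayed_s", t, 0.2) + 0.5 * math.sin(t) for t in ts]

n_params = {"exponential": 2, "delayed_s": 2, "inflection_s": 3}
results = {model: fit(model, ts, ys) for model in n_params}
aics = {model: aic(results[model][0], len(ts), k) for model, k in n_params.items()}
best_model = min(aics, key=aics.get)
print(best_model, {m: round(v, 1) for m, v in aics.items()})
```

The extra parameter of the inflection S-shaped model is penalized through the $2k$ term, so it is preferred only when it reduces the residuals enough to offset that penalty.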
Table 5. Parameter estimation of number of potential faults in case of LibreOffice.
Figure 4. The cumulative estimated number of potential faults by using Equations (8) and (11).
Table 6. Parameter estimation of number of resolved faults in case of LibreOffice.
Figure 5. The cumulative estimated number of resolved faults by using Equations (10) and (13).
In this research, we derive the EVM indices by using the best-fit model equation for each data set. The fault resolving cost of Equation (14), one of the EVM indices, is necessary for the derivation of EV. Figure 6 shows the results of the EV, AC, and PV estimations.
Figure 6 shows that both EV and AC are larger than PV. In particular, the EV value is very large. This is because the number of resolved faults is estimated to be higher than the number of potential faults. On the other hand, the EV value is lower than the PV and AC values at around 50 weeks, the time of the version 7.2 release, which shows that the project is in a delayed state at that point. In other words, we find that the project became more active after version 7.2 was released, as the number of users of that version increased.
Figure 6. EVM estimation results in LibreOffice project.
4. Conclusions
In this paper, we have examined a method for evaluating project stability based on SRGM in open source projects. In terms of AIC, we have identified the appropriate models for the open source project. We have found that the delayed S-shaped model and the inflection S-shaped model are the best models. We expect the same result for other open source projects, because of the characteristic transition in the number of OSS users. Also, we have derived the EVM indices by using the appropriate SRGMs. As a result, we have found that open source projects become active after the release of a particular version.
Research on stability evaluation methods for open source projects has often focused on the resolution of individual faults. Therefore, the practical application of EVM for evaluating the stability of open source projects as a whole will contribute to the future development of OSS. On the other hand, since the proposed method evaluates stability based on the cost of the entire open source project, it is difficult to evaluate the causes of project stability at the level of individual faults. Therefore, we consider that combining the proposed method with individual fault-based project evaluation methods will provide a better project stability evaluation tool.
As only one open source project data set has been used in this paper, it is necessary to verify the characteristics of the trends in maintenance effort and number of faults by using multiple project data sets in the future.
Acknowledgements
This work was supported in part by the JSPS KAKENHI Grant No. 20K11799 in Japan.