Statistical Analysis of Process Monitoring Data for Software Process Improvement and Its Application

Software projects are influenced by many human factors and therefore generate various risks. In order to develop high-quality software, it is important to respond to these risks reasonably and promptly. However, it is not easy for project managers to deal with all of these risks. Therefore, it is essential to manage process quality by promoting process monitoring and design quality assessment activities. In this paper, we discuss statistical data analysis of actual project management activities in process monitoring and design quality assessment, and quantitatively analyze their effects on software process improvement by applying the methods of multivariate analysis. We show how process factors affect the management measures of QCD (Quality, Cost, Delivery) by applying multiple regression analysis to observed process monitoring data. Further, we quantitatively evaluate the effect of design quality assessment based on principal component analysis and factor analysis. As a result of the analysis, we show that the design quality assessment activities are effective for software process improvement. Further, based on the result of quantitative project assessment, we discuss the usefulness of process monitoring progress assessment using a software reliability growth model. This result may provide a useful quantitative measure for product release determination.


Introduction
In recent years, with growing dependence on computerized systems, software development has become larger in scale, more complicated, and more diversified. At the same time, customer demand for high quality and shortened delivery has increased. Therefore, we have to conduct project management efficiently in order to develop high-quality software products. We also need to statistically analyze the process data observed in software development projects. Based on the process data, we can establish the PDCA (Plan-Do-Check-Act) management cycle in order to improve the software development process with respect to the software management measures of QCD (Quality, Cost, Delivery) [1,2].
There are many risks latent in software projects. These risks often lead to QCD-related problems, such as system failures, budget overruns, and delivery delays, which may cause the project to fail. In order to lead a software project to success, project managers need to apply adequate project management techniques in the software development process, drawing on both technological and management skills. However, it is not easy for them to respond to all the risks. Therefore, we discuss the following two improvement activities:
• Process monitoring activities;
• Assessment activities of design quality.
In process monitoring activities, a third party from the quality assurance unit reviews the process from the early stage of the project in order to find latent project risks and QCD-related problems, as shown in Figure 1 [3]. Process monitoring may help project managers conduct project management efficiently and improve the management process so as to lead a project to success. In assessment activities of design quality, a third party likewise evaluates the completeness of the required specifications and design specifications. The assessment activities of design quality may improve the software development process by eliminating software faults.
The rest of this paper is organized as follows. Based on the results of Fukushima and Kasuga [4,5], Section 2 analyzes actual process monitoring data by using multivariate linear analyses, such as multiple regression analysis, principal component analysis, and factor analysis. At the same time, we use collaborative filtering to estimate the missing values. Based on the derived quality prediction models, we clarify the software process factors affecting the quality of software products. Section 3 quantitatively evaluates the effect of introducing design quality assessment activities into process monitoring, based on principal component analysis and factor analysis. Further, Section 4 discusses a method of quantitative project assessment with the process monitoring data based on a software reliability growth model. Finally, Section 5 summarizes the results obtained in this paper.

Process Monitoring Data
We analyze the factors affecting the quality of software products by using the process monitoring data shown in Table 1. Ten variables measured by the process monitoring are used as explanatory variables. Three variables, the software management measures of QCD, are used as objective variables. These variables are defined as follows:
X 1 : The number of problems detected in the contract review.
X 2 : The number of days taken to solve the problems detected in the contract review.
X 3 : The number of problems detected in the development planning review.
X 4 : The number of days taken to solve the problems detected in the development planning review.
X 5 : The number of problems detected in the design completion review.
X 6 : The number of days taken to solve the problems detected in the design completion review.
X 7 : The number of problems detected in the test planning review.
X 8 : The number of days taken to solve the problems detected in the test planning review.
X 9 : The number of problems detected in the test completion review.
X 10 : The number of days taken to solve the problems detected in the test completion review.
Y q : The number of detected faults, given by the following expression: (The number of faults) = (The number of faults detected during acceptance testing) + (The number of faults detected during production).
Y c : The cost excess rate, given by the following expression: (Cost excess rate) = (Actual cost value)/(Scheduled software development cost).
If the cost excess rate is over 1.0, the expenses exceed the software development budget.
Y d : The number of days of delivery delay relative to the shipping date planned at project initiation.
There are some missing values in Table 1. Therefore, we apply collaborative filtering [6] to estimate these missing values. Projects No. 17 to No. 21 are those in which design quality assessment was carried out, whereas Projects No. 1 to No. 16 are those in which it was not.
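The paper cites collaborative filtering [6] without describing the variant used. As a rough illustration only, the following sketch fills a missing entry with a similarity-weighted mean of the same variable in the other projects, using cosine similarity computed over the variables both projects have observed. The function names and the choice of cosine similarity are assumptions made for this sketch, not details taken from the source.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity over the variables observed in both projects."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    dot = sum(x * y for x, y in shared)
    norm_a = sqrt(sum(x * x for x, _ in shared))
    norm_b = sqrt(sum(y * y for _, y in shared))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def impute(rows):
    """Fill each missing entry (None) with a similarity-weighted column mean.

    rows: one list per project, one entry per variable.
    Entries that cannot be estimated stay None.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            num = den = 0.0
            for k, other in enumerate(rows):
                if k == i or other[j] is None:
                    continue
                w = cosine_similarity(row, other)
                num += w * other[j]
                den += abs(w)
            filled[i][j] = num / den if den > 0 else None
    return filled


# Hypothetical data: three projects, three variables, one missing value.
example = [[2, 4, 1], [2, 4, None], [1, 2, 3]]
print(impute(example))
```

Similarities are always computed from the original rows, so an estimated value never feeds into another estimate.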

Multiple Regression Analysis
By using the process monitoring data in Table 1, we perform correlation analysis among the explanatory and objective variables. The results are as follows:
• The contract review, design completion review, and test completion review show strong correlations with the measures of QCD.
• Y q shows a strong correlation with Y c and Y d .
Based on the correlation analysis, we find that it is important to reduce the number of faults and ensure software quality in order to prevent cost excess and delivery delay. Therefore, X 5 , X 7 , and X 10 are selected as important factors for estimating a software quality prediction model [7,8].
Then, a multiple regression analysis is applied to the process monitoring data in Table 1. Using X 5 , X 7 , and X 10 , we obtain the estimated multiple regression equation for predicting the number of software faults, $\hat{Y}_q$, given by Equation (1), as well as the normalized multiple regression expression, $\hat{Y}_q^N$, given by Equation (2):

$$\hat{Y}_q^N = 0.292 X_5 + 0.354 X_7 + 0.501 X_{10}. \tag{2}$$
In order to check the goodness of fit of our model, the coefficient of multiple determination R 2 is calculated as 0.735. Furthermore, the squared multiple correlation coefficient adjusted for degrees of freedom (adjusted R 2 ), called the contribution ratio, is 0.669. The results of the multiple regression analysis are summarized in Tables 2 and 3.
From Table 2, it is found that the precision of these multiple regression equations is high. We can therefore predict the number of faults detected in the final products by using Equation (1). From Equation (2), the order of the degree to which the explanatory variables affect the objective variable Y q is X 5 < X 7 < X 10 . Therefore, we conclude that the design completion review, the test planning review, and the test completion review have an important impact on product quality.
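The relation between R 2 and the adjusted R 2 can be checked with the standard degrees-of-freedom adjustment. The short sketch below is an illustration, not part of the original analysis. The sample size of 16 is an assumption: the source does not state how many projects entered the regression, and 16 is used here only because it reproduces the reported value with three explanatory variables.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R^2 = 0.735 with three predictors (X5, X7, X10); n = 16 is assumed.
print(round(adjusted_r2(0.735, 16, 3), 3))  # 0.669
```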

Analysis Data
We analyze the effect of design quality assessment by using the process monitoring data in Table 1. We assume a model in which the risks at the start of the project negatively affect the management measures of QCD, while the QCD measures can be improved by process monitoring activities and design quality assessment.
Based on this hypothetical model, we analyze the initial project risk data (shown in Table 4) together with the process monitoring data. The new variables are defined as follows:
X 11 : The risk ratio at project initiation. The risk ratio is given by the following expression:

$$\text{Risk ratio} = \sum_{i} \{\text{risk item}(i) \times \text{weight}(i)\}, \tag{3}$$

where the risk estimation checklist assigns weight(i) to each risk item(i), and the risk score ranges between 0 and 100 points. Project risks are identified by interviews using the risk estimation checklist. From the identified risks, the risk score of a project is calculated by Equation (3).
X 13 : The number of days in the development period.
X 14 : The estimated man-hours (the development budget divided by the development cost per hour).
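One way to read the risk score is as a weighted sum over the checklist items. The sketch below assumes that each item is answered with a 0/1 flag and that the weights sum to 100, so the score falls between 0 and 100 points. The item answers and weights are hypothetical, and the actual checklist is not given in the source.

```python
def risk_score(items, weights):
    """Weighted sum of checklist answers.

    items:   0/1 flags, 1 meaning the risk item applies (assumed encoding)
    weights: points per item, assumed to sum to 100
    """
    if len(items) != len(weights):
        raise ValueError("items and weights must have the same length")
    return sum(item * weight for item, weight in zip(items, weights))


# Hypothetical four-item checklist.
print(risk_score([1, 0, 1, 1], [40, 30, 20, 10]))  # 70
```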

Principal Component Analysis
In order to clarify the relationship among the variables and analyze the effect of design quality assessment activities on the management measures of QCD, principal component analysis [7,8] is performed by using the process monitoring data and initial project risk data in Tables 1 and 4. From Table 5, it is found that the precision of the analysis is high. The factor loading values are obtained as shown in Table 6, and the principal component scores as shown in Table 7. From Table 6, we define the first and second principal components as follows:
• The first principal component is defined as the measure of QCD attainment levels.
• The second principal component is defined as the measure of software project estimation (development size, period, effort).
We obtain a scatter plot of the factor loading values in Figure 2. From Figure 2, it is found that the factors of process monitoring are positively correlated with the management measures of QCD. Therefore, we can consider that the process monitoring activities have an important impact on the management measures of QCD.
Further, we obtain a scatter plot of the principal component scores as shown in Figure 3, in which projects with and without design quality assessment are indicated by different marks. From Figure 3, it is found that the first principal component scores of the projects in which design quality assessment was carried out are small. This result shows that the projects in which the design quality assessment activities were carried out can reduce the number of faults, the cost excess, and the delivery delay.
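To make the quantities in this analysis concrete, the following sketch computes the contribution ratio, factor loadings, and scores of the first principal component from the correlation matrix, using power iteration. It illustrates the general technique only and is not the tool used in the paper. The data and function names are hypothetical, and the sign of a principal component is arbitrary.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(columns):
    """Scale each variable to zero mean and unit (population) variance."""
    result = []
    for col in columns:
        m, s = mean(col), pstdev(col)  # s must be non-zero (no constant columns)
        result.append([(v - m) / s for v in col])
    return result


def correlation_matrix(z):
    """Correlation matrix of already standardized variables."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def dominant_eigenpair(matrix, iterations=1000):
    """Power iteration for the largest eigenvalue and its unit eigenvector."""
    p = len(matrix)
    v = [float(i + 1) for i in range(p)]  # arbitrary non-uniform start vector
    norm = sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        eigenvalue = sqrt(sum(x * x for x in w))
        if eigenvalue == 0.0:
            raise ValueError("power iteration collapsed to the zero vector")
        v = [x / eigenvalue for x in w]
    return eigenvalue, v


def first_principal_component(columns):
    """Return (contribution ratio, factor loadings, scores) of the first PC."""
    z = standardize(columns)
    eigenvalue, vector = dominant_eigenpair(correlation_matrix(z))
    contribution = eigenvalue / len(columns)
    loadings = [sqrt(eigenvalue) * c for c in vector]
    n = len(z[0])
    scores = [sum(c * zj[i] for c, zj in zip(vector, z)) for i in range(n)]
    return contribution, loadings, scores


# Hypothetical data for five projects: faults, cost excess rate, delay days.
data = [[12, 5, 9, 2, 7], [1.3, 1.0, 1.2, 0.9, 1.1], [20, 3, 15, 0, 9]]
contribution, loadings, scores = first_principal_component(data)
print(round(contribution, 3), [round(x, 3) for x in loadings])
```

A small score on a component whose loadings on the fault, cost, and delay variables are all positive corresponds to a project with few faults, little cost excess, and little delay, which is how the scatter plot of scores is read.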

Factor Analysis
As with the principal component analysis, factor analysis is performed by using the process monitoring data and initial project risk data in Tables 1 and 4. The varimax method is applied to the rotation of the factor axes. From Table 8, it is found that the precision of the analysis is high. The factor loading values are obtained as shown in Table 9, and the factor scores as shown in Table 10.
From Table 9, it is found that X 5 , X 6 , X 9 , and X 10 form one group of factors, regarded as "the value of project attainment"; X 1 , X 2 , X 3 , X 4 , and X 8 form a group regarded as "the value of project planning"; X 7 , X 11 , Y q , and Y c form a group regarded as "the value of quality and cost"; and X 12 , X 13 , X 14 , and Y d form a group regarded as "the value of estimation and delivery".
From Table 10, it is found that the values of the third factor are small for all the projects in which the design quality assessment activities were carried out. This result shows that the projects in which the design quality assessment activities were carried out can reduce the number of faults, the cost excess, and the delivery delay.

Quantitative Project Assessment
Next, we discuss quantitative project assessment based on the process monitoring data. A project progress growth curve in the process monitoring activities is assumed to be the relationship between the number of process monitoring progress phases and the cumulative number of QCD problems detected during the process monitoring. We then apply the Moranda geometric Poisson model [9], which is a software reliability growth model (SRGM), to the process monitoring data on X 1 , X 3 , X 5 , X 7 , and X 9 shown in Table 1.
We base the project progress modeling on the Moranda geometric Poisson model because its analytic treatment is relatively easy. We choose the number of process monitoring progress phases as the alternative unit of testing time, assuming that the observed testing-time data in an SRGM are discrete.
In order to describe the problem-detection phenomenon during process monitoring progress phase $i$ ($i = 1, 2, \ldots$), let $N_i$ denote a random variable representing the number of problems detected during the $i$-th process monitoring progress interval $(T_{i-1}, T_i]$ ($T_0 = 0$; $i = 1, 2, \ldots$). Then, the problem-detection phenomenon can be described as follows:

$$\Pr\{N_i = x\} = \frac{(\lambda k^{i-1})^x}{x!} \exp[-\lambda k^{i-1}] \quad (\lambda > 0,\ 0 < k < 1;\ x = 0, 1, 2, \ldots), \tag{4}$$

where $\Pr\{A\}$ means the probability of event $A$, and
$\lambda$ = the average number of problems detected in the first interval $(0, T_1]$,
$k$ = the decrease ratio of the number of problems detected by process monitoring activities.
From Equation (4), setting $T_i = i$ ($i = 1, 2, \ldots$), we obtain the following quantitative project assessment measures: the expected cumulative number of problems detected up to the $n$-th process monitoring progress phase, $E(n)$, and the expected total number of problems latent in the software project, $E(\infty)$, are given by Equations (5) and (6), respectively:

$$E(n) = \sum_{i=1}^{n} \lambda k^{i-1} = \frac{\lambda (1 - k^n)}{1 - k}, \tag{5}$$

$$E(\infty) = \frac{\lambda}{1 - k}. \tag{6}$$

Project assessment measures play an important role in the quantitative assessment of process monitoring progress. The expected number of remaining problems, $r(n)$, represents the number of problems latent in the software project at the end of the $n$-th process monitoring progress phase, and is formulated as

$$r(n) = E(\infty) - E(n) = \frac{\lambda k^n}{1 - k}, \tag{7}$$

and the instantaneous MTBP, which means the mean time between problem occurrences, is formulated as

$$MTBP(n) = \frac{1}{\lambda k^n}. \tag{8}$$

Further, the project reliability represents the probability that a problem does not occur in the time interval $(n, n + 1]$ ($n \geq 0$), given that the process monitoring progress has reached phase $n$. Then, the project reliability function is derived as

$$R(n) = \exp[-\lambda k^n]. \tag{9}$$

We present numerical examples by using the Moranda geometric Poisson model for Project No. 5.
Figure 4 shows the estimated cumulative number of problems detected, $E(n)$, and the actual measured values during the process monitoring progress interval $(0, n]$, where the parameters are estimated as $\hat{\lambda} = 4.39$ and $\hat{k} = 0.773$ by the method of maximum likelihood. Figure 5 shows the estimated expected number of remaining problems, $r(n)$. From Figure 5, it is found that about 5 problems remain at the end of the test completion review phase ($n = 5$). Further, the estimated instantaneous MTBP is obtained as shown in Figure 6. From Figure 6, it is found that the process monitoring activities are going well because the MTBP is growing. As for project reliability, it is necessary to keep conducting the process monitoring activities because the project reliability after the test completion review is 39 percent.
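The assessment measures and the maximum-likelihood fit can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' implementation. It treats the count in phase i as Poisson with mean λk^(i-1), takes the instantaneous MTBP and the project reliability as 1/(λk^n) and exp(-λk^n), which is what that Poisson assumption implies, and finds k by a ternary search on the profile log-likelihood. The function names and the example counts are hypothetical; only λ = 4.39 and k = 0.773 come from the paper.

```python
from math import exp, log


def expected_cumulative(lam, k, n):
    """Expected cumulative number of problems detected up to phase n."""
    return lam * (1 - k ** n) / (1 - k)


def expected_total(lam, k):
    """Expected total number of problems latent in the project."""
    return lam / (1 - k)


def remaining(lam, k, n):
    """Expected number of problems remaining after phase n."""
    return lam * k ** n / (1 - k)


def instantaneous_mtbp(lam, k, n):
    """Reciprocal of the expected count in phase n + 1 (assumed form)."""
    return 1 / (lam * k ** n)


def reliability(lam, k, n):
    """Probability of no problem in (n, n + 1] under the Poisson assumption."""
    return exp(-lam * k ** n)


def fit_mle(counts, tol=1e-10):
    """Maximum-likelihood estimates (lam, k) from per-phase problem counts.

    For fixed k the likelihood equation gives lam = sum(x) / sum(k^(i-1)),
    so only k is searched. The search is restricted to 0 < k < 1.
    """
    n, total = len(counts), sum(counts)
    if total <= 0:
        raise ValueError("at least one problem must be observed")

    def profile_loglik(k):
        geo = sum(k ** i for i in range(n))
        lam = total / geo
        # log-likelihood without the constant -sum(log(x_i!)) term
        return sum(x * (log(lam) + i * log(k))
                   for i, x in enumerate(counts)) - lam * geo

    lo, hi = 1e-9, 1 - 1e-9
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if profile_loglik(m1) < profile_loglik(m2):
            lo = m1
        else:
            hi = m2
    k = (lo + hi) / 2
    return total / sum(k ** i for i in range(n)), k


# Estimates reported for Project No. 5; r(5) is roughly 5.3 problems.
print(round(remaining(4.39, 0.773, 5), 1))

# Hypothetical counts for four phases, used only to exercise the fit.
print(fit_mle([8, 4, 2, 1]))
```

If the observed counts do not decrease across phases, the search ends at the upper boundary for k, which signals that the geometric decay assumption does not fit the data.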

Concluding Remarks
In this paper, we have discussed statistical data analysis of actual process monitoring and design quality assessment activities, and quantitatively analyzed their effects on software process improvement by applying the methods of multivariate analysis. We have shown how process factors affect the management measures of QCD by applying multiple regression analysis to observed process monitoring data. Further, we have quantitatively evaluated the effect of design quality assessment based on principal component analysis and factor analysis. As a result of the multiple regression analysis, we have found that the design completion review, the test planning review, and the test completion review have an impact on final product quality. We can then consider that the problems in the test planning review and the test completion review were influenced by those in the design completion review. That is, it is very important to manage design quality in software development. At the same time, we have quantitatively confirmed that the design quality assessment activities are effective for software process improvement.
Further, as a result of quantitative project assessment, we have confirmed the usefulness of process monitoring progress assessment based on the Moranda geometric Poisson model. These results provide a useful quantitative measure for product release determination.
From the above results, in order to lead a software project to success, it is important to continuously improve the software development process by applying adequate project management techniques such as process monitoring and design quality assessment activities.
In the future, we need to derive a highly accurate quality prediction model and find the factors that influence the management measures of QCD in order to lead software projects to success.

Figure 1. Overview of the process monitoring activities.
X 12 : The development size, given by the following expression: Development size (kilo (10^3) steps) = (Number of newly developed steps) + 1.5 × (Number of modified steps) + (Influential rate) × (Number of reused steps). The influential rate ranges between 0.01 and 0.1.

Figure 2. Scatter plot of the factor loading values.

Figure 3. Scatter plot of the principal component scores.

Figure 4. The estimated cumulative number of detected problems, E(n).

Figure 5. The estimated expected number of remaining problems, r(n).

Table 1. Observed process monitoring data.

Table 2. Table of analysis of variance.
** indicates statistical significance at the 1% level.