Software Cost Estimation Approaches : A Survey

The software cost estimation aims to predict the most realistic effort that is required to finish a software project and so it is critical to the success of a software project management. A Software Cost Estimation affects nearly all management activities, including project bidding, resource allocation and project planning. It is affected by a number of factors, such as implementation efficiency, as well as how much the various reviews and studies completed prior to the software development stage cost. Accurate cost estimation will help us to complete the project on time and within budget. Accurate estimation is important because it has led to extensive research into the methods of software cost estimation. Some important software cost estimation methods have been studied in this research work. In addition, we have set out own criteria, which has been used to compare all the different selected methods. We have also given a score for each evaluation criteria, so that we can compare the different methods numerically for cost estimation. Our observations have shown that it is best to use a number of different estimating techniques or cost models, and then compare the results before determining the reasons for any of the large variations. None of the methods are necessarily better or worse than the others. We found, in fact, that their strengths and weaknesses often complement each other. Therefore, the main conclusion is that there is no one single technique that is best for every situation, and the results of a number of different approaches need to be carefully considered to discover what is the most likely to produce estimates that are realistic.


Introduction
Estimating the costs of software projects is a critical activity that requires the use of both proper methods and techniques in order to achieve a good estimation of the results.This is a challenging task that poses many obstacles.The size of the software and its accuracy has a great effect on the estimation's accuracy.Project management also plays a vital role in the guidance of these estimation processes.
Much research has been carried out that reflects the rising demands of high-quality software through effective cost estimation [1] [2] [3] [4] [5].Software engineers have to apply the theories, tools and methods in a software project in order to solve a problem.However, they must also work within the financial constraints that were predefined.A vital issue which is closely related to a software project's financial aspects is the accurate estimation of the software cost involved.This helps to manage any software project as it means it will be within the set budget [6] [7] [8] [9].
Software cost estimation is a very challenging activity in the project management of software because predicting the cost is a difficult process at the early stage of the software's development [4] [10].Moreover, the estimation of the software's cost is impacted by many factors, including the implementation's efficiency, and the number of reviews and studies done prior to the development stage cost.There is clearly a strong relationship that exists between the estimation of software effort and software cost, as it can be said that the effort is the primary driving factor for the software's cost [11] [12] [13] [14].
It is important to state that the estimation of the software's cost is a continuous activity that begins at the proposal stage and then carries on throughout the life of the project.When project cost management is calculated, it includes the processes that are required to ensure the project is finished within the approved budget.The main processes include [15] [16]: • Estimating the costs (including top-down and bottom-up estimates, parametric modelling, etc.); • Determining the budget (the cost baseline); • Controlling the costs (Earned Value Management (EVM)).
Many approaches have been designed to address this software cost estimation process, which have been proposed by both scientists and researchers trying to create an accurate cost estimation technique that is accurate.The research work gives an extensive overview.It will address a total of five fundamental software cost estimation approaches, and a comparison will be made between the approaches based on evaluation criteria.This will then be thoroughly examined and used throughout the research work.
It is essential to point out that the novelties of this work include: studying important software cost estimation methods, setting out basic criteria (i.e., ease of use, adaptability, accuracy, consistency, interpretable, automatable, tool supported, empirical validations, sensitivity, and handling imprecision and uncertainty), comparing selected cost estimation methods based on these evaluation criteria and giving a score for each evaluation criteria in order to compare the different methods numerically for cost estimation.Moreover, this work provides the implementation for one of the well known software cost estimation models that indicate both the time and the effort required to complete a software project of a specific size.The paper will be structured as follows.Section 2 provides a list and summary of some of the existing approaches to software cost estimation.Section 3 lists the evaluation characteristics and Section 4 provides both discussions and comparison between the different approaches.Finally, Section 5 gives the conclusion and also future directions.

Software Cost Estimation Approaches
We will list and summarize some of the existing approaches to software cost estimation throughout this section.For each approach, we will also describe the mechanisms and features.We can divide Cost Estimation Techniques into two main broad categories.These are those that utilize the source lines of codes (SLOC) as their input and others that do not.

Approach 1: Constructive Cost Model (COCOMO)
It is considered that COCOMO is a very important model that can calculate a software cost estimate.This uses an algorithmic formula in order to estimate the software's cost [17].Therefore, this model is based on both mathematics and a number of experimental equations.Barry Bohem proposed it in 1981 for software cost estimation.It is considered to be the most complete approach and is better documented than the other cost estimation model, as is indicated in the references [18] [19] [20] [21].In addition, many of researchers in the software engineering field are now trying to increase the efficiency by keeping the COCOMO model's base.Furthermore, due to this model's simplicity, it is usually used for algorithmic cost estimation technique.COCOMO is made up of three models, which are Basic, Intermediate and Detailed.
The first model, which is the Basic one, is used as a function of the program size in the computing software effort and cost.It is primarily used for small to medium-sized software projects in order to perform a speedy, rough estimation.Basic COCOMO is, therefore, considered to be effective in circumstances where only a rough effort estimate is required.The equation for estimating the software effort for this basic model is: The SIZE is measured in this equation in a thousand delivered source instructions (KLOC, this is a thousand lines of code).Both of the coefficients, "a" and "b", are productivity coefficient, as well as the scale factor coefficient, respectively.It is vital to point out here that the value of the coefficients all depends on the modes of the project.Three different modes of the project proposed by Boehm are the Organic mode, the Semi-detached mode and the embedded mode.The first is organically utilized for small-sized projects of up to 2 -50 KLOC.The second is a semi-detached mode which is for medium-sized projects of up to 50 -300 KLOC.Thirdly, the embedded mode is for complex, large The equation that is used for estimating the software Effort for the intermediate model is [17]: In the equation, m(X) presents the effort adjustment factor and this is the product of a total of 15 Effort Multipliers.The third one (the detailed model) has two more capabilities.These are phase-sensitive effort multipliers and 3 level product hierarchies.The 3 levels are the Module, Subsystem and system and these are used to derive an accurate estimate.

Approach 2: Feed-Forward Neural Network with Principal Component Analysis (PCA)
The authors in [22] propose the reduction technique, which is called the feed-forward neural network with Principal Component Analysis (PCA).The authors' main objective is to use this in order to measure the accuracy of the software cost estimation model.The proposed technique is based on both algorithmic and non-algorithmic methods.So, they used a combination of algorithmic method (COCOMO) and non-algorithmic (Artificial Neural networks) to estimate the software project's costs.
In this paragraph, the authors will briefly address the Architecture Design of their System for Software Cost Estimation they have proposed.Its main parameters, which will be used as inputs for the proposed methods, are the size, cost factors and the scale factors.These parameters are all from the Actual Dataset that has been collected, as per the Project Specification.The second step is to apply the PCA.This is done by calculating the correlation coefficient matrix, as well as the Eigen-value of correlation coefficient matrix.The amount of principal components can be determined after that.These components are fed as input into the neural network system in order to train the dataset.The output layer then sends the size, effort multiplier and the scale factor values to COCOMOII.
From these inputs, which are sent from the neural network system COCOMOII, the software cost can be estimated.This result was based on the COCOMO sample dataset, which is widely used by researchers.It consists of over 161 historical projects collected from various countries all over the world.The results show that the Hybrid technique provides a more accurate cost estimation than those provided by the same type of algorithm when the PCA and neural network are not applied.
I. M. Keshta had developed which were based on his model.This is an empirical software effort estimation model and it is also one of the very first algorithmic cost models.

Approach 3: Putnam Model/SLIM
It describes both the time and the effort that is required to finish the software project of a specified size.Based on the Norden/Rayleigh function, it is generally known as the macro estimation model (as it is primarily used for large projects).
The Putnam model's software equation is given as [23] [24] [25]: The software equation in practical use is solved for effort when making an estimate for a software task [23] [24] [25]: ( ) The estimated size of the software when the project is completed and the productivity of the organisational process is used.The Time-Effort Curve is calculated by plotting the effort as a function of time and the estimated total effort that it takes to complete the project is represented by the points along the curve This method of estimating is quite sensitive to uncertainty in the size and productivity process.Putnam advocates getting this process productivity through calibration [23] [24] [25]: SLIM's Advantages • It utilizes linear programming in order to consider the development constraints of both the cost and effort required.
• One of the Putnam model's distinguishing features is that the total effort decreases as the time taken to finish the project extends.This is usually represented by a schedule relaxation parameter in other parametric models.
• SLIM needs fewer parameters in order to generate an estimate over both COCOMO'81 and COCOMO'II.

SLIM's Drawbacks
• This model is based on either knowing or being able to accurately estimate the size of the software (in the lines of code) to be developed.There is fre-Journal of Software Engineering and Applications quently a lot of uncertainty about the size of the software, which can result in the cost estimation being inaccurate.SLIM's error percentage is 772.87%[24], according to Kemerer's research, • This model is extremely sensitive to development time, as decreasing this can greatly increase the number of people and months that are required for development.
• It is not suitable for small projects.

Approach 4: Function Point Analysis
Define abbreviations and acronyms the first time they are used in the text, even after they have been defined in the abstract.Abbreviations such as IEEE, SI, MKS, CGS, sc, dc, and rms do not have to be defined.Do not use abbreviations in the title or heads unless they are unavoidable.
Algorithmic models, such as COCOMO, Putnam, etc., need the number of SLOC (source line of codes) to be estimated so as to get both the man-months and the duration estimates.Function Point Analysis is another method that can be used to quantify both a software system's size and complexity, in terms of which functions it is able to deliver to the user.Allan Albrecht at IBM developed the Function Points Measurement method, which was first published in 1976 [28] [29].Two steps are involved in counting the Function Points [24], which are: • Counting the various user functions • Making adjustments for processing the complexity Currently, the five user function categories are: external output types, external input types, external interface file types, logical internal file types and external inquiry types.It was recognized by Albrecht that the effort that is needed to provide a given level of functionality could vary, and this depended on environmental factors.For example, it is harder to input transactions to a program if much emphasis has been placed on either the system throughput or on end-user convenience.Therefore, Albrecht listed 14 processing complexity characteristics in response to this.These are to be rated on a scale that goes from 0 (which signifies no influence) up to 5 (meaning a strong influence).All the processing complexity points that have assigned are then summed up in the next step.This number is multiplied by 0.01.It is then added to 0.65 in order to obtain the following weighting: where PCA = processing complexity adjustment and then ci = complexity factors.
As a result, the various Function Points can vary ± 35 percent from the original Function Counts.Once they have been computed, these Function Points can be used in order to compare the size of the project that is proposed, compared with previous projects.

I. M. Keshta
There are a number of advantages of a function point analysis based model, and these are [30] [31] [32]: • The function points can be estimated from either the requirements specifications or the design specifications, which makes it possible to estimate the development costs in the development's early phases.• These function points are independent of the language, tools or methodologies that have been used for implementation.
• Non-technical users are able to obtain a better understanding of what the function points are measuring, as the function points have been based on the system user's own external view of the system.

Approach 5: Wavelet Neural Network (WNN)
The authors employed Wavelet Neural Network (WNN) in [33] to make an estimation of the software development effort.Experiments were made on two variants of WNN, the Morlet wavelet function [34], the Gaussian function, and a threshold acceptance training algorithm for wavelet neural network (TAWNN) that had been proposed.The results of these were compared with other computational intelligent methods, such as the Multilayer Perceptron (MLP) [35], the Radial Basis Function Network (RBFN) [36], the Multiple Linear Regression (MLR) [37], the Dynamic Evolving Neuro-Fuzzy Inference System (DENFIS) [38] and the Support Vector Machine (SVM) [39].All of these comparisons were computed in terms of the Mean Magnitude Relative Error "MMRE" applied on both the Canadian Financial "CF" dataset and the IBM Data Processing Services "IBMDPS" dataset.
The WNN that was used in this study is made up of a total of three interconnected layers.These are the input layer, the hidden layer and an output layer that has a single unit, as shown in Figure 1 acceptance is determined by deterministic criteria, instead of a probabilistic approach.
The idea is as follows: that the forward part of the back propagation remains undisturbed, while the back propagating of TA updates is done by making all of the weights a vector of decision variables.This TA concept was adopted by the authors in order to train the WNN, which is why it is called the Threshold Acceptance Wavelet Neural Network, or the TAWNN learning algorithm.This training algorithm's objective function is given as: ( ) The study's results demonstrate that the 4-models of WNN that are used in these experiments successfully produce better results compared to the other techniques.The mean magnitude of the relative error (MMRE) of both WNN-Morlet and WNN-Gaussian are successful, compared to both TAWNN-Morlet and TAWNN-Gaussian for the IBMDS and CF datasets.

Evaluation Characteristics
After describing the previous section's software cost estimation approaches, we will list our evaluation characteristics that we are going to use to compare them (see Table 1).

Ease of Use
This implies how simple it is to use and how easy it is to utilise a certain technique or approach.One fact that needs to be understood here is that the effort Table 1.Evaluation criteria.

Characteristic Brief Description
Ease of Use The approach used should be simple enough to be implemented in a reasonable time frame.

Adaptability
A method or a model should be adaptive to the changes otherwise the model will have limited usability.

Accuracy
Better accuracy implies better reliability.

Consistency
Consistency in results should be an important feature for any estimation model.

Interpretable
The results of the modeling technique have to be interpretable.

Automatable
It is desired to have a technique that could be substantially automated.

Tool supported "Supportability"
It is better if the proposed approach has been supported by an existing tool.

Empirical Validations
The empirical validation of a model adds to its credibility.

Sensitivity
In effort/time Estimation, it is desirable to have low sensitivity.

Handling Imprecision and Uncertainty
A model which considers imprecision is better than a model which doesn't.Journal of Software Engineering and Applications needed to estimate the cost of software development should be minimal.The approach used should preferably be simple enough to be done in a reasonable amount of time.If a software estimation approach uses a complex formula and algorithm, then the software cost estimation approach is said to have higher complexity and so might be undesirable.

Adaptability
The model's or method's ability to adjust to the new environment and fit the development practices' incremental style is called the adaptability of the model [38].It is important that a method or model can be adaptive to change.If it is not, its usability will be limited.An approach might be valid for a certain kind of project (like a small one) but not be applicable to other kinds, such as large projects.

Accuracy
The definition of accuracy is how close a result is to the correct value [41].The two ways to compare the result to its correct value are the difference and the ratio.In evaluating how accurate the software cost estimation models are, both the difference and ratio measures were used [42].Improved accuracy implies more reliability.Comparing the estimation accuracy of the various approaches is difficult for various reasons, like different datasets, divergent definitions of similar terms and differing goals of estimation accuracy [43].

Consistency
Models that have been developed in different environments require calibration to work well.To consistently overestimate or under estimate a model is not as difficult to calibrate as an inconsistent one.As well as accuracy, consistency is an important feature for estimation models [42].To measure the consistency level, there are some researchers who have used the correlation coefficient, SDR, between the observed and estimated values [44].

Interpretable
The modelling technique results all have to be interpretable.For example, if a modelling technique which produces hard-to-interpret results is identified as being the best one, it would not be a useful recommendation.This is because project managers would, in practice, be unlikely to apply a model that could not be understood.This excludes techniques like Artificial Neural Networks [40].

Automatable
As many techniques need intensive computation for accuracy, a technique that could be substantially automated [40] is desired.

Tools Supported or "Supportability"
Software cost estimation tools are able to improve accuracy by carrying out an I. M. Keshta automated calculation for the project.Tool-supported characteristics are able to point out if the proposed approach has a tool that supports it or not.If it is supported, then the major characteristic of this tool will be highlighted, such as its usability or efficiency [40] [42] [43].

Empirical Validations
A model's evaluation and validation or a general approach is vital.If the model can be validated, the criterion for validation and the dataset that it is validated on are considered.The industry's datasets are considered to be more reliable than the student datasets or those from open sources [43].The model's empirical validation also adds to its credibility.

Sensitivity
An input's receptiveness or responsiveness to an input stimulus is called sensitivity, and in software development, we call a sensitive model one where there is a change in an estimated effort with respect to a small change in the input values.It is desirable to have a low sensitivity in effort/time estimation.

Handling Both Imprecision and Uncertainty
It is common for all software development practices to take into account both the imprecisions and uncertainty that is associated with the processes.There is reasonable imprecision when estimating the software's size and much uncertainty in predicting the various factors that are associated with developing the software [43].A model that considers the factors is better than a model that does not.

Ease of Use
Ease of Use is an important criterion for evaluating the various approaches.This determines the degree of simplicity of a given approach, as it will try and answer how easy it is to utilise this approach.It is easy to use the COCOMO Model on small projects.However, it can be difficult to utilise it in large projects due to how complex these projects are and how many unknown variables there are that exist in these situation.Thus, this approach's score in this characteristic is 12 out of 15.For the hybrid approach (both algorithmic and non-algorithmic methods) proposed by the authors of [22], it not easy to use such an approach.This is because it is a hybrid approach which uses both an algorithmic method (COCOMO) and anon-algorithmic (Artificial Neural networks) in order to estimate the software project's cost.The score will be 8 out of a total of 15; as such an approach clearly requires a further step after using the COCOMO Model (which is the Artificial Neural networks algorithm).One of the earliest types of models developed is the Putnam model.It is amongst the ones most widely used and so is closely related to models like COCOMO.One of the model's key ad- vantages is the simplicity it is calibrated with.Most software organisations, regardless of maturity level, can easily collect the size, effort and duration (time) of past projects.It is, therefore, given 12 out of 15.A single unit's cost (in dollars or hours) from past projects is calculated by the Function Point.This method enables abstraction from a specific language, methodology or technology to take place and it is easier to understand and also to interpret for non-technical and external stakeholders, as well as users.The score for function point analysis is, therefore, 12 out of 15.Wavelet Neural network (WNN) has 5 out of 15 as it is very hard to use and contains many parameters that require a specialist person in order to calculate them.This score is clearly a result of the model's complexity.Its two main functions are the Morlet wavelet function and the Gaussian function that are used to perform this calculation.These functions require more sophisticated people to deal with them.

Adaptability
Adaptability is another important characteristic.This gives a degree of adaptability if the given approach can be adjusted, according to new changes and environments.The COCOMO Model is adaptive to the changes, particularly for little projects.In addition, we can just re-compute the values in the case of changes as a result of its mathematical model.It is also important to take the project's size into account as this can affect the COCOMO Model's adaptability.It, therefore, has a score of 8 out 10. Similarly, for [22], the model is given 8 out of 10 as it uses the COCOMO Model and takes into consideration the fact that Artificial Neural networks are fully adaptable.Software engineers look for a metric that should both be technology independent and support the need for estimating the project management, which measures the quality and gathering requirements.
The measure that accomplishes all of these tasks is Function Point Analysis, and so was given 8 out of 10.The requirement specification was not included in the Putnam model.It is not expected that an estimation which uses SLIM will take place until the design and coding has occurred.This is why it gets a score of 6 out of 10.The WNN was given 8 out of 10 as it is similar in behaviour and performance to the Artificial Neural Networks, but the adaptation process has some complexity because of the complexity of the neuron's internal function.

Accuracy
As long as we are feeding this model with almost correct values in terms of its accuracy, the COCOMO Model is very accurate, as this is a mathematical model.
But this model will produce the wrong result if we feed it with the wrong values for the variables.In the first case in our study, when we feed the COCOMO Model with almost the correct value for variables, the accuracy characteristic for the model is 12 out of 15.For [22], when the proposed approach uses the COCOMO Model, it proves to be accurate to a certain extent because the Artificial Neural networks are used.This might generate a wrong estimation if it is not I. M. Keshta done very well.We cannot guarantee 100% accuracy for ANN in many cases as it depends on the way that the construction phase, training phase, input value and numerous learning parameters are done.This approach, therefore, has a total of 8 out 15.The weakness of the Putnam is the same as the original COCOMO, in that SLOC has to be extrapolated in order for the software to be successfully implemented.This is a difficult thing to do, particularly at the start of the project.As a big section of the executable code is either reused or generated through standalone components or by middle ware in almost every modern system, it may be hard to get an acceptably accurate result using the method.
Therefore, its score is 10 out of 15.Function Point Analysis contains some mathematical equations, as was previously mentioned in the literature review, and function points counts could either increase or decrease by 35%, so it might not be 100% accurate.Hence, the score is 12 out of 15.WNN is not 100% accurate as it depends on the kind of datasets used.Therefore, this model is given the same score as ANN as it depends on the method of the construction phase, training phase, input value, and a lot of learning parameters.

Consistency
An important feature for any estimation model should be the consistency of its result.We will focus during our evaluation on whether we are able to determine the consistency level or not.The COCOMO Model, Rina et al. [22] and WNN approaches have a 10 out of 10 score, as we can determine the consistency level by utilising the correlation coefficient between the observed and estimated values.The Putnam Model has two kinds of equations and it follows a typical Rayleigh curve.The results are expected to be consistent and, therefore, it is given a score of 10.The results of the function point analysis will be consistent if it is based on an assumption that every one of the inputs is correct, but if different steps are used, the output could differ sometimes.Thus, it is given a score of 8 out of 10.

Interpretable
We will assign a score for this based on the results of the approach, which can be easily interpreted.The COCOMO Model gets 10 out of 10 as it produces numbers that everyone can both understand and interpret.It is, therefore, a useful approach.It is recommended that project managers apply such a model as it means they will be able to understand the outputs.For [22] and WNN, however, a score of 5/10 is given as these approaches, using ANN and WNN, can produce understandable outputs.Even the ANN model has some parameters which need a sophisticated person to explain.These results can be easily interpreted by using the different tools for the Putnam/SLIM model and, therefore, gets a score of 10.
The Function Point Analysis results are not as easy to interpret because here are various inputs and outputs to the system and so one user may interpret them differently from another, thus it is given a score of 6.

I. M. Keshta
compared.We will determine in our evaluation if a given model takes into consideration imprecision during the development of a software project.We can say that the COCOMO Model is a static model and has several variables that must be known before an estimation of the overall cost can be calculated.It also takes the software's size into consideration.However, the project might not correctly handle the issues of uncertainty if the model is large and complex because there will be unknown variables.Our score for both the COCOMO Model and Rina et al. [22] therefore approaches 2/5.The approach of Rina et al. [22]

Conclusions
Software cost estimation can be seen as essential activity that needs the utilization of both right methods and techniques in order to accomplish a good estimation of the results.This is why we studied several cost estimation approaches in this work and then evaluated and compared five of them.These five approaches It is important to note here that we introduced different evaluation characteristics (i.e., ease of use, adaptability, accuracy, consistency, interpretable, automatable, tool supported, empirical validations, sensitivity, and handling imprecision and uncertainty) in order to compare between these five software cost estimation approaches.
Our observations indicated that it is best to use a number of different estimating techniques or cost models for the project manager, and then compare the results, before determining the reasons for large variations and documenting any assumptions that were made while making the estimates.
[23]Putnam/SLIM estimating method was developed in the late 1970s by Larry Putnam of Quantitative Software Management, as is highlighted in the references[23][24] [25][26] [27].SLIM (Software Lifecycle Management) is the name Putnam gave to the proprietary suite of tools that his company QSM, Inc.
gets 2/5 as it uses the COCOMO Model.There was uncertainty in the Putnam/SLIM model's earlier versions as this was generated for the minimal time solution.Providing risk tables removed some of this and so it is given a score of 3/5.The uncertainty can be reduced if the various inputs to the function points are done with precision, but as this uncertainty exists, it gets a score of 3 out of 5.The score for WNN is 2/5 because it is a static model that primarily depends on the dataset types.It, therefore, cannot handle the issue of uncertainty issue during the different processes of software development.Shown in Table2is a summary comparison of the five fundamental software cost estimation approaches based on our defined evaluation criteria.

Table 2 .
Comparison between selected cost estimation approaches.Constructive Cost Model (CoCoMo), Feed-forward neural network with Principal Component Analysis (PCA), Putnam model/SLIM, Function Point Analysis and Wavelet Neural Network (WNN). are