Previous Studies
Bilgaiyan et al. (Bilgaiyan et al., 2017) carried out a systematic survey of the literature on effort and cost estimation techniques for software development. The survey examined several development models with the aim of producing accurate measurements of the effort and cost of software projects, measurements that can predict delay and quantify deviations in total effort, cost estimation, delivery-time forecasting, and budget. Such measurements help projects adapt to change, supporting development models in which the customer is an active participant and changes can therefore occur dynamically at any stage of development. The surveyed techniques for achieving this accuracy and flexibility included genetic algorithms (GA), particle swarm optimization (PSO), artificial neural networks (ANN), and fuzzy inference systems (FIS).
Zadeh & Kashef (Zadeh & Kashef, 2022) aimed to help executives and management teams produce more realistic time and cost estimates for future software projects by examining the relationship between project complexity and cost/time overrun. Sample data for around 50 projects were collected from the Changepoint database, and statistical approaches were used to define and test the two study hypotheses. The quantitative analysis comprised descriptive analysis and regression modelling. The experiments revealed a strong positive linear relationship between project complexity and cost/time overrun.
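To illustrate the kind of regression modelling reported in that study, the sketch below fits an ordinary least-squares line relating a project-complexity score to cost overrun and computes the correlation coefficient. The data points, variable names, and helper functions are hypothetical, for illustration only; they are not taken from the study.

```python
# Minimal sketch of simple linear regression (ordinary least squares) of
# cost overrun (%) against a project-complexity score. All data below are
# hypothetical and serve only to illustrate the analysis style.

def ols_fit(xs, ys):
    """Return slope and intercept of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def pearson_r(xs, ys):
    """Correlation coefficient, used to judge the strength of the link."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sy = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

complexity = [1, 2, 3, 4, 5, 6]      # hypothetical complexity scores
overrun = [5, 9, 16, 18, 27, 30]     # hypothetical cost overrun (%)

slope, intercept = ols_fit(complexity, overrun)
r = pearson_r(complexity, overrun)
```

A positive slope together with a correlation coefficient close to 1 is what a "strong positive linear relationship" amounts to in such an analysis.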
Yoon et al. (Yoon et al., 2007) proposed a new effort estimation methodology aimed at agile and iterative development environments, which traditional prediction methods do not describe well. The study laid out a detailed development methodology, discussed several structures of these models along with a large family of augmented regression models and machine-learning-based neural networks, and included a comprehensive case study of Extreme Programming (XP) in two semi-industrial projects. The results showed that the proposed stepwise model significantly outperformed traditional estimation techniques in the early stages of development.
Sharma & Singh (Sharma & Singh, 2017) presented a systematic review of software effort estimation techniques using machine learning. The most common machine learning algorithms for software effort estimation are Artificial Neural Networks, Fuzzy Logic, Genetic Algorithms, and Regression Trees. The review also found that the most common software metrics used for effort estimation are Lines of Code (LOC) and Function Points (FP).
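Size metrics such as LOC feed directly into classic parametric models. For instance, Boehm's basic COCOMO maps size in thousands of lines of code (KLOC) to effort in person-months via effort = a · KLOC^b, using published coefficients per project class. The sketch below uses those published basic-COCOMO coefficients; the function name and dictionary layout are ours.

```python
# Basic COCOMO: effort (person-months) = a * KLOC ** b, with Boehm's
# published coefficients for the three basic project classes.

COCOMO_COEFFS = {
    "organic":       (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (3.6, 1.20),
}

def cocomo_effort(kloc, mode="organic"):
    """Estimate development effort in person-months from size in KLOC."""
    a, b = COCOMO_COEFFS[mode]
    return a * (kloc ** b)

estimate = cocomo_effort(10, "organic")  # about 26.9 person-months
```

The exponent b > 1 encodes the diseconomy of scale that such LOC-based models assume: doubling size more than doubles estimated effort, with the effect strongest for embedded projects.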
Choetkiertikul et al. (Choetkiertikul et al., 2019) presented a prediction model for estimating and evaluating projects based on a novel combination of two powerful deep-learning architectures: long short-term memory (LSTM) and the recurrent highway network. The researchers compiled a comprehensive dataset for effort-based estimation and evaluation covering 313 releases from 16 open-source projects and proposed a comprehensive deep-learning-based prediction system for effort estimation. The results revealed that a large part of the improvements came from using the LSTM architecture to model textual descriptions.
Kula et al. (Kula et al., 2022) reviewed the problems of late project delivery and the resulting cost increases, showing that they stem from failures to estimate effort during project planning. Software projects, with their complex systems and technologies, are subject to many factors that affect effort estimation and timely delivery. Using a multi-method case study at ING, the researchers identified many organizational, personnel, process, and technical factors behind schedule deviations, then structured the findings into a conceptual framework relating the influencing factors to on-time delivery. The proposed framework supports identifying and managing delays and risks and designing automated tools to predict schedule overruns and manage a software project.
Agrawal & Chari (Agrawal & Chari, 2007) focused on high-maturity projects to examine the effects of highly mature processes on effort, quality, and cycle time. Using linear regression models built from data collected on 37 projects, with program size as the main predictor, the developed models predicted effort and cycle time within about 12 percent and defects within about 49 percent of the actual values across organizations, comparing favourably with widely used estimation models such as FP-based models and COCOMO. The results also showed a sharp decrease in the variance of effort, quality, and cycle time, indicating relative uniformity across projects.
Bhattacharya & Neamtiu (Bhattacharya & Neamtiu, 2011) proposed models for predicting bug-fix time that used different bug-report features, such as the number of developers who participated in fixing the bug, the bug's severity, and the number of corrections. By estimating how long a fix would take, the study built more accurate and more general bug-fix time prediction models, using univariate and multivariate regression analysis to test the predictive significance of the existing models. The researchers presented a case study of bug reports from five open-source projects: Eclipse, Chrome, and three Mozilla products (Firefox, Seamonkey, and Thunderbird). The results revealed three unresolved research issues: (1) determining whether prioritising bugs by bug-opener reputation is advantageous, (2) defining features that are useful in forecasting bug-fix time, and (3) developing bug-fix time prediction models that can be evaluated on real-world data.
Grimstad (Grimstad, 2005) addressed the issue of improving software cost estimates based on expert judgment and cost uncertainty assessments through better estimation processes, support processes, and learning/training processes. The emphasis was on the role of expert judgment in most software costing exercises and on improving understanding of the causes of estimation inaccuracy in software development projects. This included understanding the impact of activities and phenomena occurring before, during, and after the actual development project, and examining how personal experience affects the estimator when producing an estimate.
Trendowicz et al. (Trendowicz et al., 2014) proposed an integrated approach to selecting the factors most relevant to software development productivity. Their evaluation of effort and delay factors combined data analysis with an expert judgment approach via a multi-criteria decision support technique, and the resulting process identified a different set of factors than either individual data-based or expert-based factor selection. The results showed improved effort estimation accuracy, with further improvement on the reduced factor sets produced by the data-based selection method. It was concluded that expert-based and data-based selection methods identify different, only partially overlapping, combinations of relevant factors.
Abdalkareem et al. (Abdalkareem et al., 2021) addressed the impact of commits on project delays by examining the commits of 58 Java projects, identifying commits explicitly skipped by developers through manual investigation of 1813 explicitly skipped commits, and proposing a prototype model for detecting commits that can skip CI. The model used a rule-based technique that automatically identified which commits to skip; evaluated on unseen data extracted from ten projects, it detected and labelled CI-skip commits and reduced the number of commits that need to run the CI process by 18.16%. A publicly available prototype tool called CI-SKIPPER was developed that can integrate with any git repository and automatically flag commits that can be skipped.
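The rule-based idea can be shown with a deliberately simplified sketch: flag a commit as CI-skippable when every file it changes matches a non-source pattern (documentation, images, repository metadata). The rule set and function name below are hypothetical stand-ins for illustration, not CI-SKIPPER's actual rules.

```python
# Simplified, hypothetical illustration of rule-based CI-skip detection:
# a commit touching only non-source files is flagged as skippable.
# This single suffix rule is an assumption for illustration; the real tool
# uses a richer rule set learned from developer-skipped commits.

NON_SOURCE_SUFFIXES = (".md", ".txt", ".png", ".jpg", ".gitignore")

def can_skip_ci(changed_files):
    """Return True if every changed file in the commit matches a skip rule."""
    if not changed_files:
        return False  # be conservative about empty or unknown change sets
    return all(f.endswith(NON_SOURCE_SUFFIXES) for f in changed_files)

docs_only = can_skip_ci(["README.md", "docs/guide.txt"])   # skippable
mixed = can_skip_ci(["src/Main.java", "README.md"])        # must run CI
```

A commit is only skipped when *all* of its files match, so any change that touches source code still triggers the CI run.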
Lebedeva & Guseva (Lebedeva & Guseva, 2020) dealt with the factors behind late delivery of software projects and cost overruns in the software industry, which arise from deficiencies in effort estimation during project planning and manifest as schedule deviations in software development. A multi-method case study at ING revealed many organizational, personnel, process, project, and technical factors, which were then quantified and statistically modelled using software repository data from 185 teams. The analysis focused on agile metrics such as project size, number of dependencies, historical delivery performance, and team knowledge, and addressed hierarchical interactions between factors, which in turn influence technical factors. From this, the researchers derived a conceptual framework relating the influencing factors to on-time delivery, enabling the identification and management of delay risks; they also designed automated tools to predict schedule overruns and developed a relational theory for software project management.
Chang et al. (Chang et al., 2020) proposed an automatic code review tool using static analysis, with quality evaluation metrics designed systematically under the Goal-Question-Metric (GQM) methodology and evaluating three software quality characteristics of the ISO/IEC 52060 standard for intelligent applications. The tool reviews the code of applications built on an open platform, whether for personal use or public distribution, detecting violations of coding standards and ensuring that best practices are followed. It provides a code review model and automatic quality analysis without human intervention, monitoring delay elements, interpreting their causes, and proposing remedies.
A few gaps remain in the extensive corpus of research on cost estimation models and their relevance to project delays and cost overruns, pointing to areas where further investigation can advance understanding of the subject. Although cost estimation models are applied across a variety of industries, particular domains may present features and difficulties that call for industry-specific estimation techniques. Moreover, despite the abundance of available cost estimation models, a more thorough quantitative assessment and comparison of these models with respect to their precision, dependability, and suitability is still required.