Previous Studies
Bilgaiyan et al. (Bilgaiyan et al., 2017) carried out a systematic
survey of the literature on effort and cost estimation techniques for
software development. The researchers concentrated on several
development models in order to reach accurate measurements of the
effort and cost of software projects, measurements that can predict
delays and quantify deviations in total effort, cost estimation,
delivery-time forecasts, and budget. Such measurements help projects
adapt to change and support development models in which the customer
is an active participant, so that changes can occur dynamically at any
stage of development. Accurate effort and cost measurements were
achieved through the application of genetic algorithms (GA), particle
swarm optimization (PSO), artificial neural networks (ANN), and fuzzy
inference systems (FIS).
Zadeh & Kashef (Zadeh & Kashef, 2022) aimed to help the company’s
executives and management team provide more realistic time and cost
estimates for future software projects by examining the link between
project complexity and cost/time overrun. The Changepoint database was
used to collect sample data for around 50 projects. Statistical
approaches were used to define and test the two study hypotheses; the
quantitative analysis comprised descriptive analysis and regression
modelling. The findings of the experiments revealed that
there was a strong positive linear link between project complexity and
cost/time overrun.
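The regression analysis described above can be illustrated with a minimal sketch: fitting a straight line relating a project complexity score to its cost overrun and checking the strength of the linear relationship. The data below are invented for demonstration only and are not from the Changepoint database.

```python
import numpy as np

# Hypothetical data: a complexity score per project and the observed
# cost overrun in percent (invented values, for illustration only).
complexity = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
overrun_pct = np.array([4.0, 9.0, 13.0, 22.0, 26.0, 33.0])

# Least-squares fit of overrun = slope * complexity + intercept.
A = np.column_stack([complexity, np.ones_like(complexity)])
slope, intercept = np.linalg.lstsq(A, overrun_pct, rcond=None)[0]

# The Pearson correlation coefficient quantifies how strong the
# linear link is (close to 1 means a strong positive relationship).
r = np.corrcoef(complexity, overrun_pct)[0, 1]
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
```

A strongly positive slope together with a correlation near 1, as in this toy data, is the kind of evidence the study reports for the link between complexity and cost/time overrun.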
Yoon et al. (Yoon et al., 2007) proposed a new effort estimation
methodology aimed at agile and iterative development environments,
which traditional prediction methods do not describe well. The study
presented a detailed estimation methodology, discussed several model
structures, including a large set of augmented regression models and
machine-learning-based neural networks, and included a comprehensive
case study of Extreme Programming (XP) in two semi-industrial
projects. The results showed that the proposed stepwise
model outperformed traditional estimation techniques significantly in
the early stages of development.
Sharma & Singh (Sharma & Singh, 2017) presented a systematic review of
software effort estimation techniques using machine learning. For
software effort estimation, the most common machine learning algorithms
are Artificial Neural Networks, Fuzzy Logic, Genetic Algorithms, and
Regression Trees. The results revealed that the most common software
metrics used for effort estimation are Lines of Code (LOC) and
Function Points (FP).
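To make the FP metric mentioned above concrete, the sketch below computes an unadjusted function point count using the commonly cited IFPUG average-complexity weights; the component counts themselves are hypothetical.

```python
# Commonly cited IFPUG average-complexity weights per component type.
AVG_WEIGHTS = {
    "external_inputs": 4,
    "external_outputs": 5,
    "external_inquiries": 4,
    "internal_files": 10,
    "external_interfaces": 7,
}

def unadjusted_fp(counts):
    """Sum each component count multiplied by its average weight."""
    return sum(AVG_WEIGHTS[kind] * n for kind, n in counts.items())

# Hypothetical component counts for a small system.
counts = {
    "external_inputs": 10,
    "external_outputs": 6,
    "external_inquiries": 8,
    "internal_files": 4,
    "external_interfaces": 2,
}
print(unadjusted_fp(counts))  # 10*4 + 6*5 + 8*4 + 4*10 + 2*7 = 156
```

In practice this unadjusted count is further scaled by a value adjustment factor derived from general system characteristics before being used in effort models.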
Choetkiertikul et al. (Choetkiertikul et al., 2019) presented a
prediction model for estimating and evaluating projects based on a novel
combination of two powerful deep-learning architectures: long
short-term memory (LSTM) and recurrent highway networks. The researchers proposed a
comprehensive dataset for effort-based estimation and evaluation that
studied 313 releases from 16 open-source projects. It also proposed a
comprehensive prediction system based on deep learning for effort
estimation. The results revealed that a large part of these
improvements was made possible by using the LSTM architecture for
modelling textual descriptions.
Kula et al. (Kula et al., 2022) reviewed the problems of late delivery
of projects and the resulting increase in costs and showed that it was a
problem resulting from the failure to estimate effort during project
planning. Software projects involve complex systems and technologies
and can be affected by many factors that influence effort estimation
and timely delivery. The researchers also identified
the factors that affected schedule deviations using a multi-method case
study in ING that revealed many organizational, personnel, process, and
technical factors. The researchers then structured the findings into a
conceptual framework representing the influencing factors and their
relationships to on-time delivery. The proposed framework supports
identifying and managing delay risks and designing automated tools to
predict schedule overruns and manage a software project.
Agrawal & Chari (Agrawal & Chari, 2007) focused on multitasking
projects to examine the effects of high-maturity processes on effort,
quality, and cycle time. Using linear regression models based on data
collected from 37 projects, with program size as the main predictor of
effort, cycle time, and quality, the developed models predicted effort
and cycle time to within about 12 percent and defects to within about
49 percent of actual values, on average, across organizations. They
compared favourably with widely used estimation models such as
FP-based models and COCOMO. The results showed a sharp decrease in
variance in effort, quality, and cycle time, which led to relative
uniformity across these measures.
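For context on the COCOMO baseline mentioned above, the basic COCOMO model estimates effort in person-months as Effort = a * KLOC^b. The sketch below uses the published organic-mode coefficients (a = 2.4, b = 1.05); the 32 KLOC project size is a hypothetical example.

```python
def cocomo_basic_effort(kloc, a=2.4, b=1.05):
    """Basic COCOMO effort in person-months: Effort = a * KLOC**b.

    The defaults are the published organic-mode coefficients; the
    semi-detached and embedded modes use different constants.
    """
    return a * kloc ** b

# Hypothetical project of 32 KLOC.
effort = cocomo_basic_effort(32.0)
print(f"{effort:.1f} person-months")
```

Size-driven parametric models of this form are exactly what the regression models above were benchmarked against.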
Bhattacharya & Neamtiu (Bhattacharya & Neamtiu, 2011) proposed models
for predicting bug-fix time that used various bug-report features,
such as the number of developers who participated in fixing the bug,
the severity of the bug, and the number of corrections. By estimating
the time a fix would take, the study was able to build more accurate
and more general bug-fix time prediction models, using multivariate
and univariate regression testing to assess the predictive power of
existing models. The researchers presented a case study of bug reports
from five open-source projects: Eclipse, Chrome, and three Mozilla
products (Firefox, Seamonkey, and Thunderbird). The results
revealed three unresolved research issues: (1) determining if
prioritising bugs based on bug-opener reputation is advantageous, (2)
defining features that are useful in forecasting bug-fix time, and (3)
developing bug-fix time prediction models that can be evaluated on
real-world data.
Grimstad (Grimstad, 2005) addressed the issue of improving software
cost estimates based on expert judgment and cost uncertainty
assessments through better estimation processes, support processes,
and learning/training processes. The emphasis was on expert judgment,
which dominates most software costing exercises, and on improving
understanding of the causes of estimation inaccuracies in software
development projects. This included understanding the impact of the
activities and phenomena that occur before, during, and after the
actual development project, and examining how personal experience
affects the estimator when producing an estimate.
Trendowicz et al. (Trendowicz et al., 2014) proposed an integrated
approach to selecting relevant factors affecting software development
productivity. Evaluation of Effort and Delay Factors was used to
identify the most relevant factors affecting software development
productivity by incorporating data analysis and an expert judgment
approach via a multi-criteria decision support technique. The study
followed a process in which the researchers compared the resulting set
of factors against individual data-based and expert-based factor
selection methods. The results showed an improvement in effort
estimation accuracy, with better estimation performance on the groups
of factors reduced by the data-based selection method. It was
concluded that expert-based and data-based selection methods
identified different (only partially overlapping) combinations of
relevant factors.
Abdalkareem et al. (Abdalkareem et al., 2021) addressed the impact of
commits on project delays. The study examined the commits of 58 Java
projects, identified the commits that developers explicitly skipped
through manual investigation of 1813 explicitly skipped commits, and
proposed a prototype model for identifying commits whose CI runs can
be skipped. The model used a rule-based technique that automatically
identifies which commits to skip; evaluated on unseen datasets
extracted from ten projects, it demonstrated that the technique can
detect and label CI-skip commits, reducing the number of commits that
need to run through the CI process by 18.16%. A publicly available
prototype tool called CI-SKIPPER has been developed that can integrate
with any git repository and automatically flag commits that can be
skipped.
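A rule-based CI-skip decision of the kind described above can be sketched very simply: skip CI when a commit touches only files that cannot affect the build. The extension list and function below are illustrative assumptions, not CI-SKIPPER's actual rule set.

```python
# Hypothetical list of file extensions that do not affect the build
# (an assumption for illustration, not the tool's real rules).
NON_BUILD_EXTENSIONS = {".md", ".txt", ".rst", ".png", ".jpg", ".gitignore"}

def can_skip_ci(changed_files):
    """Return True when every changed file is a non-build artifact."""
    def is_non_build(path):
        name = path.rsplit("/", 1)[-1]      # strip directories
        dot = name.rfind(".")
        ext = name[dot:] if dot != -1 else ""
        return ext in NON_BUILD_EXTENSIONS
    # An empty commit is not a reason to skip; require at least one file.
    return bool(changed_files) and all(is_non_build(f) for f in changed_files)

print(can_skip_ci(["docs/intro.md", "README.txt"]))    # True
print(can_skip_ci(["src/Main.java", "docs/intro.md"]))  # False
```

The actual model combines several such rules (e.g. changes to comments, formatting, or meta-files) learned from the manually labeled commits.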
Lebedeva & Guseva (Lebedeva & Guseva, 2020) dealt with the factors
behind late delivery of software projects and cost overruns in the
software industry, which stem from deficiencies in effort estimation
during project planning and lead to schedule deviations in software
development. A multi-method case study in ING revealed many
organizational, personnel, process, project, and technical factors,
which were then quantified and statistically modelled using software
repository data from 185 teams. Focusing on metrics such as project
size, number of dependencies, historical delivery performance, and
team knowledge, the study also addressed hierarchical interactions
between factors, which in turn influence technical factors. From this,
a conceptual framework was derived that represents the influencing
factors and their relationships to on-time delivery, enabling the
identification and management of delay risks. The researchers also
designed automated tools to predict schedule overruns and developed a
relational theory for software project management.
Chang et al. (Chang et al., 2020) proposed an automatic code review
tool that uses static analysis with quality evaluation metrics
designed systematically under the GQM methodology, evaluating three
software quality characteristics of the ISO/IEC 52060 standard for
intelligent applications. The tool targets applications built on an
open platform for personal use or public distribution: it reviews code
to detect violations of coding standards and to ensure that best
practices are followed. The tool implements a code review model and
automatic quality analysis that, without human intervention, monitors
delay-inducing elements, interprets their causes, and proposes
remedies.
A few gaps can be identified in the extensive body of research on cost
estimation models and their relevance to project delays and cost
overruns. These gaps point to areas where further investigation can
advance our understanding of the subject. Although cost estimation
models are used across a variety of industries, particular domains may
present characteristics and difficulties that call for estimation
techniques specific to that industry. Moreover, despite the abundance
of available cost estimation models, a more thorough quantitative
assessment and comparison of these models with respect to their
precision, dependability, and suitability is still required.