3.1.3 Modeling
Once the data have been prepared in an appropriate format, a model can
be built to analyze them and make predictions. First, the original
dataset is divided into a training set and a test set. In the modeling
step, the training set is passed to a learning algorithm, called an
estimator, which uses statistical and mathematical tools to learn from
the data and produce predictions. In ML, the process of creating the
desired model is called training; the test set is held out to evaluate
the trained model on unseen data. Usually, the most
challenging part of ML is choosing the right estimator for the type of
data and problem at hand. The scikit-learn library provides a broad
range of estimators, along with guidance on how to select a suitable
one among them (www.scikit-learn.org). Some important machine
learning estimators, including linear regression, k-nearest neighbors,
support vector machines (SVMs), decision trees, random forest, Gaussian
process (GP), fuzzy logic, and artificial neural networks (ANNs), are
defined in Table 2. Moreover, ensemble methods combine the predictions
of several individual estimators to improve the precision and accuracy
of the model [85]. For example, the random forest is an
ensemble modeling method in which several decision trees are used to
predict the outcome [86]. It is worth noting that each estimator has
its own tunable parameters (hyperparameters), whose appropriate values
depend on the type of data and the problem.
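
As an illustration only, the workflow described above (splitting the
data, training an estimator, comparing a single decision tree with a
random forest, and tuning parameters) might be sketched with
scikit-learn as follows. The synthetic dataset, the 80/20 split, and the
small parameter grid are assumptions made for this sketch and are not
taken from the present work.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data stands in for the prepared dataset
# (an assumption for illustration, not data from this study).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Divide the dataset into training and test sets (80/20, an arbitrary choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Training: fit a single decision tree and a random forest (an ensemble
# of decision trees) on the training set only.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(
    X_train, y_train
)

# Evaluate both models on the held-out test set (R^2 score).
tree_r2 = tree.score(X_test, y_test)
forest_r2 = forest.score(X_test, y_test)

# Tunable parameters: search a small, hypothetical grid with 3-fold
# cross-validation on the training set.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)

print(f"decision tree R^2: {tree_r2:.3f}")
print(f"random forest R^2: {forest_r2:.3f}")
print("best parameters:", search.best_params_)
```

In practice, the estimator, the splitting strategy, and the parameter
grid would each be chosen according to the data and the problem; the
values used here are placeholders.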