3.4.3. Random Forest
Introduced by [58], Random Forests is a supervised statistical
machine learning technique used for both regression and
classification. It is an ensemble learning method in which the final
prediction is an average over many decision trees. Training a model
on repeated bootstrap samples and averaging the results is called
bagging, and it improves stability and reduces overfitting [59].
On their own, decision trees are usually not competitive with the
best supervised learning approaches in terms of prediction accuracy,
because they have low bias but high variance: fitting a tree to two
slightly different training sets can yield two very different trees.
Bagging is therefore well suited to decision trees, since averaging
reduces this variance. The idea behind Random Forests is to draw B
bootstrap samples from the training data set and to build a separate
decision tree on each of the B samples. The method is called Random
Forests because, while growing each tree, only a random subset of the
input variables is considered at every split. This reduces the
covariance between the individual trees, which in turn lowers the
variance of the averaged prediction even further [59].
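For concreteness, the following is a minimal sketch of this bootstrap-and-average procedure, assuming a Python environment with NumPy and scikit-learn; the function names, the regression setting, and the square-root feature-subset rule are illustrative assumptions rather than the exact setup used in this work.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def fit_random_forest(X, y, n_trees=100, max_features="sqrt"):
    """Grow B = n_trees trees, each on a bootstrap sample of (X, y).

    max_features="sqrt" makes each tree consider only a random subset
    of the input variables at every split, as described above.
    X and y are assumed to be NumPy arrays.
    """
    trees = []
    n = len(X)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap sample (drawn with replacement)
        tree = DecisionTreeRegressor(max_features=max_features)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_random_forest(trees, X):
    """Bagging: average the predictions of the individual trees."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)
```

In practice, library implementations such as scikit-learn's RandomForestRegressor and RandomForestClassifier encapsulate exactly this bootstrap-and-average loop.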
In this work, a random forest ensemble with 100 combined trees was
created. The batch size was selected as 10 and the depth of the
trees was set to unlimited. Other parameters of the random forest
algorithm are given in Table 4.
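As an illustration only, a comparable configuration could be expressed as follows in scikit-learn; this is an assumed equivalent, not the toolkit actually used here, and scikit-learn has no direct counterpart to the batch-size setting. RandomForestClassifier would be the analogous choice for a classification target.

```python
from sklearn.ensemble import RandomForestRegressor

# Illustrative mapping of the reported settings (assumed scikit-learn equivalent):
# 100 combined trees and unlimited tree depth. The batch size of 10 has no
# direct scikit-learn counterpart and is therefore omitted here.
model = RandomForestRegressor(n_estimators=100, max_depth=None, random_state=0)
# model.fit(X_train, y_train)
# predictions = model.predict(X_test)
```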