Cross-validation across two distinct populations shows strong
performance.
We further examined performance by simulating two distinct populations and testing the model's ability to extrapolate across cohorts. Both populations were simulated with the same approach described in the previous section. We then focused on each parameter in turn and varied it through a grid search. For these experiments we used ExtraTreeRegressor, a representative machine learning base learner.
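To make this protocol concrete, below is a minimal Python sketch of the cross-cohort evaluation. The toy simulator (geometric termination times), the lagged-survival features, and the percentage-error metric are illustrative assumptions rather than the exact pipeline of the previous section; only the overall structure (train ExtraTreeRegressor on one cohort, test across a grid of termination rates) follows the text.

    import numpy as np
    from sklearn.tree import ExtraTreeRegressor

    rng = np.random.default_rng(0)

    def simulate_population(rate, n=5000, horizon=200):
        # Toy cohort (assumption): geometric termination times with per-step
        # hazard `rate`; returns the empirical survival curve S(1..horizon).
        times = rng.geometric(rate, size=n)
        t = np.arange(1, horizon + 1)
        return (times[None, :] > t[:, None]).mean(axis=1)

    def lagged_features(survival):
        # Illustrative features: predict S(t) from (t, S(t-1)), so the model
        # can partially adapt to an unseen cohort's own observed curve.
        t = np.arange(2, len(survival) + 1)
        return np.column_stack([t, survival[:-1]]), survival[1:]

    def error_pct(y_true, y_pred):
        # Mean absolute deviation as a percentage of the true curve.
        return 100.0 * np.mean(np.abs(y_pred - y_true) / y_true)

    train_curve = simulate_population(rate=0.0008)
    X_tr, y_tr = lagged_features(train_curve)
    model = ExtraTreeRegressor(random_state=0).fit(X_tr, y_tr)

    for rate in (0.0002, 0.0008, 0.0012):  # grid over test-cohort rates
        X_te, y_te = lagged_features(simulate_population(rate=rate))
        print(f"rate {rate}: {error_pct(y_te, model.predict(X_te)):.2f}% error")

In this sketch, as the two cohorts' termination rates diverge, the test cohort's (t, S(t-1)) pairs drift away from the region covered by the training data, which is one way to see why extrapolation error grows with the rate gap.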
The most important factor affecting the results was the termination rate. With the training set termination rate fixed, the best performance is achieved when the test population is most similar to the training set, and performance degrades gradually as the two termination rates diverge (Fig. 4a, Fig. S6-7). For example, when the training set average termination rate is 0.0008, the model achieved an error rate of 5.464% for both metrics when the test set termination rate is also 0.0008. The error rate rises at both tails as the test set termination rate moves away from that of the training set: when the test set termination rate is 0.0002, the model achieved an error rate of 9.18% for absolute error and 9.29% for cumulative error; when the test set termination rate is 0.0012, it achieved an error rate of 18.82% for both absolute and cumulative error. This observation is expected: if the termination rates of the two populations differ too much, the corresponding feature distributions (derived from the termination rate) no longer overlap between the two populations, and the patterns become difficult to predict. Nevertheless, this error is much lower than that of directly reusing the training curve, for which we would expect roughly 50% error when training with a 0.0008 termination rate and testing with a 0.0012 termination rate.
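The 50% baseline figure follows from simple arithmetic: reusing the 0.0008 training curve for a 0.0012 cohort misstates the per-step termination rate by (0.0012 - 0.0008) / 0.0008 = 50%. As a one-line illustrative check (assuming error is measured relative to the training rate):

    train_rate, test_rate = 0.0008, 0.0012
    # Relative gap between the two per-step termination rates: ~50%.
    print(f"{100 * abs(test_rate - train_rate) / train_rate:.0f}% baseline error")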