3.2.3 Multivariable regression analysis applying variable transformations
Two transformations (logarithmic and square root) were then applied to the predictors to identify any linear relationship between the water level above the LiDAR level and the transformed variables. The predictors considered in this analysis were lake area, upstream drainage area, outflow channel width and peak discharge with a return period of 350 years. Taking the square root of the discharge values noticeably improved the predictive performance of the model, whereas applying the same transformation to the lake area and watershed area had little effect. The weir value was not identified as a relevant variable. The simplest significant model used the lake area and the square root of the peak discharge as predictors, giving an RMSE of 0.48 m and an adjusted R squared of 0.5495. The p-values for both predictors are on the order of 10^-5, indicating a statistically significant relationship (Table 3, Figure 4).
[Table 3]
[Figure 4]
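As an illustration of the model form described above, the following minimal sketch fits water level against lake area and the square root of peak discharge by ordinary least squares, then reports the RMSE and adjusted R squared. It is not the code used in this study. The data are synthetic and noise-free, generated from hypothetical coefficients, so the fit recovers those coefficients almost exactly (RMSE near zero, adjusted R squared near one) and does not reproduce the values in Table 3. The function names, units and the choice of RMSE as the root mean square of the residuals are all assumptions.

```python
import math

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares fit with intercept; returns (coefficients, RMSE, adjusted R squared)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n, k = len(X), len(X[0])             # k counts the intercept as a parameter
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve_linear(xtx, xty)        # normal equations (X'X) beta = X'y
    pred = [sum(b * v for b, v in zip(beta, xi)) for xi in X]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    y_mean = sum(y) / n
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    rmse = math.sqrt(ss_res / n)         # assumed definition: root mean square residual
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return beta, rmse, adj_r2

# Synthetic, hypothetical inputs: lake area (km2) and peak discharge (m3/s).
lake_area = [1.2, 8.5, 0.8, 3.0, 6.4, 12.2, 2.1, 4.9]
peak_q = [20.0, 55.0, 120.0, 180.0, 90.0, 30.0, 260.0, 75.0]

# Hypothetical "true" coefficients used only to generate noise-free water levels (m).
level = [0.5 + 0.02 * a + 0.1 * math.sqrt(q) for a, q in zip(lake_area, peak_q)]

# Predictors: untransformed lake area and square-root-transformed discharge.
rows = [[a, math.sqrt(q)] for a, q in zip(lake_area, peak_q)]
beta, rmse, adj_r2 = fit_ols(rows, level)
print("intercept, area, sqrt(Q) coefficients:", beta)
print("RMSE (m):", rmse, "adjusted R squared:", adj_r2)
```

With field data, the synthetic lists would be replaced by the observed lake areas, peak discharges and water levels; a statistics package would also be needed to obtain the p-values, which this sketch does not compute.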
The residuals of this model support the assumption of homoscedasticity: neither the error nor its absolute value shows any trend with increasing values of the predictors or of other lake characteristics (Figure 3 of the supplementary materials).
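One simple way to screen for the kind of trend described above is to correlate the absolute residuals with each predictor; a correlation close to zero is consistent with constant error variance. The sketch below illustrates this check under stated assumptions: the residuals and predictor values are hypothetical, and the study's own assessment was made from the residual plot, not necessarily from this statistic.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictor values and model residuals (m), for illustration only.
predictors = {
    "lake_area": [1.2, 8.5, 0.8, 3.0, 6.4, 12.2, 2.1, 4.9],
    "sqrt_peak_q": [math.sqrt(q) for q in
                    [20.0, 55.0, 120.0, 180.0, 90.0, 30.0, 260.0, 75.0]],
}
residuals = [0.31, -0.52, 0.10, -0.44, 0.63, -0.08, 0.27, -0.39]
abs_residuals = [abs(e) for e in residuals]

# Correlation of |residual| with each predictor; values near zero suggest
# that the error magnitude does not grow or shrink with that predictor.
trends = {name: pearson(values, abs_residuals) for name, values in predictors.items()}
for name, r in trends.items():
    print(name, round(r, 3))
```

A formal test such as Breusch-Pagan could replace this screening step where a p-value is required.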
An identical multivariable regression procedure was applied using return periods of 100 and 20 years. The results show that an equivalent model, with slightly modified coefficients, performs well for the 100-year flow (RMSE = 0.48 m, adjusted R squared = 0.4849). However, the adjusted R squared dropped to 0.3574 for the 20-year events, indicating that the model was not able to predict water levels associated with more frequent events. We hypothesise that this is because higher-frequency events are more readily controlled by engineered features, leading to highly unpredictable water level behaviour.