2. DATA AND METHODS
2.1 Study area
Beijing (115°25′-117°30′ E, 39°28′-41°05′ N), a typical megacity, is located in northern of North China Plain (Fig.1). It covers an area of 16.4 thousand km2. The climate of Beijing belongs to temperature continental monsoon climate. The annual precipitation is 600 mm, and 80% of precipitation is concentrated in summer. The western and northern of the study area lie on mountains, and the central city lies largely on flat land. The main rivers are the Yongding River, Chaobai River, Jvma River, Beiyun River, and Jiyun River. Beijing has witnessed rapid urban sprawl in the horizontal and vertical dimensions since the reform and “opening up” policy. The urban built-up area increased 3130 km2 from 1980 to 2018 (Yin et al., 2021). The oldest part of the city is mainly composed of ancient buildings with low building height within the second ring road, such as Sihe yuan. In 2020, more than 25 buildings were taller than 200 m. Beijing has undergone serious urban flooding. Although a large-scale drainage system was constructed, the city still faces the challenge of frequent urban flooding. For example, 79 people were killed and more than $1.6 billion were lost in Beijing on July 21, 2012. With the renewal of drainage facilities and the development of sponge city construction, a heavy rainstorm caused 37 road collapses within the whole city on July 16, 2018. Based on previous studies (Liu et al., 2022), highly urbanized areas are the most vulnerable area to flooding. Therefore, it is still necessary to mitigation flooding risks through urban planning and management in highly urbanized areas.
2.2 Data sources
A total of 139 flood points from 2011 to 2021 were collected from the Water Resources Bureau of Beijing (Table 1). However, the available data only provide the address of the urban flooding events without accurate coordinates. Thus, we first vectorized these positions according to the occurrence types. When the urban flooding event occurred on the road, we determined the geometric center of the road as the position following the method recommended by Zhang et al. (2018) (Fig.2). Building footprint and floor number information in 2017 were collected from Baidu Maps (map.baidu.com) (Table 1). By referring to the Gaofen-1 image (chromatic sensor with 2 m resolution), the location of the buildings is highly accurate. The building floor height was set 3 m through the building design regulations. The overall accuracy of building height was beyond 85% according to field surveys, which is satisfying. The digital elevation model (DEM) with a resolution of 12.5 m was acquired from https://search.asf.alaska.edu/. The land use data in 2020 with a resolution of 10 m was produced by European Space Agency (ESA) (https://viewer.esa-worldcover.org/worldcover) (Fig.2). Then, the land use types were integrated into four categories: impervious surface, green space, waterbody, and farmland. The landscape indices were calculated by Fragstats 4.2 based on the land use data. In addition, other potential explanatory factors of rainfall and drainage capacities were utilized as independent variables to explain the urban flooding (Li et al., 2022). The average rainfall was calculated by the annual precipitation data obtained from the National Earth System Science Data Center (http://www.geodata.cn/). Due to lack of drainage system, distances to river and road density were used to represent urban drainage capacity (Li et al., 2022). The river data and road data were acquired from the National Geomatics Center of China (http://www. ngcc.cn/).
2.3 Subwatershed unit
Urban hydrology has complex natural-artificial features. Urban flooding was closely related to the watershed surface characteristics, such as topography, land cover and land configuration. Thus, it is suitable to explore the relationship between urban flooding and explanatory factors from the perspective of the subwatershed unit in considering urban hydrological processes. Previous studies have proposed the concept frame of “watershed-land” in the field of urban hydro-ecology, fully considering the hydrological watershed unit and easy connection with urban planning (Yu et al., 2018). Following the method recommended by Yu et al. (2018), the DEM data was utilized to divide the subwatershed units using the D8 algorithm. Finally, the study area was divided into 130 subwatershed units with an average area of 15.7 km2 (Fig.1).
2.4 Subwatershed unit
Local Morman’s index in ArcGIS 10.3 was used to identify the distribution patterns of urban flooding points. The cluster types can be mainly divided into high-high agglomeration, low-low agglomeration, high-low agglomeration, and low-high agglomeration. The formula is as follows:
\begin{equation} I=n\times\frac{(x_{i}-\overset{\overline{}}{x})\times\sum W_{\text{ij}}(x_{i}-\overset{\overline{}}{x})}{\sum{(x_{i}-\overset{\overline{}}{x}\ )}^{2}}\nonumber \\ \end{equation}
Where, \(I\) is the local Morman’s index; \(x_{i}\) and\(\overset{\overline{}}{x}\) are the observation and the average of all observations, respectively.
2.5 Landscape metrics
Numerous landscape metrics have been developed to measure landscape composition and configuration. Considering the ecological meaning and previous researches (Li et al., 2022a; Lin et al., 2021), five landscape metrics were selected, including: percentage of green space, percentage of water body, percentage of impervious surfaces, patch density, and landscape shape index (Table 2). For the three-dimensional building metrics, we adopted the building coverage ratio (BCR), building density (BD), building congestion degree (BCD), building height (BH), floor area ratio (FAR), 3D shape index (3DSI), and 3D fractals (3DF). The BCR, BH, and FAR were the most commonly adopted indicators in urban planning (Liu et al., 2021). 3DSI and 3DF were comprehensive indicators, which were based on the surface area and volume of buildings.
2.6 The boosted regression tree (BRT) analysis
The density of flooding hotspots was chosen as the response variable. First, we conducted Pearson correlation analysis to examine the relationship between the density of flooding hotspots and the explanatory factors. Before BRT analysis, Pearson correlation analysis was conducted between the explanatory factors. Pearson correlation analysis was effective to exclude multicollinearity between explanatory factors. Pearson correlation coefficient ranges from -1 to 1, indicating a positive/negative relationship and the strength.
The BRT has been widely applied in land science and urban heat island researches in recent years for its superior performance in explaining the complex relationship between response variable and explanatory factors (Sun et al., 2020; Elith et al., 2008). A BRT uses recursive binary splitting to generate a regression tree algorithm and then uses boosting to obtain the nonlinear relationship between the response and its predictor variables. When the correlation coefficient between two explanatory factors was ≥0.7, one of the predictor variable was removed. Finally, the percentage of impervious surface, population density, and floor area ratio were removed. Additionally, the BRT model also outputs partial dependency plots, which can concisely explain the marginal effects of the explanatory factors. The marginal effects represent the potential impact of the specific factor with other explanatory factors holds constant. With the BRTs, the contributions of the selected indicator were determined by the coefficients. The sum of the relative contribution was 100%. Development of the BRT models was based on dismo package of the R software. The main parameters of the BRT were set to 0.0001 (learning rate), 0.6 (bag fraction), 10 (tree complexity), and 10-fold cross validation.
2.7 Geographical detector model
The geographical detector model (GeoD) could identify the interactive effect between variables, which is a spatial analysis model based on the theory of spatial heterogeneity (Song et al., 2020). It assesses the strength of association between two variables by comparing the association which is definited as 𝑞, which ranging from 0 to 1. A large value in 𝑞, indicates the explanatory power is stronger. The formula is as follows:
\begin{equation} q=1-\frac{1}{N\sigma^{2}}\sum_{h=1}^{L}{N_{h}\sigma_{h}^{2}}\nonumber \\ \end{equation}
Where, \(\sigma^{2}\ \)is the variance of the response variables; 𝑁 is the number of category of the response variables; ℎ1……𝐿 is the study population of the response variables.
The 𝑞 (𝑋1\(\cap\)𝑋2), which represents the interaction between the factors (𝑋1, 𝑋2 … ), can be calculated by the interaction detector. The interaction effect can be divided into 5 categories, including: independent, nonlinear weaken (NW), nonlinear enhancement (NE), single-factor nonlinear weaken (SNW), double-factor enhancement (DE), and independent (I) (Yan et al., 2021). Additionally, the model requires category variables. Here, K-means method is used to convert the response and explanatory variables into category variables. Development of the GeoD models was based on the R-software, “GD 1.10” package.