2. DATA AND METHODS
2.1 Study area
Beijing (115°25′-117°30′ E,
39°28′-41°05′ N), a typical megacity, is located in northern of North
China Plain (Fig.1). It covers an area of 16.4 thousand
km2. The climate of Beijing belongs to temperature
continental monsoon climate. The annual
precipitation is 600 mm, and 80%
of precipitation is concentrated in summer. The western and northern of
the study area lie on mountains, and the central city lies largely on
flat land. The main rivers are the Yongding River, Chaobai River, Jvma
River, Beiyun River, and Jiyun River. Beijing has witnessed rapid urban
sprawl in the horizontal and vertical dimensions since the reform and
“opening up” policy. The urban built-up area increased 3130
km2 from 1980 to 2018 (Yin et al., 2021). The oldest
part of the city is mainly composed of ancient buildings with low
building height within the second ring road, such as Sihe yuan. In 2020,
more than 25 buildings were taller than 200 m. Beijing has undergone
serious urban flooding. Although a large-scale drainage system was
constructed, the city still faces the challenge of frequent urban
flooding. For example, 79 people were killed and more than $1.6 billion
were lost in Beijing on July 21, 2012. With the renewal of drainage
facilities and the development of sponge city construction, a heavy
rainstorm caused 37 road collapses within the whole city on July 16,
2018. Based on previous studies (Liu et al., 2022), highly urbanized
areas are the most vulnerable area to flooding. Therefore, it is still
necessary to mitigation flooding risks through urban planning and
management in highly urbanized areas.
2.2 Data
sources
A total of 139 flood points from 2011 to 2021 were collected from the
Water Resources Bureau of Beijing (Table 1). However, the available data
only provide the address of the urban flooding events without accurate
coordinates. Thus, we first vectorized these positions according to the
occurrence types. When the urban flooding event occurred on the road, we
determined the geometric center of the road as the position following
the method recommended by Zhang et al. (2018) (Fig.2). Building
footprint and floor number information in 2017 were collected from Baidu
Maps (map.baidu.com) (Table 1). By
referring to the Gaofen-1 image (chromatic sensor with 2 m resolution),
the location of the buildings is
highly accurate. The building floor height was set 3 m through the
building design regulations. The overall accuracy of building height was
beyond 85% according to field surveys, which is satisfying. The digital
elevation model (DEM) with a resolution of 12.5 m was acquired from
https://search.asf.alaska.edu/. The land use data in 2020 with a
resolution of 10 m was produced by European Space Agency (ESA)
(https://viewer.esa-worldcover.org/worldcover)
(Fig.2). Then, the land use types
were integrated into four categories:
impervious surface, green space,
waterbody, and farmland. The landscape indices were calculated by
Fragstats 4.2 based on the land use data. In addition, other potential
explanatory factors of rainfall and drainage capacities were utilized as
independent variables to explain the urban flooding (Li et al., 2022).
The average rainfall was calculated by the annual precipitation data
obtained from the National Earth System Science Data Center
(http://www.geodata.cn/). Due to lack of drainage system, distances to
river and road density were used to represent urban drainage capacity
(Li et al., 2022). The river data and road data were acquired from the
National Geomatics Center of China (http://www. ngcc.cn/).
2.3 Subwatershed
unit
Urban hydrology has complex
natural-artificial features. Urban flooding was closely related to the
watershed surface characteristics, such as topography, land cover and
land configuration. Thus, it is suitable to explore the relationship
between urban flooding and explanatory factors from the perspective of
the subwatershed unit in considering urban hydrological processes.
Previous studies have proposed the concept frame of “watershed-land”
in the field of urban hydro-ecology, fully considering the hydrological
watershed unit and easy connection with urban planning (Yu et al.,
2018). Following the method recommended by Yu et al. (2018), the DEM
data was utilized to divide the subwatershed units using the D8
algorithm. Finally, the study area was divided into 130 subwatershed
units with an average area of 15.7 km2 (Fig.1).
2.4 Subwatershed unit
Local Morman’s index in ArcGIS 10.3 was used to identify the
distribution patterns of urban flooding points. The cluster types can be
mainly divided into high-high agglomeration, low-low agglomeration,
high-low agglomeration, and low-high agglomeration. The formula is as
follows:
\begin{equation}
I=n\times\frac{(x_{i}-\overset{\overline{}}{x})\times\sum W_{\text{ij}}(x_{i}-\overset{\overline{}}{x})}{\sum{(x_{i}-\overset{\overline{}}{x}\ )}^{2}}\nonumber \\
\end{equation}Where, \(I\) is the local Morman’s index; \(x_{i}\) and\(\overset{\overline{}}{x}\) are the observation and the average of all
observations, respectively.
2.5 Landscape
metrics
Numerous landscape metrics have been developed to measure landscape
composition and configuration. Considering the ecological meaning and
previous researches (Li et al., 2022a; Lin et al., 2021), five landscape
metrics were selected, including: percentage of green space, percentage
of water body, percentage of impervious surfaces,
patch density, and landscape
shape index (Table 2). For the
three-dimensional building metrics, we adopted the building coverage
ratio (BCR), building density (BD), building congestion degree (BCD),
building height (BH), floor area ratio (FAR), 3D shape index (3DSI), and
3D fractals (3DF). The BCR, BH, and FAR were the most commonly adopted
indicators in urban planning (Liu et al., 2021). 3DSI and 3DF were
comprehensive indicators, which were based on the surface area and
volume of buildings.
2.6 The boosted regression tree (BRT) analysis
The density of flooding hotspots
was chosen as the response
variable. First, we conducted Pearson correlation analysis to examine
the relationship between the density of flooding hotspots and the
explanatory factors. Before BRT
analysis, Pearson correlation analysis was conducted between the
explanatory factors. Pearson correlation analysis was effective to
exclude multicollinearity between explanatory factors. Pearson
correlation coefficient ranges from -1 to 1, indicating a
positive/negative relationship and the strength.
The BRT has been widely applied in land science and urban heat island
researches in recent years for its superior performance in explaining
the complex relationship between response
variable and explanatory factors
(Sun et al., 2020; Elith et al., 2008). A BRT uses recursive binary
splitting to generate a regression tree algorithm and then uses boosting
to obtain the nonlinear relationship between the response and its
predictor variables. When the correlation coefficient between two
explanatory factors was ≥0.7, one of the predictor variable was removed.
Finally, the percentage of impervious surface, population density, and
floor area ratio were removed. Additionally, the BRT model also outputs
partial dependency plots, which can concisely explain the marginal
effects of the explanatory factors. The marginal effects represent the
potential impact of the specific factor with other explanatory factors
holds constant. With the BRTs, the contributions of the selected
indicator were determined by the coefficients. The sum of the relative
contribution was 100%. Development of the BRT models was based on dismo
package of the R software. The main parameters of the BRT were set to
0.0001 (learning rate), 0.6 (bag fraction), 10 (tree complexity), and
10-fold cross validation.
2.7 Geographical detector model
The geographical detector model (GeoD) could identify the
interactive effect between
variables, which is a spatial analysis model based on the theory of
spatial heterogeneity (Song et al., 2020). It assesses the strength of
association between two variables by comparing the association which is
definited as 𝑞, which ranging from 0 to 1. A large value in 𝑞, indicates
the explanatory power is stronger.
The formula is as follows:
\begin{equation}
q=1-\frac{1}{N\sigma^{2}}\sum_{h=1}^{L}{N_{h}\sigma_{h}^{2}}\nonumber \\
\end{equation}Where, \(\sigma^{2}\ \)is the variance of the response variables; 𝑁 is
the number of category of the response variables; ℎ1……𝐿 is
the study population of the response variables.
The 𝑞 (𝑋1\(\cap\)𝑋2), which represents the interaction between the
factors (𝑋1, 𝑋2 … ), can be calculated by the interaction
detector. The interaction effect can be divided into 5 categories,
including: independent, nonlinear weaken (NW), nonlinear enhancement
(NE), single-factor nonlinear weaken
(SNW), double-factor enhancement
(DE), and independent (I) (Yan et al., 2021). Additionally, the model
requires category variables. Here, K-means method is used to convert the
response and explanatory variables into
category variables. Development
of the GeoD models was based on the R-software, “GD 1.10” package.