2.5 Statistical analysis
Descriptive analysis: The median, interquartile range (IQR),
percentiles, and mean ± standard deviation were used to characterize
the air pollutants. Spearman’s correlation analysis was used to assess
the strength of the correlation between environmental factors. The
independent-samples t-test or Mann-Whitney U test (for continuous
variables) and the chi-square test or Fisher's exact test (for
categorical variables) were used to compare characteristics between
the eczema group and the non-eczema group.
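As an illustration of these descriptive steps, the following is a minimal base-R sketch on simulated data. The data frame and all variable names (`pm25`, `no2`, `sex`, `eczema`) are hypothetical and are not taken from the study data.

```r
# Simulated data; every variable name here is a hypothetical placeholder.
set.seed(1)
n <- 200
dat <- data.frame(
  eczema = factor(rbinom(n, 1, 0.3), labels = c("no", "yes")),
  pm25   = rlnorm(n, meanlog = 3.5, sdlog = 0.3),
  no2    = rlnorm(n, meanlog = 3.2, sdlog = 0.4),
  sex    = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Descriptive summary of one pollutant
desc <- c(median = median(dat$pm25), IQR = IQR(dat$pm25),
          mean = mean(dat$pm25), sd = sd(dat$pm25))

# Spearman correlation matrix between pollutants
rho <- cor(dat[, c("pm25", "no2")], method = "spearman")

# Continuous variable: t-test (equal variances assumed here) or rank-sum test
p_t <- t.test(pm25 ~ eczema, data = dat, var.equal = TRUE)$p.value
p_u <- wilcox.test(pm25 ~ eczema, data = dat)$p.value

# Categorical variable: chi-square test or Fisher's exact test
tab      <- table(dat$sex, dat$eczema)
p_chi    <- chisq.test(tab)$p.value
p_fisher <- fisher.test(tab)$p.value
```

In practice the choice between the parametric and rank-based test would depend on the distribution of each variable, and Fisher's exact test is usually preferred when expected cell counts are small.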
Generalized linear models (GLM): The GLM is a direct generalization
of the ordinary linear model, and logistic regression is obtained by
using the logit link function. A binary logistic model was used to
estimate the effects of pollutant exposure during pregnancy on eczema
at age 1, eczema at age 2, and cumulative eczema. A multinomial
logistic model was used to estimate the effect of pollutant exposure
during pregnancy and the first two postnatal years on a four-category
outcome: no eczema, eczema at age 1 only, eczema at age 2 only, and
persistent eczema (reference: no eczema group, OR = 1). The process
was implemented through the ‘nnet’ package in R.
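A minimal sketch of the two model types on simulated data is given below. The covariate (`maternal_age`), the outcome coding, and the category labels are assumptions made for illustration only; the actual covariate set is not specified in this section.

```r
library(nnet)  # recommended package, shipped with standard R installations

set.seed(2)
n <- 400
dat <- data.frame(pm25 = rnorm(n), maternal_age = rnorm(n, 30, 4))

# Binary outcome (e.g. eczema at age 1): logistic regression via the logit link
dat$eczema_1y <- rbinom(n, 1, 0.25)
fit_bin <- glm(eczema_1y ~ pm25 + maternal_age,
               family = binomial(link = "logit"), data = dat)
or_bin <- exp(coef(fit_bin))["pm25"]

# Four-category outcome; "none" is the first level, hence the reference group
lev <- c("none", "age1_only", "age2_only", "persistent")
dat$eczema_cat <- factor(
  sample(lev, n, replace = TRUE, prob = c(0.60, 0.15, 0.15, 0.10)),
  levels = lev
)
fit_multi <- multinom(eczema_cat ~ pm25 + maternal_age,
                      data = dat, trace = FALSE)

# Odds ratios versus the reference group (intercept column dropped)
or_multi <- exp(coef(fit_multi))[, -1]
```

Each row of `or_multi` corresponds to one non-reference outcome category, so the no-eczema group implicitly has OR = 1.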
The distributed lag model (DLM): A sensitive window is a period
during which the effect of an exposure on development and disease risk
is stronger than at other times. In this study, the DLM was used to
identify the sensitive windows of air pollutant exposure for the onset
of eczema. Exposure was divided into weekly intervals and, building on
the generalized linear model, cross-basis functions of the exposure
variables were added to assess both the lag effect of the exposure
factors and the exposure-response relationship. The process was
implemented through the ‘dlnm’ package in R.
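For orientation, a DLM of this kind can be written in the following general form. This is a sketch under the assumption of a linear exposure-response relationship and a binary outcome; the notation is introduced here for illustration and is not taken from the original analysis.

```latex
\[
\operatorname{logit}\,\Pr(Y_i = 1)
  = \alpha + \sum_{l=1}^{L} \beta_l\, x_{i,l}
  + \boldsymbol{\gamma}^{\top}\mathbf{z}_i,
\qquad
\beta_l = \sum_{k=1}^{K} \theta_k\, b_k(l)
\]
```

Here $x_{i,l}$ is the exposure of child $i$ in week $l$, $L$ is the number of exposure weeks, $\mathbf{z}_i$ is the covariate vector, and $b_k(\cdot)$ are basis functions (for example, spline terms) that force the weekly coefficients $\beta_l$ to vary smoothly across lags. Under this formulation, a sensitive window is typically read off as the set of consecutive weeks in which the confidence interval of $\beta_l$ excludes zero.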
Weighted quantile sum (WQS) model: The WQS model was used to evaluate
the joint effect of exposure to air pollutants on the onset of eczema
in children over different time periods (Garcia-Serna et al., 2022).
All pollutants were included in the model, and their association with
childhood eczema was analyzed in the positive direction. The resulting
WQS index (a weighted linear index) is interpreted as the overall
mixture effect, while the weight of each pollutant indicates how much
that pollutant contributes to the WQS index; each weight ranges from 0
to 1. When constructing the model, the parameter “q” was set to 4, so
that pollutants were scored by quartile and the estimated effect
corresponds to a one-quartile increase in the index. The number of
bootstrap samples (b) used in parameter estimation was set to 1000.
The process was implemented through the ‘gWQS’ package in R.
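The structure of the WQS model can be summarized as follows. This is the standard formulation for a binary outcome, written with notation introduced here for illustration ($c$ pollutants, covariates $\mathbf{z}_i$).

```latex
\[
\operatorname{logit}\,\Pr(Y_i = 1)
  = \beta_0 + \beta_1\,\mathrm{WQS}_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_i,
\qquad
\mathrm{WQS}_i = \sum_{j=1}^{c} w_j\, q_{ij},
\qquad
w_j \ge 0,\quad \sum_{j=1}^{c} w_j = 1,\quad \beta_1 \ge 0
\]
```

Here $q_{ij}$ is the quantile score of pollutant $j$ for child $i$ (taking values 0, 1, 2, 3 when q = 4), $w_j$ is the weight of pollutant $j$, and the constraint $\beta_1 \ge 0$ reflects the positive-direction analysis. Because the weights are non-negative and sum to 1, each can be read as the relative contribution of its pollutant to the mixture effect $\beta_1$.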
Principal component analysis (PCA): PCA is a multivariate
statistical method that transforms multiple correlated air pollutants
into a set of uncorrelated variables, called principal components
(PCs), which are arranged in descending order of explained variance
and represent the most important characteristics of the raw data. The
process was implemented through the ‘factoextra’ package in R.
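The decomposition itself can be sketched with base R's `prcomp()` on simulated, deliberately correlated pollutants. The pollutant names and the simulated correlation structure are hypothetical, and using `prcomp()` here is an assumption for illustration rather than a statement of how the PCA was computed in the study.

```r
set.seed(3)
n <- 300
shared <- rnorm(n)  # common source that induces correlation between pollutants
pollutants <- data.frame(
  pm25 = shared + rnorm(n, sd = 0.5),
  pm10 = shared + rnorm(n, sd = 0.5),
  no2  = shared + rnorm(n, sd = 0.8),
  o3   = -0.5 * shared + rnorm(n)
)

# Standardize the pollutants, then extract principal components
pca <- prcomp(pollutants, center = TRUE, scale. = TRUE)

var_explained <- pca$sdev^2 / sum(pca$sdev^2)  # proportion of variance per PC
scores        <- pca$x                         # uncorrelated PC scores
loadings      <- pca$rotation                  # pollutant loadings on each PC
```

A `prcomp` object of this kind can be passed to ‘factoextra’ functions such as `fviz_eig()` to draw a scree plot or `fviz_pca_var()` to plot the variable loadings.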
Sensitivity analysis: To verify the robustness of the WQS model,
first, the parameter “q” was changed from 4 to 10, so that the joint
effect of exposure to air pollutants on childhood eczema was estimated
per decile (10-percentile) increase. Second, the number of bootstrap
samples (b) used in parameter estimation was set to 100, 1000, 2000,
and 3000 in turn. The model was considered robust if the results of
these analyses did not differ significantly.
The variance inflation factor (VIF) was used to diagnose collinearity
among the covariates in each multivariable model. P < 0.05 was taken
as the threshold for statistical significance, and all statistical
analyses were performed in R 4.1.2.
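To make the VIF diagnostic concrete, the sketch below computes it from its definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing covariate j on all the others. The helper `vif_manual()` and the simulated covariates are hypothetical; the function handles numeric covariates only and is not the routine used in the study.

```r
# Hypothetical helper: VIF for each column of a numeric data frame
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    # R^2 from regressing covariate v on the remaining covariates
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

set.seed(4)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)  # strongly collinear with x1
x3 <- rnorm(n)                 # independent of the others
vifs <- vif_manual(data.frame(x1, x2, x3))
```

In this simulated example the two collinear covariates receive large VIF values while the independent one stays close to 1; the cut-off used to flag problematic collinearity is a matter of convention and is not stated in this section.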