Demographic, physical activity, and gynaecological variables
Participants’ demographic, gynaecological, and physical activity variables have been described in detail previously.11 Briefly, age was calculated from the date of birth to the date of answering the prequestionnaire. BMI was calculated as body mass (kg) divided by height squared (m²). Level of education was self-reported with a structured question, and participants were classified into two groups based on their answers: those with a bachelor-level or higher education and those with an education below bachelor level. Work-related physical activity was assessed with a structured question, and participants were classified into the following groups: mainly sedentary work, work that includes standing and walking, and heavy work that also includes lifting.
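The two derived variables above can be illustrated with a minimal Python sketch. The function names and the 365.25-day year used to convert elapsed days into years are assumptions made for illustration only; the source does not state how fractional years were handled, and the analyses themselves were carried out in R.

```python
from datetime import date


def age_in_years(date_of_birth: date, response_date: date) -> float:
    """Age at the time of answering, in years.

    Assumption: elapsed days are divided by 365.25; the source does not
    specify the convention used.
    """
    if response_date < date_of_birth:
        raise ValueError("response_date precedes date_of_birth")
    return (response_date - date_of_birth).days / 365.25


def body_mass_index(mass_kg: float, height_m: float) -> float:
    """BMI as body mass (kg) divided by height squared (m^2)."""
    if mass_kg <= 0 or height_m <= 0:
        raise ValueError("mass and height must be positive")
    return mass_kg / height_m ** 2
```

For a hypothetical participant weighing 70 kg with a height of 1.75 m, body_mass_index(70, 1.75) gives approximately 22.9 kg/m².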
Physical activity at the age of 17 to 29 years was assessed with the question: “What kind of regular physical activity have you done at different stages of your life?”13 Participants were asked to specify their participation by selecting one or more of the following four options: no physical activity, regular independent leisure-time physical activity, regular competitive sport and related training, and regular other supervised physical activity (e.g., in a sports club). Current physical activity was evaluated with a self-reported questionnaire14 comprising four questions about the frequency, intensity, and duration of leisure-time physical activity bouts and the average time spent in active commuting. Based on the answers, current physical activity was expressed as metabolic equivalent hours per day (MET-h/d).
Participants were assigned to premenopausal, early perimenopausal, late perimenopausal, and postmenopausal groups based on FSH concentrations and self-reported menstrual bleeding diaries, using slightly modified Stages of Reproductive Aging Workshop (STRAW+10) guidelines.15 Self-reported data on gestations, parity, and whether a participant had undergone hysterectomy were also collected.
Missing data
In the analytical sample of 1 098 participants, 338 of 29 646 data values (1.1%) were missing. The percentage of missing values ranged from 0% to 10% across the variables (Table S1). Data were missing because of invalid or missing measurements and unclear or incomplete questionnaire responses. Thus, missing data were assumed to occur at random. Multiple imputation by chained equations was used to create 50 imputed data sets, each with 50 iterations.16 The model parameters were estimated separately for each data set. Multiple imputation and pooling of the model estimates were carried out in R17 using the standard settings of the “mice” package.16 For comparison, we also performed a complete-case analysis, and there were no significant differences in the results.
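The missing-data arithmetic and the pooling step can be sketched as follows. This is a Python illustration, not the R code used in the study. The count of 27 variables is inferred from 29 646 / 1 098 and is not stated in the text. The pooling function applies Rubin's rules, the standard way of combining estimates across imputed data sets; the function names are hypothetical.

```python
import math
from statistics import mean, variance


def missing_fraction(n_missing: int, n_participants: int, n_variables: int) -> float:
    """Share of missing cells in a participants-by-variables data matrix."""
    return n_missing / (n_participants * n_variables)


def pool_rubin(estimates, standard_errors):
    """Pool one parameter across m imputed data sets with Rubin's rules.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need at least two estimates with matching standard errors")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])  # average sampling variance
    between = variance(estimates)                       # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# 27 variables is an inferred value (29 646 / 1 098), not reported in the source.
share_missing = missing_fraction(338, 1098, 27)
```

In this scheme, each of the 50 imputed data sets would contribute one estimate and one standard error per model parameter, and pool_rubin would be applied parameter by parameter.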