2 | METHODS
| Cancer registry and patient information
Construction of this single-institution childhood cancer survivorship cohort was based on cancer registry data integrated with EHR data elements, linked by medical record number (MRN). As part of accreditation by the American College of Surgeons Commission on Cancer, centers are required to report all newly diagnosed cases to the NCDB.33 Centers are also required to report cases to their respective state registries regardless of accreditation status. Inclusion criteria for this childhood cancer survivorship cohort (Figure 1) were patients ≤18 years of age with a diagnosis of a malignancy reported to the cancer registry. The cohort was limited to patients seen in the pediatric oncology (PHO) or neuro-oncology (PNO) clinics between January 1, 1994 and November 30, 2012 to ensure a seven-year follow-up period for all survivors. Patients who died or had a documented relapse during the seven-year follow-up window after the date of diagnosis were excluded. To exclude referrals for refractory or relapsed disease that was not diagnosed and treated at this institution, only analytic cases were included. The Facility Oncology Registry Data Standards (FORDS) Manual defines analytic cases as those diagnosed at the reporting facility (Duke University Medical Center) and/or given all or part of the first course of treatment there.
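The inclusion and exclusion criteria above can be read as a sequence of filters. The following is a minimal sketch of that logic, not the registry's actual query: the `RegistryCase` fields, the helper names, and the inclusive treatment of date boundaries are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Study constants taken from the text.
CLINIC_START = date(1994, 1, 1)
CLINIC_END = date(2012, 11, 30)
MAX_AGE_YEARS = 18
FOLLOW_UP_YEARS = 7


@dataclass
class RegistryCase:
    # Hypothetical record layout; the real registry schema is not described.
    age_at_diagnosis: float                  # years
    diagnosis_date: date
    is_analytic: bool                        # analytic case per the FORDS definition
    clinic_visits: List[date] = field(default_factory=list)  # PHO/PNO visit dates
    death_date: Optional[date] = None
    relapse_date: Optional[date] = None


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def is_eligible(case: RegistryCase) -> bool:
    """Apply the cohort criteria in order; boundaries are assumed inclusive."""
    if case.age_at_diagnosis > MAX_AGE_YEARS:
        return False
    if not case.is_analytic:
        return False
    if not any(CLINIC_START <= v <= CLINIC_END for v in case.clinic_visits):
        return False
    window_end = add_years(case.diagnosis_date, FOLLOW_UP_YEARS)
    for event in (case.death_date, case.relapse_date):
        # Death or relapse inside the seven-year window excludes the patient.
        if event is not None and event <= window_end:
            return False
    return True
```

Ordering the checks from cheapest to most specific keeps the function easy to audit against Figure 1, since each early return corresponds to one exclusion step.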
| Disease classification
Cancer diagnoses were grouped according to the International Classification of Childhood Cancer, third revision (ICCC-3),34 using the International Classification of Diseases for Oncology, third revision (ICD-O-3) codes reported in the cancer registry (Table 1). The ICD-O-3 codes were then used to further classify diagnoses and group patients into the malignancy categories outlined by the BCCSS.6 Additionally, for brain tumor patients, ICD-O-3 topography codes for central nervous system (CNS) locations were used to mitigate misclassification based on primary pathologic diagnosis (e.g., intracranial mixed germ cell tumors).
| Risk stratification
The cancer registry captures the first course of treatment based on chart review by a certified cancer registrar in accordance with the FORDS Manual.35 Exposures are reported as dichotomous (Yes/No) for surgery, diagnostic biopsy, radiation, chemotherapy, hormonal therapy, immunotherapy, other, palliative, and transplant. Based on these exposures and the primary diagnosis classification, risk strata were constructed from the BCCSS system (Table 1).6
| Follow-up definitions
The institutional cancer registry provided the base cohort for all childhood cancer diagnoses. These registry data were merged with EHR data, using the MRN and a durable unique patient identifier as keys, through the Duke Enterprise Data Unified Content Explorer (DEDUCE). All PHO and PNO clinic encounters were extracted to identify eligible patients. To determine appropriate follow-up, all visits in the PBMT clinic and the Duke Cancer Institute were also extracted. Inadequate follow-up was defined as a survivor not being seen during the five- to seven-year window after the date of initial diagnosis.
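The follow-up definition can be expressed as a simple date-window check. This is an illustrative sketch only: the function names are hypothetical, and treating both window boundaries as inclusive is an assumption, since the text does not specify how boundary dates were handled.

```python
from datetime import date
from typing import Iterable


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def has_adequate_follow_up(
    diagnosis_date: date,
    visit_dates: Iterable[date],
    start_years: int = 5,
    end_years: int = 7,
) -> bool:
    """True if any visit falls in the five- to seven-year window after diagnosis.

    Boundaries are assumed inclusive (an assumption, not stated in the text).
    """
    window_start = add_years(diagnosis_date, start_years)
    window_end = add_years(diagnosis_date, end_years)
    return any(window_start <= v <= window_end for v in visit_dates)
```

A survivor for whom this returns `False` would fall into the inadequate follow-up group, provided the visit list pools encounters from all of the clinics named above.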
| Spatial Variables
DEDUCE was also used to export the longitude and latitude coordinates of the home address, the ZIP code, and the census block group Federal Information Processing Standards (FIPS) code for each survivor. Using ArcGIS 10.5.1 (ESRI, Redlands, CA), we calculated the Euclidean (straight-line) distance from the address of each survivor to the nearest COG-affiliated site36 in North Carolina (NC), South Carolina (SC), and Virginia (VA). Analysis was limited to survivors whose coordinates were in NC, SC, or VA. Additionally, using spatial point-in-polygon join operations, we identified the ZIP code-level Rural-Urban Commuting Area (RUCA) code and the block group-level Area Deprivation Index (ADI) for each survivor. RUCA is a categorical classification of rural versus urban areas that takes into account population density and distance to the nearest urban centers. ADI is an indexed composite of seventeen variables related to social determinants of health, drawn from the United States Census and American Community Survey, that captures socioeconomic disadvantage at the census block group level.37,38 A high ADI percentile, which represents greater disadvantage, has been shown to correlate with a number of adverse health outcomes.39,40
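To illustrate the nearest-site distance step outside of ArcGIS, the sketch below computes a straight-line distance from longitude and latitude. Note the substitution: it uses the great-circle (haversine) formula on a spherical Earth rather than ArcGIS's Euclidean distance in a projected coordinate system, so values would differ slightly from those in the study. The function names and the site dictionary layout are hypothetical.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_site(
    home: Tuple[float, float],
    sites: Dict[str, Tuple[float, float]],
) -> Tuple[str, float]:
    """Return (site name, distance in km) for the site closest to `home`.

    `home` and each site are (latitude, longitude) pairs in degrees.
    """
    name = min(sites, key=lambda s: haversine_km(*home, *sites[s]))
    return name, haversine_km(*home, *sites[name])


# Usage with made-up coordinates (not actual site locations):
example_sites = {"site_a": (36.1, -79.0), "site_b": (38.0, -79.0)}
closest, distance_km = nearest_site((36.0, -79.0), example_sites)
```

Over the distances involved in a three-state region, the difference between great-circle and projected Euclidean distance is generally small, but any replication should use the same method and projection as the original analysis.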
| Statistical analyses
Patients were grouped according to whether or not they were seen in a Duke Cancer Clinic five to seven years after the initial date of diagnosis. Patient characteristics were compared between those who were seen in this window and those who were not. We used the last known date of contact in the cancer registry to confirm that patients survived through the five- to seven-year window after their initial date of diagnosis before including them in the analysis. Continuous variables are presented as medians (standard deviations), and differences were compared using the t-test. Categorical variables are presented as counts (proportions), and differences were compared using the χ² test. For all analyses, risk strata were treated as a three-level categorical variable (low, intermediate, and high risk).
Logistic regression was used to estimate the association between follow-up and risk stratification, both in bivariate analyses and after adjusting for known covariates: an ALL indicator, gender, age at diagnosis, race, and an indicator of local state of residence. Local state of residence was defined as residing in NC, SC, or VA and was included to minimize potential confounding associations between risk strata, distance from the medical center, and follow-up care. Because our primary variable of interest (risk stratification) had three levels, we used a multiple-degree-of-freedom lack-of-fit test to compare a baseline model that excluded risk stratification with a model that included it.
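The text does not name the test statistic used for this nested-model comparison. One standard way to carry out a multiple-degree-of-freedom comparison of nested logistic models is a likelihood ratio test, and the sketch below assumes that choice. Low risk is taken as the reference level (also an assumption), $Y$ indicates follow-up in the window, and $\mathbf{x}$ denotes the adjustment covariates.

```latex
\begin{align}
M_0 &: \operatorname{logit}\Pr(Y = 1) = \beta_0 + \mathbf{x}^{\top}\boldsymbol{\gamma} \\
M_1 &: \operatorname{logit}\Pr(Y = 1) = \beta_0
      + \beta_1\, I(\text{intermediate risk})
      + \beta_2\, I(\text{high risk})
      + \mathbf{x}^{\top}\boldsymbol{\gamma} \\
G &= -2\,\bigl[\ell(M_0) - \ell(M_1)\bigr]
   \;\overset{H_0}{\sim}\; \chi^2_{2},
   \qquad H_0 : \beta_1 = \beta_2 = 0
\end{align}
```

Here $\ell(\cdot)$ is the maximized log-likelihood of each model. The two degrees of freedom come from the two indicator terms needed to encode a three-level variable, which is why a single-coefficient Wald test would not suffice for the overall effect of risk stratification.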
Subsequent models that included risk stratification and an indicator of local state of residence were used to determine whether that association varied by a broad geographic indicator. Bonferroni adjustments were made for multiple comparisons. Only patients with complete data for all covariates were included in each analysis, and effective sample sizes are reported for all tables and figures. In reviewing the correlation among all predictors in our models, we found no evidence of multicollinearity. All statistical analyses were conducted in R version 3.6.1.
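For readers unfamiliar with the Bonferroni adjustment, a minimal sketch follows. It shows the standard form of the correction (multiplying each p-value by the number of comparisons and capping at 1); the function name is hypothetical, and the text does not state how many comparisons were adjusted for or whether the analysis adjusted p-values or the significance threshold.

```python
from typing import List, Sequence


def bonferroni_adjust(p_values: Sequence[float]) -> List[float]:
    """Return Bonferroni-adjusted p-values: each p times m, capped at 1.0.

    m is the number of comparisons, taken here as len(p_values).
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Usage with made-up p-values (not results from the study):
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
```

Comparing each adjusted value with the nominal level (for example 0.05) is equivalent to comparing the raw p-values with the level divided by the number of comparisons.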
| RESULTS