Key points:
  1. Machine learning applications in neurosurgery can support physicians and patients by optimizing risk-adjusted presurgical counselling, guiding the planning of tailored interventional therapies, and integrating quantitative risk assessment from both legal and quality-control perspectives.
  2. Pituitary surgery is now highly optimized and complications are rare; however, cerebrospinal fluid (CSF) leakage still predisposes patients to life-threatening complications such as meningitis and tension pneumocephalus.
  3. A random forest classifier best predicted the occurrence of intraoperative CSF leakage, outperforming other machine learning models: SVM classifier, artificial neural network, multivariate logistic regression and naïve Bayes classifier.
  4. Patients flagged by the machine learning classifier as being at high risk of CSF leakage might receive additional attention, both in identifying even occult intraoperative leakage and in sellar floor repair, to prevent life-threatening postoperative complications.
  5. Accurate prediction of intraoperative CSF leakage represents an advance over the current state of the art in surgical practice.
Introduction:
Pituitary adenomas (PAs) account for 16% of all newly diagnosed primary central nervous system (CNS) tumors and are the third most common CNS tumors after meningiomas and gliomas. PAs are more frequent in women under the age of 50, whereas in later life their incidence is higher in men(1).
Surgical resection is usually the first-line treatment for large or functioning tumors, with the exception of prolactinomas. Modern transsphenoidal (TS) surgery includes two main techniques: the microscopic and the endoscopic approach. The literature does not uniformly agree on the superiority of one technique over the other, but there is a clear and well-established tendency to prefer the endoscopic approach, which is considered less invasive and more effective(2).
Despite continuous advances, three major complications are still associated with endoscopic TS surgery: hypopituitarism, diabetes insipidus, and CSF rhinorrhea(3). In a national study(4), the incidence of the latter was reported to range between 1.5% and 4.2%. Given the associated high risk of postoperative meningitis, CSF rhinorrhea must be considered a life-threatening condition.
The population affected by PAs is made up of multimorbid young adults and frail elderly patients, who are exposed to a higher risk of perioperative complications and would benefit most from a tailored multimodal treatment and an effective preoperative risk assessment.
Machine learning (ML) has already demonstrated its reliability in improving neurosurgical care, particularly by increasing the efficiency and precision of preoperative planning and surgical outcome prediction(5). The purpose of the current study is therefore to investigate whether a comprehensive supervised ML model, trained and internally validated on clinical, radiological and endocrinological preoperative data, can predict the risk of intraoperative CSF leakage.
Materials and Methods
Participants:
We retrospectively reviewed a cohort of consecutive patients who underwent endonasal endoscopic transsphenoidal surgery (E-TNS) for PAs, performed by the same skull base team between January 2014 and January 2020. All patients were treated by a dedicated staff according to the same recommendations and protocols. PAs treated via transcranial approaches were excluded.
We included patients with at least 95% complete data (demographic, endocrinological, historical and radiological data, and intraoperative occurrence of CSF leakage), reliable information on extent of resection and postoperative CSF leakage occurrence, Ki67 status in the pathology report, and early postoperative (within 72 hours) and 3-month postoperative volumetric contrast-enhanced MRI.
Mild CSF leakages were repaired with synthetic dural patches and fibrin sealants combined with a mucoperiosteal flap, while moderate to severe CSF leakages required multilayer autologous fascia lata grafts, often in association with tissue sealants and a mucoperiosteal flap. All patients received nasal packing for the first 24 hours, and those with an intraoperative CSF leak subsequently underwent CSF drainage through serial lumbar punctures or an external spinal shunt, according to leak severity.
Variables of interest and outcome:
All medical charts were reviewed by two investigators (L.T. and G.F.), and all tabular data were subsequently cross-matched. Radiological measurements (diameters along the x-axis: coronal; y-axis: craniocaudal; z-axis: anteroposterior) and resection measurements (volumetric assessment of the extent of resection, EOR, on axial post-contrast T1 scans) were performed by senior neuroradiologists. Osteo-dural invasiveness was defined as preoperative radiological bone invasion on T2-weighted, contrast-enhanced T1 and CT scans(6). The “R ratio” was defined as the ratio between the horizontal tumor diameter and the inter-carotid distance at the horizontal intracavernous segment (ICD). All PAs were graded according to the Knosp(7) and Hardy(8) classifications by the senior author. All postoperative information (Ki67, extent of resection and postoperative CSF leak occurrence) was discarded after exploratory analysis and not included in model building.
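The R ratio definition above can be illustrated with a minimal Python sketch. The function name `r_ratio` and the example measurements are hypothetical and are not taken from the study data; both measurements are assumed to be expressed in the same unit.

```python
def r_ratio(horizontal_diameter: float, intercarotid_distance: float) -> float:
    """R ratio = horizontal tumor diameter / inter-carotid distance (ICD).

    Both arguments are assumed to be in the same unit (for example mm).
    """
    if horizontal_diameter < 0:
        raise ValueError("horizontal_diameter must be non-negative")
    if intercarotid_distance <= 0:
        raise ValueError("intercarotid_distance must be positive")
    return horizontal_diameter / intercarotid_distance


# Hypothetical measurements, for illustration only.
example_ratio = r_ratio(horizontal_diameter=24.0, intercarotid_distance=20.0)
print(f"R ratio: {example_ratio:.2f}")
```

A value above 1 indicates a tumor whose horizontal diameter exceeds the inter-carotid distance.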
Statistical analysis
Quantitative data were tested for normality. Normally distributed variables were reported as mean ± standard deviation (SD) and compared with Student's t-test; skewed variables were reported as median and interquartile range (IQR) and compared with the Wilcoxon test. Categorical variables were reported as absolute “counts (percentages)” and intergroup comparisons were computed using Fisher’s exact test or the Chi-square test. Missing data (<5%) were imputed using predictive mean matching. For all conventional hypothesis tests, p values < 0.05 were considered statistically significant.
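The test-selection logic described above can be sketched in Python with SciPy. This is an illustrative sketch under stated assumptions, not the study code: the Shapiro-Wilk test as the normality check, the reading of "Wilcoxon test" as the rank-sum (Mann-Whitney U) test for two independent groups, and the rule of using Fisher's exact test when any expected cell count is below 5 are all assumptions. The function names and the example data are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_quantitative(group_a, group_b, alpha=0.05):
    """Compare one quantitative variable between two independent groups."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # Assumption: Shapiro-Wilk is used as the normality check.
    _, p_norm_a = stats.shapiro(a)
    _, p_norm_b = stats.shapiro(b)
    if p_norm_a > alpha and p_norm_b > alpha:
        _, p_value = stats.ttest_ind(a, b)
        return "Student t-test", float(p_value)
    # Assumption: "Wilcoxon test" means the rank-sum (Mann-Whitney U) test.
    _, p_value = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Wilcoxon rank-sum", float(p_value)


def compare_categorical(table):
    """Compare a categorical variable between groups from a contingency table."""
    table = np.asarray(table)
    _, p_chi2, _, expected = stats.chi2_contingency(table)
    # Assumption: Fisher's exact test is preferred for sparse 2x2 tables.
    if table.shape == (2, 2) and expected.min() < 5:
        _, p_value = stats.fisher_exact(table)
        return "Fisher exact", float(p_value)
    return "Chi-square", float(p_chi2)


# Hypothetical data, for illustration only (not the study cohort).
rng = np.random.default_rng(0)
age_leak = rng.normal(60, 10, size=40)
age_no_leak = rng.normal(55, 10, size=40)
quant_test, quant_p = compare_quantitative(age_leak, age_no_leak)
sparse_test, sparse_p = compare_categorical([[8, 2], [1, 9]])
dense_test, dense_p = compare_categorical([[30, 20], [20, 30]])
print(quant_test, sparse_test, dense_test)
```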
A multivariate logistic regression model was fitted to identify independent predictors of intraoperative CSF leak. Sex, age, PA secretion status, prior E-TNS surgery, R ratio, ICD, Knosp grade, Hardy grade, volume, diameters along the 3 axes and osteo-dural invasiveness were extracted as independent variables in the dataset, which was split, preserving a similar class distribution, into a training set (70%) and a hold-out test set (30%). Feature selection was performed with the Boruta algorithm(9) (parameter “importance” threshold: 70%) based on a random forest classifier model.
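The stratified 70/30 split and the idea behind Boruta can be sketched as follows with scikit-learn. This is a simplified illustration under stated assumptions, not the pipeline used in the study: the data are synthetic, only a single round of the shadow-feature comparison is shown (the full Boruta algorithm iterates and applies a statistical test), and the 70% importance threshold mentioned above is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in data (NOT the study cohort): 200 cases, 25% positive.
y = np.array([1] * 50 + [0] * 150)
X = rng.normal(size=(200, 5))
X[:, 0] += 2.0 * y  # feature 0 is made informative on purpose

# Stratified 70/30 split: both subsets keep the original class distribution.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# One round of the shadow-feature idea behind Boruta: shuffle each column to
# break its link with the outcome, then keep the real features whose
# importance exceeds that of the best shadow feature.
shadows = np.column_stack([rng.permutation(col) for col in X_train.T])
forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(np.hstack([X_train, shadows]), y_train)

n_real = X_train.shape[1]
real_importance = forest.feature_importances_[:n_real]
shadow_max = forest.feature_importances_[n_real:].max()
selected = real_importance > shadow_max
print("Selected feature indices:", np.flatnonzero(selected))
```

Feature selection is run on the training set only, so that the hold-out test set does not influence which predictors are retained.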
Class imbalance was corrected by oversampling the minority class of the training set (SMOTE-NC: Synthetic minority over-sampling technique for nominal and continuous features(10)).
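To make the oversampling step more concrete, the following NumPy sketch implements a simplified version of the SMOTE-NC idea: continuous features are interpolated between a minority sample and one of its nearest minority neighbours, while nominal features take the most frequent value among those neighbours. It is an illustration only, not the implementation used in the study; the function name `smote_nc_sketch`, the default `k=5` and the example data are assumptions.

```python
import numpy as np


def smote_nc_sketch(X_cont, X_nom, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (simplified SMOTE-NC).

    X_cont: (n, p) continuous features of the minority class.
    X_nom:  (n, q) integer-coded nominal features of the minority class.
    """
    rng = np.random.default_rng(seed)
    X_cont = np.asarray(X_cont, dtype=float)
    X_nom = np.asarray(X_nom)
    n = X_cont.shape[0]
    k = min(k, n - 1)
    # Penalty per nominal mismatch: median SD of the continuous features.
    penalty = np.median(X_cont.std(axis=0))
    # Squared distance: Euclidean part plus penalty^2 per differing nominal.
    diff = X_cont[:, None, :] - X_cont[None, :, :]
    mismatches = (X_nom[:, None, :] != X_nom[None, :, :]).sum(axis=2)
    dist2 = (diff ** 2).sum(axis=2) + mismatches * penalty ** 2
    np.fill_diagonal(dist2, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist2, axis=1)[:, :k]

    new_cont, new_nom = [], []
    for _ in range(n_new):
        i = rng.integers(n)            # random minority sample
        j = rng.choice(neighbours[i])  # one of its k nearest neighbours
        gap = rng.random()             # interpolation weight in [0, 1)
        new_cont.append(X_cont[i] + gap * (X_cont[j] - X_cont[i]))
        # Nominal features: most frequent value among the k neighbours.
        row = []
        for col in X_nom[neighbours[i]].T:
            values, counts = np.unique(col, return_counts=True)
            row.append(values[np.argmax(counts)])
        new_nom.append(row)
    return np.array(new_cont), np.array(new_nom)


# Hypothetical minority class of a training set: 20 cases, 3 continuous
# features and 1 binary nominal feature; 40 synthetic cases are requested.
rng = np.random.default_rng(1)
X_cont_min = rng.normal(size=(20, 3))
X_nom_min = rng.integers(0, 2, size=(20, 1))
syn_cont, syn_nom = smote_nc_sketch(X_cont_min, X_nom_min, n_new=40)
print(syn_cont.shape, syn_nom.shape)
```

As in the text above, such oversampling is applied to the training set only, so that the hold-out test set keeps the original class distribution.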
Based on an exploratory accuracy evaluation on the training set, performed with a customized pipeline for model comparison (TPOT(11)), we selected 5 supervised machine learning models across different subgroups of ML methods (Bayesian models, generalized linear models, margin-based classifiers, decision tree generators and deep learning architectures) that showed an F1-score > 0.60 on the training set without parameter tuning: naïve Bayes classifier (NB), multivariate logit regression (MLS), support-vector machine classifier (SVM-C), random forest classifier (RF) and artificial neural network (ANN). We trained the 5 models on the training set, tuned their hyperparameters with 10-fold cross-validation using GridSearch(12,13) and measured their performance on the hold-out test set (data never seen by the models before) for final validation. Extensive performance metrics were analyzed and models were ranked based on overall area under the curve (AUC). Confidence intervals were reported for the AUC computed on both the training and the test set. “Dropout” and “Early stop” methods were implemented to minimize overfitting in the ANN (additional materials on the rationale of the supervised machine learning models implemented in the current study and on the structure of the ANN are provided in the Supplementary File).
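The tuning and validation scheme described above (grid search with 10-fold cross-validation on the training set, followed by a single evaluation on the hold-out test set) can be sketched with scikit-learn. The data, the hyperparameter grid and the use of AUC as the cross-validation scoring metric are assumptions made for illustration; the grid actually used in the study is not reported in this section.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in data (NOT the study cohort), with an imbalanced outcome.
X, y = make_classification(
    n_samples=240, n_features=7, n_informative=4,
    weights=[0.77, 0.23], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Hypothetical grid, kept small so that the example runs quickly.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="roc_auc",  # assumption: AUC as the tuning criterion
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)  # the test set is never used during tuning

# Final validation: one evaluation of the refitted best model on unseen data.
test_scores = search.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, test_scores)
print("Best parameters:", search.best_params_)
print(f"Hold-out AUC: {test_auc:.2f}")
```

The same pattern applies to the other classifiers by swapping the estimator and its grid.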
All statistical analyses were performed in Python (v. 3.7.5) with the Anaconda graphical user interface (GUI; http://www.anaconda.com), using the scikit-learn, pandas, numpy, seaborn and statsmodels libraries and the TensorFlow framework with its built-in Keras API(14) (Google Brain Team, Google LLC; http://tensorflow.org). The integrated pipeline of the current study is summarized in Figure 1.
Results
Population:
A total of 238 patients consecutively operated on between 2014 and 2020 were included. Demographic characteristics of the population are reported in Table 1. Intraoperative CSF leakage occurred in 54 patients (22.6%). Postoperative CSF leak occurred in 5 patients (2.1%), all of whom had experienced an intraoperative CSF fistula; no cases of meningitis or tension pneumocephalus occurred. PAs with Ki67 > 3% at pathology examination were more frequent in patients who suffered from intraoperative CSF leakage (48.1% vs 10.3%, p = 0.001). At the 3-month radiological follow-up, gross total resection (GTR) was achieved in 120 patients (50.2%).
Classical inferential analysis of CSF leak predictors (Table 2):
Independent predictors of intraoperative CSF leak in the multivariate logistic regression analysis were: non-secreting status (OR: 9.7, p = 0.00001), osteo-dural invasiveness (OR: 0.34, p = 0.01), higher age (OR: 1.03, p = 0.04) and reduced ICD (OR: 0.88, p = 0.0087). Overall, the multivariate logistic regression model showed a moderate predictive power (AUC: 0.60; sensitivity: 55%; specificity: 63%).
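As a reading aid for the odds ratios of continuous predictors, the short Python sketch below rescales an odds ratio to a larger change in the predictor and converts it to a logistic regression coefficient. It assumes that the reported ORs for age and ICD refer to a one-unit increase (one year of age, one measurement unit of ICD), which is not stated explicitly in this section; the function names are hypothetical.

```python
import math


def rescale_odds_ratio(odds_ratio: float, units: float) -> float:
    """Odds ratio for a change of `units`, given the odds ratio per 1 unit."""
    return odds_ratio ** units


def log_odds_coefficient(odds_ratio: float) -> float:
    """Logistic regression coefficient (log-odds) corresponding to an OR."""
    return math.log(odds_ratio)


# Age: OR 1.03 per unit, assumed to be per year of age.
or_age_decade = rescale_odds_ratio(1.03, 10)   # about 1.34 per 10 years
# ICD: OR 0.88 per unit, rescaled here to a 5-unit increase.
or_icd_five = rescale_odds_ratio(0.88, 5)      # about 0.53 per 5 units

print(f"Age, per decade: OR = {or_age_decade:.2f}")
print(f"ICD, per 5 units: OR = {or_icd_five:.2f}")
print(f"Age coefficient: {log_odds_coefficient(1.03):.4f}")
```

Under this assumption, an odds ratio that looks close to 1 per unit can still correspond to a sizeable change in odds over a clinically meaningful range.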
Supervised machine learning predictive models
The Boruta algorithm selected the most relevant features according to model performance metrics: non-secreting status, age, x-axis, y-axis, z-axis, ICD and R ratio. These were included in all further analyses. The dimensionality reduction of the implemented data corresponded to shortened training time, simplification of imputations and improvement in classification performance(15). Optimized models are ranked according to AUC in Table 3. Feature importances for the optimal model are reported for interpretability (Figure 2; a correlation plot is provided for comparison).
The random forest classifier proved to be the most accurate ML model, showing high performance on the training set (AUC: 0.88; accuracy: 87%; sensitivity: 95%; specificity: 80%; F1-score: 0.88) and an overall high discriminative capacity in predicting intraoperative CSF leak occurrence in the hold-out test set for internal validation (AUC: 0.84; accuracy: 84%; sensitivity: 87%; specificity: 82%; PPV: 69%; NPV: 93%; F1-score: 0.87).
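For reference, the performance metrics listed above can all be derived from the four cells of a confusion matrix, as in the following Python sketch. The counts used in the example are hypothetical round numbers chosen for illustration; they are not the confusion matrix of the study, which is not reported in this section.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive common binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=20, tn=130, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on the prevalence of the outcome, they are expected to differ between the oversampled training set and the hold-out test set even for the same model.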