Key points:
- Machine learning applications in neurosurgery can support physicians
and patients by optimizing risk-adjusted presurgical counselling,
informing the planning of tailored interventional therapies and
integrating quantitative risk assessment from both legal and quality
control perspectives.
- Pituitary surgery is highly optimized and complications are rare;
however, cerebrospinal fluid (CSF) leakage still predisposes to
life-threatening complications such as meningitis and tension
pneumocephalus.
- A random forest classifier best predicted whether intraoperative CSF
leakage would occur, outperforming other machine learning models such
as the SVM classifier, artificial neural network, multivariate logistic
regression and naïve Bayes classifier.
- Patients flagged as at high risk of CSF leakage by the machine
learning classifier might receive additional attention in identifying
even occult intraoperative leakages and in sellar floor repair, to
prevent life-threatening postoperative complications.
- Accurate prediction of intraoperative CSF leakage represents an
advance over the state of the art of current surgical practice.
Introduction:
Pituitary adenomas (PAs) account for 16% of all newly diagnosed primary
central nervous system (CNS) tumors, making them the third most common
CNS tumors after meningiomas and gliomas. PAs are more frequent in women
under the age of 50, while in later life their incidence becomes higher
in males(1).
Surgical resection is usually the first-line treatment for large or
functioning tumors except prolactinomas. The modern surgical approach
includes two main transsphenoidal (TS) techniques: microscopic and
endoscopic. The literature does not uniformly agree on the superiority
of one technique over the other, but there is a clear and consolidated
tendency to prefer the endoscopic approach, which is considered less
invasive and more effective(2).
Despite continuous advances, three major complications are still
associated with endoscopic TS surgery: hypopituitarism, diabetes
insipidus, and CSF rhinorrhea(3). In a national study(4), the incidence
of the latter was reported to be between 1.5% and 4.2%. Given the high
risk of postoperative meningitis, it has to be considered a
life-threatening condition.
The population affected by PAs is made up of multimorbid young adults
or frail elderly patients, who are exposed to a higher risk of
perioperative complications and would benefit most from a tailored
multimodal treatment and effective preoperative risk assessment.
Since machine learning (ML) has already demonstrated its reliability in
improving neurosurgical care, particularly by increasing the efficiency
and precision of preoperative planning and surgical outcome
prediction(5), the purpose of the current study is to investigate
whether a comprehensive supervised ML model trained and internally
validated on clinical, radiological and endocrinological preoperative
data can predict the risk of intraoperative CSF leakage.
Materials and Methods
Participants:
We retrospectively reviewed a cohort of consecutive patients who
underwent endonasal endoscopic transsphenoidal surgery (E-TNS) for PAs,
performed by the same skull base team between January 2014 and January
2020. All patients were treated by a dedicated staff according to the
same recommendations and protocols. PAs treated via transcranial
approaches were excluded.
We included patients with at least 95% data completeness (demographics,
endocrinological, historical, radiological and intraoperative occurrence
of CSF leakage), reliable information on extent of resection (EOR) and
postoperative CSF leakage occurrence, Ki67 status in the pathology
report, and early postoperative (within 72 hours) and 3-month
postoperative volumetric contrast-enhanced MRI. Mild CSF leakages were
repaired with synthetic dural patches and fibrin sealants combined with
a mucoperiosteal flap, while moderate to severe CSF leakages required
multilayer autologous fascia lata grafts, often in association with
tissue sealants and a mucoperiosteal flap. All patients received nasal
packing for the first 24 hours, and those with an intraoperative CSF
leak had subsequent CSF drainage through serial lumbar punctures or an
external spinal shunt according to leak severity.
Variables of interest and outcome:
All medical charts were reviewed by two investigators (L.T. and G.F.)
and cross-matching of all tabular data was performed afterwards.
Radiological measurements (diameters along the x-axis: coronal; y-axis:
craniocaudal; z-axis: anteroposterior) and resection measurements
(volumetric EOR assessment on axial post-contrast T1 scans) were
performed by senior neuroradiologists. Osteo-dural invasiveness was
defined as preoperative radiological bone invasion on T2-weighted,
contrast-enhanced T1 and CT scans(6). “R ratio” was defined as the ratio
between horizontal tumor diameter and inter-carotid distance at the
horizontal intracavernous segment (ICD). All PAs were graded according
to the Knosp(7) and Hardy(8) classifications by the senior author. All
postoperative information (Ki67, extent of resection and postoperative
CSF leak occurrence) was discarded after exploratory analysis and not
included in model building.
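As a minimal illustration of the R ratio defined above, the following sketch assumes that both measurements are expressed in millimetres; the function name and the example values are hypothetical and are not taken from the study data.

```python
def r_ratio(horizontal_diameter_mm: float, icd_mm: float) -> float:
    """Ratio between horizontal tumor diameter and inter-carotid distance (ICD)."""
    if icd_mm <= 0:
        raise ValueError("ICD must be a positive distance")
    return horizontal_diameter_mm / icd_mm


# Hypothetical measurements, for illustration only.
example_ratio = r_ratio(24.0, 20.0)
print(round(example_ratio, 2))
```

Under this definition, a value above 1 means that the horizontal tumor diameter exceeds the inter-carotid distance.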
Statistical analysis
Quantitative data were tested for normality. Normally distributed
variables were reported as mean ± standard deviation (SD) and compared
using Student's t-test; skewed variables were reported as median and
interquartile range (IQR) and compared using the Wilcoxon test.
Categorical variables were reported as absolute “counts (percentages)”
and intergroup comparison was performed using Fisher’s exact test or the
Chi-square test. Missing data (<5%) were imputed using predictive mean
matching. For all traditional hypothesis tests, p values < 0.05 were
considered statistically significant.
A multivariate logistic regression model was fitted to identify
independent predictors of intraoperative CSF leak. Sex, age, PA
secretion status, prior E-TNS surgery, R ratio, ICD, Knosp grades, Hardy
grades, volume, diameters along the 3 axes and osteo-dural invasiveness
were extracted as independent variables in the dataset, which was split,
preserving a similar class distribution, into a training set (70%) and a
hold-out test set (30%). Feature selection was performed with the Boruta
algorithm(9) (parameter “importance” threshold: 70%) based on a random
forest classifier model.
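The class-preserving 70/30 split described above can be sketched as follows. This is a standard-library illustration and not the study's own code: the helper name, the seed and the synthetic label vector are assumptions, and the labels only mirror the reported class balance (54 intraoperative leaks among 238 patients).

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.30, seed=0):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic labels: 1 = intraoperative CSF leak, 0 = no leak.
labels = [1] * 54 + [0] * 184
train_idx, test_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), round(train_rate, 3), round(test_rate, 3))
```

Sampling each class separately keeps the proportion of leaks nearly identical in the two subsets, which matters when the positive class is a minority.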
Class imbalance was corrected by oversampling the minority class of the
training set (SMOTE-NC: Synthetic minority over-sampling technique for
nominal and continuous features(10)).
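The idea behind this kind of oversampling can be sketched as follows. This is a simplified, SMOTE-NC-style illustration rather than the published algorithm or the study's implementation: neighbours are found here using Euclidean distance on the continuous features only, whereas SMOTE-NC also penalises nominal mismatches in the distance. The function name, feature layout and sample values are hypothetical.

```python
import random
from collections import Counter


def smote_nc_sample(minority, continuous_idx, nominal_idx, k=3, rng=None):
    """Create one synthetic minority-class sample (simplified sketch)."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)

    def dist(a, b):
        # Simplification: distance uses continuous features only.
        return sum((a[i] - b[i]) ** 2 for i in continuous_idx) ** 0.5

    others = [s for s in minority if s is not base]
    neighbours = sorted(others, key=lambda s: dist(base, s))[:k]
    partner = rng.choice(neighbours)
    gap = rng.random()  # position along the segment from base to partner

    synthetic = list(base)
    for i in continuous_idx:
        # Continuous features are interpolated between the two samples.
        synthetic[i] = base[i] + gap * (partner[i] - base[i])
    for i in nominal_idx:
        # Nominal features take the most frequent value among the neighbours.
        synthetic[i] = Counter(s[i] for s in neighbours).most_common(1)[0][0]
    return synthetic


# Hypothetical minority-class rows: [age, ICD in mm, non-secreting flag].
minority = [[61.0, 18.5, 1], [55.0, 17.0, 1], [70.0, 16.2, 0], [66.0, 19.1, 1]]
print(smote_nc_sample(minority, continuous_idx=[0, 1], nominal_idx=[2],
                      rng=random.Random(42)))
```

Synthetic samples are generated from the training set only, so that the hold-out test set keeps the original class distribution.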
Based on an exploratory accuracy evaluation on the training set,
performed with a customized pipeline for model comparison (TPOT(11)), we
selected 5 supervised ML models across different families of ML methods
(Bayesian models, generalized linear models, maximum-margin classifiers,
decision tree generators and deep learning architectures), each showing
an F1-score > 0.60 on the training set without parameter tuning: naïve
Bayes classifier (NB), multivariate logit regression (MLS),
support-vector machine classifier (SVM-C), random forest classifier (RF)
and artificial neural network (ANN). We trained the 5 models on the
training set, tuned their hyperparameters with 10-fold cross-validation
using GridSearch(12,13) and measured their performance on the hold-out
test set (data never seen by the models before) for final validation.
Extensive performance metrics were analyzed and models were ranked by
overall area under the curve (AUC). Confidence intervals were reported
for the AUC computed on both the training and test sets. “Dropout” and
“early stopping” methods were implemented to minimize overfitting in the
ANN (additional materials on the rationale of the supervised ML models
implemented in the current study and on the structure of the ANN are
provided in the Supplementary File).
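The tuning step described above (an exhaustive grid search scored by k-fold cross-validation on the training set) can be sketched with the standard library alone. This is an illustration of the procedure, not the study's scikit-learn code: the helper names, the toy threshold "model" and the parameter grid are hypothetical.

```python
import itertools
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, class by class."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def grid_search_cv(fit_and_score, param_grid, labels, k=10, seed=0):
    """Return (best_params, best_mean_score) over all grid combinations.

    fit_and_score(params, train_idx, val_idx) is a user-supplied callable
    that trains on train_idx and returns a score computed on val_idx.
    """
    folds = stratified_folds(labels, k, seed)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = []
        for f in range(k):
            train_idx = [i for g in range(k) if g != f for i in folds[g]]
            scores.append(fit_and_score(params, train_idx, folds[f]))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Toy data: one feature that separates the two classes at 9.5.
x = list(range(20))
labels = [0] * 10 + [1] * 10


def fit_and_score(params, train_idx, val_idx):
    # Toy "model": predict 1 when x exceeds the threshold; nothing is fitted.
    preds = [1 if x[i] > params["threshold"] else 0 for i in val_idx]
    return sum(p == labels[i] for p, i in zip(preds, val_idx)) / len(val_idx)


best_params, best_score = grid_search_cv(
    fit_and_score, {"threshold": [4.5, 9.5, 14.5]}, labels, k=5)
print(best_params, best_score)
```

Only training data enter the folds; the hold-out test set is used once, after the best parameter combination has been chosen.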
All statistical analyses were performed in Python (v. 3.7.5) with the
Anaconda graphical user interface (GUI; http://www.anaconda.com), using
the scikit-learn, pandas, numpy, seaborn and statsmodels libraries and
the Keras API built into the TensorFlow framework(14) (Google Brain
Team, Google LLC; http://tensorflow.org). The integrated pipeline of the
current study is summarized in Figure 1.
Results
Population:
A total of 238 patients consecutively operated on between 2014 and 2020
were included. Demographic characteristics of the population are
reported in Table 1. Intraoperative CSF leakage occurred in 54 patients
(22.6%). Postoperative CSF leak occurred in 5 patients (2.1%). All of
them had experienced intraoperative CSF fistula and no cases of
meningitis or tension pneumocephalus occurred. PAs with Ki67 > 3% at
pathology examination were more frequent in patients who suffered from
intraoperative CSF leakage (48.1% vs 10.3%, p = 0.001). At the 3-month
radiological follow-up, gross total resection (GTR) was achieved in 120
patients (50.2%).
Classical inferential analysis of CSF leak predictors (Table 2):
Independent predictors of intraoperative CSF leak in the multivariate
logistic regression analysis were: non-secreting status (OR: 9.7,
p = 0.00001), osteo-dural invasiveness (OR: 0.34, p = 0.01), higher age
(OR: 1.03, p = 0.04) and reduced ICD (OR: 0.88, p = 0.0087). Overall, the
multivariate logistic regression model showed a moderate predictive
power (AUC: 0.60. Sensitivity: 55%. Specificity: 63%).
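To read the odds ratios of continuous predictors, it can help to rescale them to a clinically meaningful change. The sketch below assumes that the reported ORs refer to a one-unit increase (one year of age, one millimetre of ICD); these units are an assumption, since they are not stated in the text, and the function names are hypothetical.

```python
import math


def coefficient_from_odds_ratio(odds_ratio):
    """Logistic regression coefficient (log-odds) behind an odds ratio."""
    return math.log(odds_ratio)


def scaled_odds_ratio(odds_ratio, delta):
    """Odds ratio for a change of `delta` units, given the per-unit OR."""
    return math.exp(coefficient_from_odds_ratio(odds_ratio) * delta)


# Assumed units: OR per year of age and per millimetre of ICD.
age_or_10_years = scaled_odds_ratio(1.03, 10)
icd_or_5_mm = scaled_odds_ratio(0.88, 5)
print(round(age_or_10_years, 2), round(icd_or_5_mm, 2))
```

Under these assumptions, an OR of 1.03 compounds to roughly 1.34 over ten units, and an OR of 0.88 to roughly 0.53 over five units.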
Supervised machine learning predictive models
The Boruta algorithm selected the most relevant features according to
model performance metrics: non-secreting status, age, x-axis, y-axis and
z-axis diameters, ICD and R ratio. These were included in all further
analyses. Reducing the dimensionality of the data shortened training
time, simplified computation and improved classification
performance(15). Optimized models are ranked by AUC in Table 3. Feature
importances for the optimal model are reported for interpretability
(Figure 2; a correlation plot is provided for comparison).
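Since the optimized models are ranked by AUC, a minimal sketch of how this metric can be computed may be useful. It uses the rank interpretation of the AUC (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one) and is not the study's implementation; the function name and the toy scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
auc_example = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc_example)  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients.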
The random forest classifier proved to be the most accurate ML model,
showing high performance on the training set (AUC: 0.88. Accuracy: 87%.
Sensitivity: 95%. Specificity: 80%. F1-score: 0.88) and overall high
discriminative capacity in predicting intraoperative CSF leak occurrence
in the hold-out test set for internal validation (AUC: 0.84. Accuracy:
84%. Sensitivity: 87%. Specificity: 82%. PPV: 69%. NPV: 93%. F1-score:
0.87).
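The threshold-based metrics listed above all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and do not correspond to the study's confusion matrix, which is not given in the text, and all denominators are assumed to be non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: leaks correctly flagged
    specificity = tn / (tn + fp)   # non-leaks correctly cleared
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=8, fp=2, tn=18, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a screening setting such as this one, a high NPV is the property that lets a negative prediction be read as reassuring.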