Network link prediction
We applied the following network link prediction (NLP) algorithms:
The plug-and-play algorithm (Dallas et al. 2017) predicts missing links through conditional probability estimation. It was developed to infer, from sets of input parameters, the probability that an unobserved link exists but has gone undetected.
The Poisson N-mixture link prediction model (Fu et al. 2019) combines a Poisson N-mixture model with a low-rank collaborative filtering approach. Poisson N-mixture models are used in ecological research to account for imperfect detection in field observations (Royle 2004). Low-rank matrix completion–based collaborative filtering methods are a popular approach for NLP in social network studies: missing entries in a data matrix are completed from a small number of known entries under the assumption that the matrix has low rank, e.g. to predict consumer preferences (Candes & Plan 2010).
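The imperfect-detection idea behind the N-mixture component can be illustrated with a minimal sketch. It assumes the basic formulation of Royle (2004), in which a latent abundance N follows a Poisson distribution with mean lam and each repeated count is a binomial draw from N with detection probability p. The function names, the truncation bound n_max, and the example values are hypothetical; this is not the link prediction model of Fu et al. (2019), which additionally includes the low-rank component.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability of n given mean lam (computed on the log scale)."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def binom_pmf(y, n, p):
    """Binomial probability of detecting y out of n with detection probability p."""
    return math.comb(n, y) * p ** y * (1.0 - p) ** (n - y)


def n_mixture_likelihood(counts, lam, p, n_max=200):
    """Marginal likelihood of repeated counts, summing out the latent abundance N.

    n_max truncates the infinite sum over N; it is an assumption of this
    sketch and must be large relative to lam.
    """
    likelihood = 0.0
    for n in range(max(counts), n_max + 1):
        term = poisson_pmf(n, lam)
        for y in counts:
            term *= binom_pmf(y, n, p)
        likelihood += term
    return likelihood


# Hypothetical example: two repeated counts of the same interaction.
print(n_mixture_likelihood([3, 2], lam=5.0, p=0.4))
```

With a single count, the marginal distribution reduces to a Poisson distribution with mean lam * p, which is a convenient check on the summation.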
We provided ecological, morphological, and phylogenetic input parameters to these models (Table 1). Neither model currently accounts for phylogenetic uncertainty. Therefore, we included only the majority-rule consensus host and parasite BI phylogenies and the ward.D2 dendrograms; ward.D2 is one of the most widely used clustering algorithms (Murtagh & Legendre 2014). To avoid overfitting, we reduced the number of input variables per parameter through principal coordinate analyses (PCoA) of the distance matrix of each parameter, resulting in 9 input parameters comprising 24 variables (Table 1). Distance matrices were computed with the cophenetic function in R, applied to dendrograms built separately for each parameter with the same clustering methods used for the host niche dendrograms. We imputed missing data with a k-nearest neighbour (kNN) algorithm in the R package caret (Kuhn 2008, 2020).
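The imputation step can be illustrated with a minimal sketch of kNN imputation. It assumes that distances between rows are computed over the variables both rows have observed, and that a missing value is filled with the mean of that variable across the k nearest rows. The function name knn_impute, the choice k = 2, and the example matrix are hypothetical; this is not the caret implementation, which may differ in preprocessing and neighbour selection.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the column mean of the k nearest rows."""

    def distance(a, b):
        # Root mean squared difference over variables observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which variable j is observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                completed[i][j] = sum(nearest) / len(nearest)
    return completed


# Hypothetical trait matrix: one row per species, None marks a missing value.
traits = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [1.0, 2.0, 3.2],
    [9.0, 9.0, 30.0],
]
result = knn_impute(traits, k=2)
print(result[1])
```

In this example the missing value in the second row is filled from the first and third rows, which are its two nearest neighbours, and the distant fourth row is ignored.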
We assessed model performance through the area under the receiver operating characteristic curve (AUROC) and 10-fold cross-validation. Each time, the algorithms were trained on 80% of the interaction matrix to predict the remaining 20%. We implemented the models in R v4.1.2 (R Core Team 2022) and MATLAB v9.11.0 (MathWorks, Natick, USA) using published code (doi: 10.6084/m9.figshare.4965038; https://github.com/Hutchinson-Lab/Poisson-N-mixture). Following Dallas et al. (2017), we assessed variable importance for the plug-and-play algorithm by measuring the reduction in model performance resulting from 500-fold permutation of each parameter. For variable importance and link prediction, the algorithm was trained on the full dataset. These analyses were not performed for the Poisson N-mixture model because the corresponding implementation is lacking.
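The evaluation logic described above can be sketched as follows. The sketch assumes that AUROC is computed as the probability that a randomly chosen observed link receives a higher score than a randomly chosen unobserved one, that the 80/20 partition is a random split of matrix entries, and that variable importance is the mean drop in AUROC after shuffling one input column. All function names, the scoring function score_fn, and the example data are hypothetical; this is not the published code of Dallas et al. (2017).

```python
import random


def auroc(scores, labels):
    """Probability that a positive outranks a negative (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def holdout_split(n_items, test_fraction=0.2, seed=0):
    """Randomly split item indices into training and test sets."""
    rng = random.Random(seed)
    idx = list(range(n_items))
    rng.shuffle(idx)
    n_test = int(round(n_items * test_fraction))
    return idx[n_test:], idx[:n_test]


def permutation_importance(score_fn, features, labels, column, n_perm=500, seed=0):
    """Mean drop in AUROC when one feature column is shuffled n_perm times."""
    rng = random.Random(seed)
    baseline = auroc([score_fn(row) for row in features], labels)
    drops = []
    for _ in range(n_perm):
        shuffled = [row[column] for row in features]
        rng.shuffle(shuffled)
        permuted = [
            row[:column] + [v] + row[column + 1:]
            for row, v in zip(features, shuffled)
        ]
        drops.append(baseline - auroc([score_fn(r) for r in permuted], labels))
    return sum(drops) / n_perm


# Hypothetical data: each row is a host-parasite pair with two input variables.
features = [[0.9, 5.0], [0.8, 1.0], [0.2, 4.0], [0.1, 2.0]]
labels = [1, 1, 0, 0]


def score_fn(row):
    # Stand-in for a fitted model that only uses the first variable.
    return row[0]


train_idx, test_idx = holdout_split(10)
print(permutation_importance(score_fn, features, labels, column=0))
print(permutation_importance(score_fn, features, labels, column=1))
```

A variable the model does not use yields an importance of zero, whereas shuffling an informative variable lowers AUROC toward 0.5.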