Methods
We searched GenBank to obtain sequence data for RHDV1 (Table S1) before
different virus strains were known to occur in Australia (Hall et al.,
2015, Mahar et al., 2017). In fact, up until 2014, descendants of the
originally released RHDV strain were the only virulent RHDV strains
known to circulate in Australia (Kovaliski et al., 2014). It is
important to highlight that, in all RHDV molecular studies conducted in
Australia, sampling was opportunistic, i.e. sampling occurred from
rabbit carcasses opportunistically encountered after a disease outbreak.
Hence, while we had a substantial temporal coverage, we could only
obtain data for one or two individuals within each site and year.
We retained only sequences where, in the original publication, the
number of screened samples were stated or where it was confirmed with
the author(s) that no bias was introduced towards highly variable
sequences. Time of collection of the samples was included in the
analysis as fractional years before present. We aligned the sequences in
Geneious (Kearse et al., 2012) and, as most sequences found were from
the hypervariable gene that encodes for the capsid protein VP60, we
trimmed the alignment to include only data for this gene, leaving 1740
nt for analysis.
Using the collated dataset, we conducted coalescent and newly developed
phylodynamic analyses and compared results with known epidemiological
data collected in the field, thereby providing an indication of the
accuracy of these analytical methods implemented in a natural context.
Specifically, we initially used the Bayesian sampled ancestor
birth-death skyline tree prior (SABDSKY, Gavryushkina et al., 2014). We
considered this the most flexible model and used it to test the fit of
the data to a strict molecular clock, or alternatively a lognormal or
exponential relaxed molecular clock (Drummond et al., 2006) by using the
nested sampling approach (Russel et al., 2018) and comparing the Bayes
Factor between models (Baele et al., 2012). For each model, we gradually
increased the subchain length until we obtained two approximately equal
maximum likelihood estimates, between consecutive analyses. We then
conducted a second run with the longest subchain to ensure consistent
results. We used a very tight lognormal prior around the origin
parameter (log mean=2.95 and log stdev=0.01), with a hard lower and
upper bound of 18.47 and 19.8 years before the most recent sample, as
Australia was RHDV free before its release and we considered it unlikely
for outbreaks to go unnoticed for a period longer than a few months. We
used a lognormal prior also for R e (log mean=0.5
and log stdev=1) and the total removal rate δ (Gavryushkina et al.,
2014) (log mean=2 and log stdev=2). We used a uniform distribution
bounded between 0 and 1 for the sampling proportion parameter, but fixed
this parameter to zero prior to 1995 because no samples were collected
prior to this date. We also fixed the removal probability parameter to
zero because we sampled very few individuals across the many that had
been infected, and even though in some instances we removed the whole
carcass at sampling (therefore reducing the probability of transmitting
the disease), it is likely that they already had opportunities to infect
other rabbits, so we considered appropriate to make the assumption that
sampling did not affect RHDV1 outbreaks. We initially modelledR e and δ with a piecewise constant function
through time, allowing different values for these parameters for each
five year “epoch” (i.e. a total of four time intervals after 1995). We
considered this sufficient to detect changes in these parameters and to
specifically test the hypothesis that there had been a change after
2005. However, on inspection of preliminary results, we noted an inverse
correlation between R e and δ. Therefore, for all
analyses we used a common δ across all epochs while lettingR e to vary. Furthermore, to better reflect the
sampling times, we fixed the sampling proportion parameter to zero in
the years where we did not have any samples as an alternative approach.
In all analyses, we used bModelTest (Bouckaert and Drummond, 2017) to
take into account the uncertainty of the substitution model, and a gamma
prior (shape=1, scale=0.001) for the clock rate, which encompasses the
expected range for an RNA virus (Biek et al., 2015). After defining the
molecular clock, we repeated the analysis with a Birth-Death skyline
(BDSKY, Stadler et al., 2013), a Bayesian skyline (BSP, Drummond et al.,
2005) and a time-aware Gaussian Markov random field Bayesian skyride
coalescent tree priors (Skyride, Minin et al., 2008). We allowed the BSP
analysis to consider six population size changes, and applied a
lognormal population size prior (with precision equal to 1) to
reconstruct changes over time of the virus population size. The mean of
the lognormal population size prior for each epoch is the estimated
population size of the previous epoch, with a uniform prior (between 0
and 3.8 X 105) for the oldest epoch. For the few
sequences for which the exact date of collection was unknown (accession
number: GU373617, GU373618, EU650679, EU650680), we put a lognormal
prior on the tip date to match the possible range and reflect this
uncertainty (the season of collection for these sequences was known,
Table S1). It should be noted that the tip dates sampler operator, which
is commonly used for this purpose, cannot be used in the SABDSKY model
(Gavryushkina pers. comm.), as it is incompatible with the extended
state space implied by that model. Instead, we replaced this with the
purposely designed ‘Sampled Node Date Random Walker’ operator developed
and described by Gavryushkina et al. (2014). Lastly, we attempted to
take into account geographical compartmentalisation using the multitype
Birth-Death model (aka: Birth-Death with Migration Model, BDMM) (Kühnert
et al., 2016). For this analysis we removed the (few) sequences from
Western Australia and Tasmania, and limited the analysis to two demes:
central-south Australia and east Australia (all priors were kept as
above). All analyses were conducted in BEAST2 (Bouckaert et al., 2014)
except the skyride analysis, which was run in BEAST1 (Drummond and
Rambaut, 2007) with a GTR + Г+ I. Tracer (Rambaut and Drummond, 2007)
was used to ensure convergence of the two MCMC that were run for each
model (that is, that independent analyses sampled the same parameter
space towards the end of the MCMC), and to generate the data for the BSP
and the skyride plot. The two runs were then combined for final
analysis. Plots for the BDSKY and SABDSKY analyses were generated with
the R package bdskytools
(https://github.com/laduplessis/bdskytools). All BEAST input files
are available from the corresponding author.