Sequencing data
We selected species (or species complexes) that are widely distributed
across the Top End and Kimberley regions, which also had geographically
extensive genetic sampling with precise GPS coordinates, and which
represented different genera and families of lizards. These include
geckos (Gehyra and Heteronotia; Gekkoninae), skinks
(Carlia; Scincidae) and dragons (Diporiphora; Agamidae).
All but Heteronotia (represented by the generalist H.
binoei) include species with different habitat requirements and, from
prior multilocus sequencing, varying scales of phylogeographic structure
(Figure 1C; Supplementary Material S1), making them ideal for our study.
Based on prior phylogeographic analyses and (except for rare cases of
known mtDNA introgression) using mtDNA for lineage identification, we
selected a subset of 579 samples (135 Carlia, 147 Diporiphora, 214 Gehyra and 83 Heteronotia) for the
SNP screening, focusing on spatially unique individuals to maximize the
number and geographic spread of sampled localities across the known
range of each taxon (see Battey et al., 2020). We treat closely related
and parapatrically distributed lineages as phylogeographic units,
whether or not they have been recently revised taxonomically (Supplementary
Material S1).
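The selection of spatially unique individuals described above can be illustrated with a minimal sketch. It is not the procedure used in the study: the field names (`lineage`, `lat`, `lon`), the rounding precision, and the rule of keeping the first sample seen at each locality are all assumptions made for illustration.

```python
def spatially_unique(samples, decimals=3):
    """Keep one sample per (lineage, rounded latitude, rounded longitude).

    `samples` is a list of dicts with the hypothetical keys
    'id', 'lineage', 'lat' and 'lon'. Coordinates are rounded to
    `decimals` places (an assumed precision) so that samples collected
    at effectively the same locality collapse to one record. The first
    sample encountered at each locality is the one retained.
    """
    kept = {}
    for sample in samples:
        key = (
            sample["lineage"],
            round(sample["lat"], decimals),
            round(sample["lon"], decimals),
        )
        # setdefault leaves an existing entry untouched, so later
        # samples from an already represented locality are skipped.
        kept.setdefault(key, sample)
    return list(kept.values())


# Made-up coordinates, for illustration only.
example = [
    {"id": "A1", "lineage": "Carlia_1", "lat": -12.4601, "lon": 130.8410},
    {"id": "A2", "lineage": "Carlia_1", "lat": -12.4604, "lon": 130.8412},
    {"id": "A3", "lineage": "Carlia_1", "lat": -15.7730, "lon": 128.7390},
    {"id": "B1", "lineage": "Gehyra_1", "lat": -12.4601, "lon": 130.8410},
]
selected = spatially_unique(example)
```

Grouping by lineage before comparing coordinates means that two lineages sampled at the same site each keep a representative, which matches the aim of maximizing sampled localities within each taxon.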
Our SNP detection method, Diversity Array Technology (DArT™), uses
restriction enzyme-based complexity-reduction sequencing on next-generation sequencing
platforms to identify SNPs within randomly distributed 75 bp contigs
(Jaccoud, 2001), and has proven valuable for detecting admixture between
populations (Melville et al., 2017; Unmack et al., 2017) and for
landscape genetic studies (De Fraga, Lima, Magnusson, Ferrão, & Stow,
2017; Rossetto et al., 2019). Details of the SNP genotyping are
given in Georges et al. (2018) and Wells and Dale (2018). For samples
from each genus, and within the older Gehyra radiation, the australis, koira and nana clades separately (Figure S3),
the sequences were processed by proprietary DArT analytical pipelines to
map reads and call SNPs. First, sequences were quality filtered using
stringent selection criteria that compare the barcode region to the
rest of the sequence. Next, sequences were aggregated into clusters using
the DArT fast clustering algorithm with a Hamming distance. Then,
SNP markers were identified within each cluster, and an index
of reproducibility was calculated for each locus. The resulting data comprise the
presence/absence of restriction fragments per SNP (SilicoDArT) and the
final SNP calls, with the position of each variant base given relative to the
restriction fragment. To ensure the quality of our data, we retained
SNPs with repeatability across technical replicates >98% and
call rate >90% (<10% missing data), and removed singletons, using
the dartR package (Gruber, Unmack, Berry, & Georges, 2018) in
RStudio (RStudio Core Team, 2015).
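The three quality filters can be sketched in a few lines to make the thresholds concrete. This is an illustrative re-expression in Python, not the dartR implementation used in the study. The data layout (one dict per locus with the hypothetical keys `id`, `rep_avg` and `genotypes`), the strict inequalities, and the definition of a singleton as a minor allele observed as a single copy among called genotypes are all assumptions.

```python
def filter_snps(loci, min_repeatability=0.98, max_missing=0.10):
    """Return the ids of loci that pass all three filters.

    Each locus is a dict with hypothetical keys:
      'id'        - locus name
      'rep_avg'   - repeatability across technical replicates (0 to 1)
      'genotypes' - per-sample alternate-allele counts (0, 1 or 2),
                    with None marking a missing call
    """
    kept = []
    for locus in loci:
        genotypes = locus["genotypes"]
        calls = [g for g in genotypes if g is not None]
        if not calls:
            continue  # nothing called at this locus

        # Filter 1: repeatability must exceed the threshold (>98%).
        if locus["rep_avg"] <= min_repeatability:
            continue

        # Filter 2: fraction of missing calls must be below 10%.
        missing = (len(genotypes) - len(calls)) / len(genotypes)
        if missing >= max_missing:
            continue

        # Filter 3: drop singletons, taken here to mean a minor allele
        # present as exactly one copy among the called diploid genotypes.
        alt_copies = sum(calls)
        ref_copies = 2 * len(calls) - alt_copies
        if min(alt_copies, ref_copies) == 1:
            continue

        kept.append(locus["id"])
    return kept


# Made-up loci scored in ten samples, for illustration only.
example_loci = [
    {"id": "snp1", "rep_avg": 1.00, "genotypes": [0, 1, 2, 0, 1, 0, 0, 2, 1, 0]},
    {"id": "snp2", "rep_avg": 0.95, "genotypes": [0, 1, 2, 0, 1, 0, 0, 2, 1, 0]},
    {"id": "snp3", "rep_avg": 1.00, "genotypes": [0, 1, None, 0, 1, None, 0, 2, 1, 0]},
    {"id": "snp4", "rep_avg": 1.00, "genotypes": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]},
]
passed = filter_snps(example_loci)
```

In this sketch monomorphic loci are not treated specially, and a locus failing any one filter is dropped regardless of the others, so the order of the checks does not change the result.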