Materials and methods
Sampling and DNA preparation
Fieldwork was carried out in the Lisbon Peninsula, where we searched for
water bodies containing marbled newts. We specifically extended our
search to include the Serra de Sintra mountains in the southwest of the
peninsula, because of the observed presence of T. marmoratus at
higher altitudes than T. pygmaeus in the northeastern section of
the species’ hybrid zone (Arntzen & Espregueira Themudo, 2008).
Twenty-five populations were found in an area spanning over 2000
km2, ranging from the Tejo River in the east and the
south, the Atlantic Ocean in the south and the west, up to the city of
Caldas da Rainha in the north, where our survey marginally overlapped
with the area investigated by Espregueira Themudo et al. (2007).
Adult and larval newts were captured by dip netting or with funnel
traps. To reduce sampling bias e.g. towards siblings from particular
breeding pairs (Goldberg & Waits, 2010) we made an effort to include
all accessible sections of the water bodies. Tail tip tissue samples
were collected and stored in 96% ethanol. The sampling was complemented
with material from seven localities obtained earlier (Espregueira
Themudo & Arntzen, 2007). DNA extraction of tissue samples followed the
KingfisherTM (Thermo Scientific) and DNeasy extraction
kit (Qiagen, Valencia, CA, USA) standard protocols.
SNP marker design
Species diagnostic SNPs were identified based on transcriptome data for
one adult male T. marmoratus from San Pedro da Cova (coordinates
41.157 N, 8.496 W) and one adult male T. pygmaeus from Umbria,
Serra de Monchique (coordinates 37.335 N, 8.506 W) that were sequenced
at ZF-screens, Leiden on the Illumina HiSeq 2000 platform. The
transcriptome libraries are available from Wielstra, McCartney-Melstad,
Arntzen, Butlin, and Shaffer (2019), through the NCBI SRA at BioProject
PRJNA498336
(https://www.ncbi.nlm.nih.gov/bioproject/PRJNA498336).
Data filtering was carried out with Trimmomatic v0.36 (Bolger, Lohse, &
Usadel, 2014) and the de novo transcriptome assembly with Trinity
v2.5.1 (Grabherr et al ., 2011). SNP marker design followed the
MIPs pipeline that encompasses advantages for targeted resequencing,
including high specificity, a high level of multiplexing and no
ascertainment bias (Hardenbol et al., 2003; Niedzicka, Fijarczyk,
Dudek, Stuglik, & Babik, 2016). The SNP design process followed van
Riemsdijk, Butlin, Wielstra, and Arntzen (2019); for details, see
Supplemental Information I. The Xenopus tropicalis (Gray, 1864)
genome obtained from Biomart ENSEMBL (genome version JGI4.2) was
selected as a reference. Both marbled newts’ transcriptomes were blasted
onto gene models of X. tropicalis in order to extract
unambiguously mapping exon sequences for SNP discovery. SNP
identification was performed in Mesquite v.3.40 after performing an
alignment with Muscle v.3.8.31.
SNP detection and validation
SNP genotyping took place at the Institute of Biology Leiden / Naturalis
SNP line facility using the Kompetitive Allele-Specific PCR (KASP)
genotyping system (LGC genomics, UK). KASP is a fluorescence-based
method determining the bi-allelic score of SNPs in uniplex assays. KASP
is based on two allele-specific primers with a final base complementary
to one of the two potential SNPs and unique tail sequence (Semagn, Babu,
Hearne, & Olsen, 2014). The KASP master mix contains different
fluorescent-labelled primers that become activated during PCR cycles,
with the fluorescent signal increasing as more fluorescent primers are
incorporated during the thermocycling of the PCR reaction. Primers were
designed using the Kraken software and ordered from Integrated DNA
Technologies (Wood & Salzberg, 2014).
For SNP validation, we used 120 T. pygmaeus and 118 T.
marmoratus from 43 populations across the Iberian Peninsula, located
outside the documented hybrid zone of these species (Arntzen, 2018;
Figure 1 and Table S.1). The validation assay of 192 SNPs resulted in
147 markers being polymorphic, of which 81 were deemed species
diagnostic for the reference samples. For the Lisbon Peninsula 354
individuals from 32 populations (25 new and 7 previously studied
populations) were KASP genotyped for the 60 most promising nuclear SNPs.
A further four primer sets were developed from the sequence information
provided by Espregueira Themudo, Nieman, and Arntzen (2012), for the
nuclear genes ß-Fibrinogen intron 7 (BF), Calreticulin intron C (CC) and
Platelet-derived growth factor receptor α intron 11 (PDG) and for the
mitochondrial gene NADH dehydrogenase subunit 4 (ND4).
Population genetics
Hardy-Weinberg equilibrium and genotypic disequilibrium among pairs of
nuclear loci were tested with the R package GENEPOP v1.0.5, under the
Benjamini-Hochberg correction (Benjamini & Hochberg, 1995; Rousset,
2008). Gene flow between genetically distinct populations produces
admixture linkage disequilibrium among those loci that have different
allele frequencies in the founding populations (Pfaff et al. ,
2001). Admixture linkage disequilibrium was estimated following Barton
& Gale (1993) and was based on 1,000 bootstrap replicates of the
original dataset, using a script from van Riemsdijk, Butlin, Wielstra,
and Arntzen (2019). The STRING v.10.5 protein-protein interaction
network database (Szklarczyk et al ., 2015) was consulted to
examine the functional linkage among the annotated nuclear markers, with
reference to the X. silurana genome.
The SNP data were analysed with Structure software under the ‘admixture
model’, given that neighbouring populations can interbreed (Pritchard,
Stephens, & Donnelly, 2000; Falush, Stephens, & Pritchard, 2003). As
the genotype pool could only be biallelic corresponding to the species
to be analysed, the number of potentially differentiated gene pools was
predetermined as K =2. The program was run for 100,000 generations
after 100,000 generations of burn-in. Individuals were classified as
pure T. marmoratus (Q < 0.01), or pure T.
pygmaeus (Q > 0.99) or admixed (0.01 <Q < 0.99).
Environmental modelling
Species distribution models were constructed by the comparison of
presence-only data for both species under reference to contemporary
climate conditions. The biological data employed were 108 T.
marmoratus and T. pygmaeus records that were supported by
molecular species identification (Arntzen, 2018; present paper). The
records for three genetically admixed populations from Portugal and
Spain and three T. marmoratus populations from France were
excluded. Explanatory variables were the 19 climate parameters of
WorldClim 2.0 (bio01-bio19; see Fick and Hijmans, 2017). To identify and
subsequently reduce collinearity, we constructed a half-matrix of
pairwise Spearman correlation coefficients (rs). This
matrix was subjected to clustering with UPGMA in PRIMER-e software
(Clarke & Gorley, 2006). Parameters bio14 and bio15 were deemed
unavailable for selection because of their regular high discrepancy
between data sets (Varela, Lima-Ribeiro, & Terribile, 2015). Under the
criterion of partial independence atr s<0.7, the parameters retained were
bio01, bio02, bio03, bio05, bio06, bio08, bio12 and bio17 (Supplementary
Information II). Logistic regression analyses were performed with SPSS
20 (IBM SPSS, 2016) with ‘species’ as the dependent variable. Parameter
selection was in the forward stepwise mode under the criteria of
Pin=0.05 and Pout=0.10 under the
likelihood ratio criterion. Model fit was assessed by the area under the
curve (AUC) statistic. The resulting two-species distribution model was
then applied to climate reconstructions for the mid-Holocene and the
Last Glacial Maximum (WorldClim version 1.4; Hijmans, Cameron, Parra,
Jones, & Jarvis, 2005). In the absence of firm guidance of which
climate reconstruction would be most appropriate to apply (Guevara,
Morrone, & León‐Paniagua, 2019), distribution models were derived for
all of them (i.e. nine models for the mid-Holocene and three models for
the Last Glacial Maximum). Distribution models were visualized with
ILWIS (ILWIS, 2009).