DNA extraction and variant calling
We obtained tissue samples and phenotypic data of Costa Rican and
Panamanian populations from natural history museums and from wild-caught
individuals to study the genetic and morphological variation between
subspecies of S. corvina from allopatric sites and across contact
zones. Tissue samples from natural history collections were stored in
ethanol or frozen, while blood samples were collected in the field from
birds caught with mist nets, extracting blood from the brachial vein.
Procedures to work with wild animals in the field were reviewed and
approved by the University of Miami Institutional Animal Care and Use
Committee (IACUC, Permit number: 19-128-LF). We extracted genomic DNA
from 257 individuals (Table S1), using Qiagen’s DNeasy Blood and Tissue
Extraction Kit (Qiagen, Hilden, Germany). We followed the manufacturer’s
protocol with the optional addition of 4 ul of RNase to remove potential
RNA contaminants.
We then used a Genotype-by-Sequencing approach (GBS; Elshire et al.,
2011), digesting the DNA samples with the ApeKI restriction enzyme, and
sequencing short fragments across the genome. GBS library prep and
sequencing were carried out by the Bioinformatics Resource Center of the
University of Wisconsin-Madison on an Illumina NovaSeq 6000 2x150 Shared
Sequencing. We called SNP variants using TASSEL version 5.2.65 (Glaubitz
et al., 2014), following the GBSv2 SNP discovery and production
pipeline. We ran this pipeline with mostly default parameters but
increased the minimum quality score to 20 (default mnQS = 0). We aligned
our short-read sequences to the reference genome of the Tawny-bellied
Seedeater (Sporophila hypoxantha, GenBank: GCA_002167245.1;
Campagna et al., 2017) using BOWTIE2 (Langmead & Salzberg, 2012) and
the “very-sensitive” preset for high accuracy. Then, we filtered
genome-wide single-nucleotide polymorphisms (SNPs) using VCFTOOLS
version 0.1.16 (Danecek et al., 2011). Our filtered criteria consisted
of including biallelic SNPs with no indels, minimum read depth ≥ 3,
minimum allele frequency ≥ 0.05, a maximum proportion of missing data ≤
0.25, and SNPs that were ≥ 100 bp apart to reduce the effect of highly
linked markers. We further filtered the data file in TASSEL, keeping
SNPs with a maximum heterozygosity ≤ 0.8 to exclude potential paralogs,
and removed individuals with ≥40% missing data. The resulting data set
retained 255 individuals and 14,839 SNPs. For the demographic modeling
(see below), we used different criteria, including biallelic SNPs with
no indels, minimum read depth of ≥ 3, maximum proportion of missing data
of ≤ 0.15, and SNPs that were ≥ 1000 bp apart to obtain a complete site
frequency spectrum (not truncated due to minimum allele frequency
filtering) while keeping a manageable SNPs sample size, due to computer
processing. This second data set retained 8,437 SNPs from 10 randomly
selected genotypically pure individuals per subspecies (i.e., q> 0.95, see Results).