2.4.1 Library preparation and data processing
For the discovery of single nucleotide polymorphisms (SNPs), multiplexed ISSR genotyping by sequencing (MIG-seq) was conducted following the procedure of Suyama & Matsuki (2015) with a slight modification: the annealing temperature of the first PCR was changed from 48°C to 38°C. Both ends of the fragments were obtained by paired-end sequencing (reads 1 and 2), but only read 1 was used in the following analyses. Low-quality reads were removed with FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) using the quality_filter option with the settings q = 30 and p = 40. To remove reads derived from extremely short library entries, the reads were searched for the sequencing primer region, and reads containing the primer sequence were removed with the fastx_clipper option of FASTX-Toolkit.
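The quality criterion above can be illustrated with a minimal Python sketch. It assumes that q is the minimum Phred score and p is the minimum percentage of bases in a read that must reach that score, and that quality strings use Phred+33 encoding; the function name and the encoding are our assumptions for illustration, not part of the FASTX-Toolkit, whose exact rounding behaviour may differ.

```python
def passes_quality_filter(quality_string, min_quality=30, min_percent=40,
                          phred_offset=33):
    """Return True if at least `min_percent` % of bases have Phred >= `min_quality`.

    Hypothetical helper: mirrors the q = 30, p = 40 setting described in the
    text, assuming Phred+33 encoded FASTQ quality strings.
    """
    scores = [ord(ch) - phred_offset for ch in quality_string]
    if not scores:
        # An empty read carries no information and is discarded.
        return False
    n_good = sum(1 for s in scores if s >= min_quality)
    return 100.0 * n_good / len(scores) >= min_percent


def filter_reads(records, **kwargs):
    """Keep (header, sequence, quality) tuples whose quality string passes."""
    return [rec for rec in records if passes_quality_filter(rec[2], **kwargs)]
```

Under these assumptions, a read in which two of five bases reach Q30 (exactly 40%) is retained, whereas a read with one of five such bases is discarded.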
De novo assembly was performed using Stacks v. 2.53 (Catchen et al., 2013). Since our samples were gametophytes, they were expected to be haploid. However, some samples had both female and male markers or two haplotypes of the nuclear marker cetn-int2, suggesting diploidy. Therefore, we first performed the assembly under the assumption that all samples were diploid, with the following parameters: minimum number of identical reads required to create a stack (m = 3), number of nucleotide mismatches allowed between loci within a single individual (M = 2), and number of mismatches allowed between loci when building the catalogue (n = 1); all other parameters were left at their default values. The SNP genotype of each individual was exported using the ‘populations’ command; only the first SNP was extracted from each putative locus using the flag --write_single_snp. As expected, some samples had heterozygous loci (all samples from p24 and p27, and one sample from Ona). After excluding these samples, a second assembly was performed under the assumption that all remaining samples were haploid, with the following parameters: m = 3, M = 0, and n = 1. Furthermore, since calling stacks from secondary reads (reads that are not distinguishable from sequencing errors) produced heterozygosity within individuals, this was disabled using the flags -N (set to zero) and -H. The SNP genotype of each individual was then exported as for the diploid dataset.
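The screening step between the two assemblies can be sketched as follows. This is only an illustration under stated assumptions: the genotype representation (a pair of alleles per locus, or None for a missing call), the function name, and the threshold min_het are hypothetical, since the text does not state how many heterozygous loci were required to flag a sample.

```python
def heterozygous_samples(genotypes, min_het=1):
    """Return {sample: number of heterozygous loci} for putatively diploid samples.

    genotypes: dict mapping a sample name to a list of calls, where each call
    is a 2-tuple of alleles or None for a missing genotype (assumed layout).
    min_het: assumed threshold; samples with fewer heterozygous loci are
    treated as haploid and not reported.
    """
    flagged = {}
    for sample, calls in genotypes.items():
        n_het = sum(1 for call in calls
                    if call is not None and call[0] != call[1])
        if n_het >= min_het:
            flagged[sample] = n_het
    return flagged


def haploid_subset(genotypes, min_het=1):
    """Drop flagged samples, leaving those to be re-assembled as haploid."""
    flagged = heterozygous_samples(genotypes, min_het)
    return {s: calls for s, calls in genotypes.items() if s not in flagged}
```

In this sketch, samples returned by heterozygous_samples correspond to those kept only in the diploid dataset, and haploid_subset gives the samples passed to the second assembly.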
Both the diploid and haploid datasets were processed using PLINK v. 2.00 (Chang et al., 2015; www.cog-genomics.org/plink/2.0/). SNPs with a minor allele frequency < 0.03, loci with a missing individual rate > 0.7, and individuals with a missing locus rate > 0.7 were filtered out. The diploid dataset comprised 237 samples from 36 populations and 818 SNPs (loci), with a mean genotyping rate of 50.2%. The haploid dataset comprised 212 samples from 34 populations and 865 SNPs (loci), with a mean genotyping rate of 48.8%. The PLINK output files were converted to the formats required for subsequent analyses using PGDSpider2 (Lischer & Excoffier, 2012). All samples from population p18 were removed because of their low genotyping rate.
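The three thresholds above can be made concrete with a short Python sketch for haploid, biallelic calls. It is not the PLINK implementation: the data layout (allele codes 0/1, None for missing), the function name, and the order of the filters (individuals first, then loci) are assumptions made for illustration, and the result can depend on that order.

```python
def filter_genotypes(matrix, maf_min=0.03, max_missing=0.7):
    """Apply missingness and minor-allele-frequency filters to haploid calls.

    matrix: dict mapping sample name -> list of allele codes (0 or 1),
    with None marking a missing call (assumed layout).
    Returns (filtered matrix, indices of retained loci).
    """
    n_loci = len(next(iter(matrix.values()))) if matrix else 0

    # Step 1 (assumed order): drop individuals with missing locus rate > max_missing.
    kept = {s: calls for s, calls in matrix.items()
            if n_loci and sum(c is None for c in calls) / n_loci <= max_missing}

    # Step 2: drop loci with missing individual rate > max_missing or MAF < maf_min.
    kept_loci = []
    for j in range(n_loci):
        column = [calls[j] for calls in kept.values()]
        observed = [c for c in column if c is not None]
        if not observed:
            continue  # no genotyped individual at this locus
        if (len(column) - len(observed)) / len(column) > max_missing:
            continue
        freq = sum(observed) / len(observed)  # frequency of allele coded 1
        if min(freq, 1 - freq) < maf_min:
            continue
        kept_loci.append(j)

    filtered = {s: [calls[j] for j in kept_loci] for s, calls in kept.items()}
    return filtered, kept_loci


def mean_genotyping_rate(matrix):
    """Fraction of non-missing calls over all samples and loci."""
    calls = [c for row in matrix.values() for c in row]
    return sum(c is not None for c in calls) / len(calls) if calls else 0.0
```

With the default arguments, the sketch uses the same cut-offs as in the text (0.03 for minor allele frequency and 0.7 for both missing rates); mean_genotyping_rate corresponds to the genotyping rates reported for the two datasets.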