Bioinformatics
The paired-end reads were filtered and processed with the DADA2 pipeline (Callahan et al., 2016) in QIIME2-2019.01 (Bolyen et al., 2019). Because the soil samples were amplified via nested PCR, sequences representing <1% of the total number of reads in each sample were removed using the resulting BIOM table and a customized R script (Ben Tekaya et al., 2018; Rodriguez et al., 2016). For nodule samples, only samples whose read counts fell within the 99th percentile of the read counts of all samples were used in the subsequent analyses (n = 72). The resulting amplicon sequence variants (ASVs) were queried with BLASTN in BLAST+ 2.9.0 (Camacho et al., 2009) to remove non-Frankia sequences. The remaining sequences were then clustered into operational taxonomic units (OTUs), using the sequences of 18 uncultured Frankia strains as references (i.e., OTU01-18 strains; accession numbers: LC482655-LC482672; Kagiya and Utsumi, 2020). OTU clustering was performed using the CD-HIT program (Li & Godzik, 2006) at a 97.0% similarity threshold. This threshold was chosen following Põlme et al. (2014) and Kagiya and Utsumi (2020).
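The per-sample abundance filter applied to the soil samples can be illustrated with a minimal sketch. This is not the customized R script used in the study; it is a Python illustration under the assumption that the feature table is available as a nested mapping of sample to sequence to read count. The function name `filter_low_abundance`, the parameter `min_fraction`, and the example sample and sequence labels are hypothetical.

```python
def filter_low_abundance(table, min_fraction=0.01):
    """Drop sequences below `min_fraction` of the reads in each sample.

    `table` maps sample ID -> {sequence ID -> read count} (assumed layout).
    The fraction is computed against the unfiltered read total of each
    sample, so sequences at or above the threshold are kept unchanged.
    """
    filtered = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        filtered[sample] = {
            seq: n
            for seq, n in counts.items()
            # Guard against empty samples before dividing.
            if total > 0 and n / total >= min_fraction
        }
    return filtered


# Hypothetical example: two soil samples with made-up read counts.
example_table = {
    "soil_A": {"seq1": 950, "seq2": 45, "seq3": 5},
    "soil_B": {"seq1": 3, "seq2": 497, "seq4": 500},
}
example_filtered = filter_low_abundance(example_table)
```

In this example, `seq3` in `soil_A` (5 of 1000 reads) and `seq1` in `soil_B` (3 of 1000 reads) fall below 1% and are removed, while the same sequences are retained in any sample where they reach the threshold.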
Phylogenetic trees (Fig. S2) were constructed in MEGA 7.0.21 (Kumar et al., 2016) using maximum likelihood (ML) with 1000 bootstrap replications, based on the Kimura two-parameter evolutionary model (Kimura, 1980) with a discrete gamma distribution, which was selected by the evolutionary model selection procedure in MEGA. The ML phylogeny method was described in detail by Kagiya et al. (2020).
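To illustrate what the Kimura two-parameter model distinguishes, the sketch below computes its closed-form pairwise distance, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. This is only an illustration of the substitution model: it omits the gamma rate heterogeneity used here and is not the ML tree inference performed in MEGA. The function name `k2p_distance` and the example sequences are hypothetical.

```python
import math

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    Raises ValueError for unequal lengths, no comparable sites, or
    saturated sequences for which the distance is undefined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    inner = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P distance")
    return -0.5 * math.log(inner)


# Hypothetical 10-site alignment with a single transition (A -> G).
example_distance = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
```

Because transitions and transversions enter the formula separately, two sequence pairs with the same number of differences can have different distances depending on the type of substitution.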