Bioinformatics
Paired-end reads were filtered and processed with the DADA2
pipeline (Callahan et al., 2016) in QIIME2-2019.01 (Bolyen et al.,
2019). Because the soil samples were amplified via nested PCR, sequences
representing <1% of the total number of reads per sample were
removed using the resulting BIOM table and a customized R script (Ben
Tekaya et al., 2018; Rodriguez et al., 2016). For nodule samples, only
samples whose read counts fell within the 99th percentile of the read
counts of all samples were retained for subsequent analyses (n = 72).
These steps yielded the amplicon sequence variants (ASVs) used below.
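The per-sample abundance filter applied to the soil samples can be illustrated with a minimal Python sketch. It is not the customized R script cited above: the function name `filter_rare_sequences`, the nested-dictionary table layout, and the `min_fraction` parameter are assumptions made for illustration only.

```python
def filter_rare_sequences(table, min_fraction=0.01):
    """Drop sequences below `min_fraction` of the reads in each sample.

    `table` maps sample ID -> {sequence ID: read count}. This layout is
    a hypothetical stand-in for a feature table exported from BIOM.
    """
    filtered = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        # Keep a sequence only if it is not below the threshold
        # (sequences representing <1% of the sample's reads are removed).
        filtered[sample] = {
            seq: n
            for seq, n in counts.items()
            if total > 0 and n / total >= min_fraction
        }
    return filtered


if __name__ == "__main__":
    example = {"soil_A": {"seq1": 990, "seq2": 9, "seq3": 1}}
    print(filter_rare_sequences(example))
```

The threshold is evaluated within each sample rather than across the whole data set, matching the "per sample" wording above.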
The ASVs were queried with BLASTN in BLAST+ 2.9.0 (Camacho et al.,
2009) to remove non-Frankia sequences. The remaining sequences were
then clustered into operational taxonomic units (OTUs), using the
sequences of 18 uncultured Frankia strains as references (i.e., strains
OTU01-18; accession numbers LC482655-LC482672; Kagiya and Utsumi,
2020). Clustering was performed with the CD-HIT program (Li & Godzik,
2006) at a 97.0% similarity threshold, chosen following Põlme et al.
(2014) and Kagiya and Utsumi (2020).
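The role of the 97.0% threshold can be illustrated with a toy Python sketch that assigns each sequence to its most similar reference. It is not CD-HIT, which uses its own greedy incremental clustering and its own definition of sequence identity. The sketch assumes pre-aligned sequences of equal length, and the names `identity` and `assign_to_references` are hypothetical.

```python
def identity(a, b):
    """Fraction of identical columns between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns in which both sequences carry a gap.
    pairs = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if not (x == "-" and y == "-")
    ]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def assign_to_references(asvs, references, threshold=0.97):
    """Map each ASV ID to its best-matching reference ID, or None.

    An ASV is assigned only if its identity to the best reference is
    at least `threshold`; otherwise it is left unassigned (None).
    """
    assignments = {}
    for asv_id, seq in asvs.items():
        best_ref, best_identity = None, 0.0
        for ref_id, ref_seq in references.items():
            score = identity(seq, ref_seq)
            if score > best_identity:
                best_ref, best_identity = ref_id, score
        assignments[asv_id] = best_ref if best_identity >= threshold else None
    return assignments
```

In this sketch, a sequence that reaches the threshold against no reference is left unassigned; how such sequences were handled in the actual analysis is not specified here.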
Maximum likelihood (ML) phylogenetic trees (Fig. S2) were constructed
in MEGA 7.0.21 (Kumar et al., 2016) with 1000 bootstrap replications,
using the Kimura two-parameter evolutionary model (Kimura, 1980) with a
discrete gamma distribution, which was selected by the evolutionary
model selection procedure in MEGA. The ML phylogeny method was
previously described in detail by Kagiya et al. (2020).
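For readers unfamiliar with the Kimura two-parameter model, the following Python sketch computes its pairwise distance, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. It illustrates the substitution model only: it omits the gamma rate correction and is not the ML tree search performed in MEGA. The function name `k2p_distance` and the assumption of pre-aligned input are for illustration.

```python
import math

# Purine-purine and pyrimidine-pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(a, b):
    """Kimura two-parameter distance between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        # Skip gaps and ambiguous bases.
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        if (x, y) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # proportion of transitions (P)
    q = transversions / sites  # proportion of transversions (Q)
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P formula")
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

Because transitions and transversions are counted separately, two sequence pairs with the same number of differences can yield different distances.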