Identifying fibroin and sericin genes in the S. ricini genome
Fib-H (BAQ55621.1) and p25 (LC001863.1, LC001864.1 and LC001865.1) of S. ricini were already registered in GenBank. Using these sequences as queries, a BLASTP search was conducted against the 16,702 gene models of S. ricini with an e-value threshold of 1e-5 and the ‘-seg no’ option. When BLASTP returned ‘No hits found’, a TBLASTN search against the nucleotide sequences of the S. ricini genome was conducted with the same parameters. Throughout this report, the ‘-evalue 1e-5’ and ‘-seg no’ options were applied to every BLASTP and TBLASTN search that used silk proteins (Fib-H, Fib-L, p25, sericin) as queries. To investigate whether a homolog of Fib-L is present in the S. ricini genome, B. mori Fib-L (NP_001037488.1) was used as the query for BLASTP and TBLASTN searches. In addition, we performed a TBLASTN search against the A. yamamai genome using the B. mori Fib-L sequence as the query.
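The search strategy described above (BLASTP against the gene models first, TBLASTN against the genome only when BLASTP finds nothing) can be sketched in Python as follows. This is a minimal illustration, not the script used in this study. The file and database names are hypothetical, and it assumes that the BLAST+ executables are on the PATH and that tabular output (‘-outfmt 6’) is requested, in which case a search without hits produces empty output instead of the ‘No hits found’ message of the default report.

```python
import subprocess

EVALUE = "1e-5"  # threshold stated in the text


def blast_command(program, query_fasta, database):
    """Build a BLAST+ command line with the options used for silk protein queries."""
    if program not in ("blastp", "tblastn"):
        raise ValueError(f"unsupported program: {program}")
    return [
        program,
        "-query", query_fasta,
        "-db", database,
        "-evalue", EVALUE,
        "-seg", "no",      # disable SEG low-complexity filtering of the query
        "-outfmt", "6",    # assumption: tabular output, one line per hit
    ]


def has_hits(tabular_output):
    """Return True if the tabular output contains at least one hit line."""
    return any(
        line.strip() and not line.startswith("#")
        for line in tabular_output.splitlines()
    )


def run_blast(command):
    """Run one BLAST+ command and return its standard output as text."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def search_with_fallback(query_fasta, protein_db, genome_db, run=run_blast):
    """Try BLASTP against gene models; fall back to TBLASTN against the genome."""
    output = run(blast_command("blastp", query_fasta, protein_db))
    if has_hits(output):
        return "blastp", output
    return "tblastn", run(blast_command("tblastn", query_fasta, genome_db))
```

A call such as search_with_fallback("fib_l.fasta", "sricini_gene_models", "sricini_genome") (all three names are placeholders) returns the program that produced the final result together with its output. The ‘run’ argument exists only so that the logic can be exercised without BLAST+ installed.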
Tsubota et al. (2015) and Dong et al. (2015) reported that five and four sericin genes are expressed in the anterior silk gland and the middle silk gland, respectively (Table S7). The deduced amino acid sequences of the putative sericin transcripts were searched against the S. ricini gene model set using BLASTP. For LC001867 and LC001870, no corresponding gene models were found, so TBLASTN was conducted to check whether both transcripts were present.
To survey the repertoire of silk protein-encoding genes in D. plexippus and P. xylostella, TBLASTN searches against the genome assemblies were conducted with B. mori Fib-H (NP_001106733.1), Fib-L, p25 (NP_001139413.1) and sericin-1, 2, 3 (AB112019.1, NP_001166287.1, NP_001108116.1) sequences as queries, because no transcripts or amino acid sequences had previously been reported as Fib-H, Fib-L, p25 or sericin in P. xylostella and D. plexippus. The genome assemblies used for the TBLASTN searches were those used in the BUSCO analysis (Table S5). As the transcripts of Fib-H, Fib-L and p25 of P. xuthus were already registered (see Table 3), those sequences were mapped to the P. xuthus genome sequence to confirm their presence. No sericin sequences of P. xuthus had previously been registered in GenBank, so the same procedure as for P. xylostella and D. plexippus was followed.
Phylogenetic analysis of sericin was conducted with seven putative S. ricini sericin genes, three B. mori sericin genes and five A. yamamai sericin genes (LC08587, LC08588, LC08589, LC08590 and LC08591; Zurovec et al., 2016). MUSCLE was used to generate alignments of protein sequences (Edgar, 2004). The aligned sequences were subjected to maximum likelihood phylogenetic analysis with 1,000 bootstrap replicates using MEGA X (Kumar, Stecher, Li, Knyaz, & Tamura, 2018).
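For readers unfamiliar with bootstrap support, the resampling step behind the 1,000 replicates can be illustrated with a short Python sketch. It shows only the general idea: alignment columns are drawn with replacement to build pseudo-alignments of the original length. It does not reproduce the MEGA X implementation, and the tree inference applied to each replicate is omitted. The function names and the toy alignment in the usage note are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate).

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(alignment[name][i] for i in columns)
        for name in names
    }


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate reproducible bootstrap replicates of an alignment."""
    rng = random.Random(seed)  # fixed seed, assumed here for reproducibility
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

For example, bootstrap_replicates({"seqA": "MK-SL", "seqB": "MKASL", "seqC": "MR-SL"}) returns 1,000 pseudo-alignments of five columns each. In a full analysis a tree would be inferred from every replicate, and the support for a clade would be the fraction of replicate trees that contain it.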