3.2 | Effects of using different in silico mate pairs on genome assembly of C. batrachus
The assemblies of C. batrachus generated using only paired-end libraries were unsatisfactory, with an NGA50 of only approximately 5.5 Kb and 1,614 complete BUSCOs (Benchmarking Universal Single-Copy Orthologs) (Table 1). Both the original in silico method (mate pairs generated using one reference from the same genus) and the optimized in silico method (conserved mate pairs generated using two references from the same genus) significantly improved the genome assembly of C. batrachus. Compared to the original in silico method (using a single reference from the same genus, ‘mag’: C. magur or ‘mac’: C. macrocephalus), the optimized in silico method (using two references from the same genus, ‘mag’ and ‘mac’) reduced misassemblies (mag*: 23,519; mac*: 25,442 vs. mag-mac**: 14,535) and yielded a similar NGA50 (mag*: 74.5 Kb; mac*: 39.1 Kb vs. mag-mac**: 67.3 Kb) and a similar number of complete BUSCOs (mag*: 2,871; mac*: 2,659 vs. mag-mac**: 2,788).
Compared to the original in silico method, the optimized in silico method of generating conserved mate pairs using three reference genomes (two from the same genus, ‘mag’ and ‘mac’, and one from the same order, ‘mel’) drastically decreased misassemblies (mag*: 23,519; mac*: 25,442; mel*: 18,552 vs. mag-mac-mel**: 7,671), but did not increase the NGA50 (mag*: 74.5 Kb; mac*: 39.1 Kb; mel*: 8.2 Kb vs. mag-mac-mel**: 5.5 Kb) or the number of complete BUSCOs (mag*: 2,871; mac*: 2,659; mel*: 1,756 vs. mag-mac-mel**: 1,618).
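To put the misassembly counts reported above on a common scale, the relative reductions can be computed directly from the values in the text. The following is a minimal sketch; the function name `percent_reduction` and the dictionary layout are illustrative assumptions, and only the counts themselves come from the results above.

```python
# Misassembly counts as reported in the text (single-reference vs. conserved sets).
MISASSEMBLIES = {
    "mag": 23519,
    "mac": 25442,
    "mel": 18552,
    "mag-mac": 14535,
    "mag-mac-mel": 7671,
}


def percent_reduction(baseline: int, new: int) -> float:
    """Return the reduction from `baseline` to `new` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - new) / baseline


def reductions_for(combined: str) -> dict:
    """Reduction of a combined-reference assembly relative to each of its single references."""
    singles = combined.split("-")
    return {
        single: percent_reduction(MISASSEMBLIES[single], MISASSEMBLIES[combined])
        for single in singles
    }


if __name__ == "__main__":
    for combined in ("mag-mac", "mag-mac-mel"):
        for single, value in reductions_for(combined).items():
            print(f"{combined} vs. {single}: {value:.1f}% fewer misassemblies")
```

The same helper could be applied to the NGA50 and BUSCO values, where a negative result would indicate an increase rather than a reduction.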
We compared the mate pairs generated using one reference genome (C. batrachus) with the conserved mate pairs generated using two reference genomes (C. batrachus and C. macrocephalus). We found that the extra mate pairs in the target genome generated using one reference were mostly inverted (45.76% to 47.21%), while the remaining mate pairs in the target genome either displayed length deviations or were mapped to different scaffolds of the target genome (Table S11).
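The comparison described above can be sketched as a set difference followed by a per-pair classification. This is only an illustrative sketch, not the procedure used in this study: the `MatePair` record, the expected orientation (left read on the forward strand, right read on the reverse strand), the 25% insert-size tolerance, and the function names are all assumptions introduced for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MatePair:
    """Hypothetical record of one mate pair aligned to the target genome."""
    name: str
    scaffold1: str
    pos1: int
    strand1: str  # "+" or "-"
    scaffold2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify(pair: MatePair, expected_insert: int, tolerance: float = 0.25) -> str:
    """Assign a pair to one of four categories (thresholds are assumptions)."""
    if pair.scaffold1 != pair.scaffold2:
        return "different_scaffold"
    # Order the two reads by position so orientation is judged left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    # Assumed expected orientation: left read "+", right read "-".
    if (left_strand, right_strand) != ("+", "-"):
        return "inverted"
    span = right_pos - left_pos
    if abs(span - expected_insert) > tolerance * expected_insert:
        return "length_deviation"
    return "concordant"


def summarize_extra(single_ref_pairs, conserved_names, expected_insert):
    """Classify pairs present in the single-reference set but absent from the conserved set."""
    extra = [p for p in single_ref_pairs if p.name not in conserved_names]
    return Counter(classify(p, expected_insert) for p in extra)


if __name__ == "__main__":
    pairs = [
        MatePair("a", "s1", 100, "+", "s1", 3100, "-"),
        MatePair("b", "s1", 100, "-", "s1", 3100, "+"),
        MatePair("c", "s1", 100, "+", "s2", 50, "-"),
        MatePair("d", "s1", 100, "+", "s1", 9100, "-"),
    ]
    counts = summarize_extra(pairs, conserved_names={"a"}, expected_insert=3000)
    total = sum(counts.values())
    for category, n in counts.items():
        print(f"{category}: {n} ({100.0 * n / total:.2f}%)")
```

Dividing each category count by the number of extra pairs gives proportions of the kind reported in Table S11, although the exact criteria for "inverted" and "length deviation" would need to match those used for the actual analysis.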