2.3 | Generating in silico mate-pair libraries using the original pipeline
Multiple sets of in silico mate pairs were generated using the
original in silico pipeline “cross-mates” (Fig. 2; Grau et
al., 2018). First, reads of the target organism were mapped onto the
repeat-masked reference genome using BWA-MEM (Li, 2013) with default
settings. A consensus was then computed using samtools/bcftools with the
samtools legacy variant calling model (Li, 2011). Read pairs (mate
pairs) were sampled from the consensus in systematic mode, that is,
using exact insert sizes and sampling fragments at regularly spaced
offsets, and skipping regions with coverage below 3x. For the test
assemblies, in silico mate-pair libraries were generated at ten insert
sizes ranging from 500 bp to 200 Kb (500 bp, 1 Kb, 1.5 Kb, 2 Kb, 5 Kb,
10 Kb, 20 Kb, 50 Kb, 100 Kb, 200 Kb), each with at least 30x coverage.
In silico mate pairs generated using reference genomes from
different taxonomic levels were named ‘species name*’.
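
The systematic sampling step described above can be illustrated with a minimal Python sketch. This is not the cross-mates implementation. The function names, the read length, the forward/reverse read orientation, and the choice to check coverage only under the two read ends (rather than across the whole fragment) are assumptions made for illustration. The offset spacing is derived from the target coverage as 2 x read length / coverage, rounded down so that the resulting coverage is at least the target.

```python
from typing import Iterator, Sequence, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def step_for_coverage(read_len: int, coverage: float) -> int:
    """Offset spacing that yields at least `coverage` from two reads per fragment.

    Each fragment contributes 2 * read_len bases, so one fragment every
    `step` bases gives a coverage of 2 * read_len / step.
    """
    return max(1, int(2 * read_len / coverage))


def sample_mate_pairs(
    consensus: str,
    depth: Sequence[int],
    insert_size: int,
    read_len: int,
    step: int,
    min_depth: int = 3,
) -> Iterator[Tuple[int, str, str]]:
    """Yield (start, read1, read2) at regularly spaced offsets.

    Fragments have exactly `insert_size` bases. A fragment is skipped when
    the mapping depth under either read end falls below `min_depth`
    (assumption: the original pipeline may define the skipped region
    differently).
    """
    if insert_size < read_len:
        raise ValueError("insert_size must be at least read_len")
    for start in range(0, len(consensus) - insert_size + 1, step):
        end = start + insert_size
        left_ok = min(depth[start:start + read_len]) >= min_depth
        right_ok = min(depth[end - read_len:end]) >= min_depth
        if not (left_ok and right_ok):
            continue
        read1 = consensus[start:start + read_len]
        # Assumption: the second read is emitted from the opposite strand.
        read2 = reverse_complement(consensus[end - read_len:end])
        yield start, read1, read2
```

Under these assumptions, one library per insert size would be produced by calling sample_mate_pairs once for each of the ten sizes listed above, using the same consensus and per-base depth and a step obtained from step_for_coverage (for example, a hypothetical read length of 100 bp and a 30x target give a step of 6 bp).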