Figure 4. ORFeome capture using LASSO probes. a, Schematic of the workflow. b, Post-capture PCR of circles obtained from the capture of 3,078 ORFs of E. coli K12 using the LASSO probe library. The inset is a histogram of the size distribution of the targeted ORFs in 40 bp bins. Captured ORFs carry an additional 140 bp of residual LASSO sequence when run on the gel. c, Median RPKM enrichment ratios of targeted ORFs versus non-targeted genetic elements for a LASSO probe library obtained by DNA Recombinase Mediated Assembly (blue) and by the assembly method developed by Tosi and coworkers in 2017 (red). d, Bee swarm plot combined with box plot of the average depth of sequencing per kilobase for each targeted ORF (n=3087) and non-targeted ORF (n=905). Center lines show the medians; box limits indicate the 25th and 75th percentiles as determined by R software; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; outliers are represented by dots. n = 3057, 1004 sample points. e, Normalized read depth of targeted ORFs as a function of ORF length.
The post-capture PCR of circles obtained from the capture of 3,078 ORFs of E. coli K12 was run on a 1.2% agarose gel (Fig. 4b); the apparent size distribution of the amplicons corresponded well with that of the targeted ORFs. The post-capture PCR amplicons were enzymatically fragmented and sequenced on an Illumina NextSeq instrument to obtain 150-nucleotide paired-end reads.
For reads mapping to the E. coli genome, we calculated target enrichment factors, defined as the ratio of the reads per kilobase of genetic element per million mapped reads (RPKM) for targeted ORFs to that for non-targeted ORFs. Furthermore, the targeted/non-targeted RPKM ratios were analyzed for genetic elements of different lengths by binning (Fig. 4c). In this experiment, LASSO-targeted ORFs were enriched in all bins (up to ~250× for ORFs < 1 kb), representing an 8-fold improvement over the enrichment previously measured by Tosi and coworkers (2017).
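As a minimal sketch of the enrichment calculation described above, the Python snippet below computes RPKM per genetic element and the ratio of median target RPKM to median non-target RPKM within each length bin. The function names, the (length, read count, is_target) tuple layout, and the bin edges are assumptions made for illustration; they are not taken from the analysis pipeline used in this work.

```python
from statistics import median

def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of genetic element per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)

def binned_enrichment(elements, total_mapped_reads, bin_edges):
    """Median target RPKM over median non-target RPKM, per length bin.

    elements: iterable of (length_bp, read_count, is_target) tuples
              (hypothetical layout chosen for this sketch).
    bin_edges: ascending length boundaries in bp, e.g. [0, 1000, 2000].
    Bins lacking either class, or with a zero non-target median, are skipped.
    """
    elements = list(elements)
    ratios = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        targets, non_targets = [], []
        for length_bp, reads, is_target in elements:
            if lo <= length_bp < hi:
                value = rpkm(reads, length_bp, total_mapped_reads)
                (targets if is_target else non_targets).append(value)
        if targets and non_targets and median(non_targets) > 0:
            ratios[(lo, hi)] = median(targets) / median(non_targets)
    return ratios

# Toy input with made-up counts, only to show the call shape.
toy = [(500, 1000, True), (800, 1600, True), (600, 6, False), (900, 9, False)]
toy_ratios = binned_enrichment(toy, 1_000_000, [0, 1000, 2000])
```

Using the median within each bin, as in Fig. 4c, keeps a few very deeply covered elements from dominating the ratio.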
Fig. 4d shows box plots of the average depth of sequencing per kilobase for each targeted and each non-targeted ORF. The targeted ORFs were significantly enriched compared with the non-targeted ORFs (Welch two-sample t-test). The mean and median RPKM of the targets were 2476 and 264, respectively, while the mean and median RPKM of the non-targets were 31 and 1.26, respectively. Fold-enrichment of targets was calculated to be between 60- and 200-fold (by the median or mean of the target RPKM, respectively, over the mean non-target RPKM). At a cutoff of three times the median non-target RPKM, around 70% of the targeted ORFs were successfully captured. The normalized abundance of each target ORF was negatively correlated with ORF length (Fig. 4e). This length bias was previously reported (Tosi et al. 2017) and reflects target length-dependent capture efficiency, post-capture PCR bias, or a combination of the two effects.
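The capture cutoff and the Welch comparison described above can be illustrated with a short Python sketch. It assumes two plain lists of per-ORF RPKM values; the function names and the toy numbers are hypothetical, and the original analysis was presumably run in R rather than with this code. Only the test statistic and the Welch-Satterthwaite degrees of freedom are computed here; obtaining a p-value would require a t-distribution from a statistics library.

```python
from math import sqrt
from statistics import mean, median, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def fraction_captured(target_rpkm, nontarget_rpkm, cutoff_multiple=3.0):
    """Fraction of targets whose RPKM exceeds a multiple of the
    median non-target RPKM (three times the median by default)."""
    cutoff = cutoff_multiple * median(nontarget_rpkm)
    captured = sum(1 for value in target_rpkm if value > cutoff)
    return captured / len(target_rpkm)

# Toy values, made up for illustration only.
toy_targets = [10.0, 20.0, 1.0, 40.0]
toy_nontargets = [1.0, 2.0, 3.0]
toy_fraction = fraction_captured(toy_targets, toy_nontargets)
toy_t, toy_df = welch_t(toy_targets, toy_nontargets)
```

Because RPKM distributions are strongly right-skewed (the reported means are far above the medians), a median-based cutoff is less sensitive to a few highly covered non-target elements than a mean-based one would be.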