Abstract
In an attempt to bridge the widening gap between DNA sequence and biological function, we developed a novel methodology to assemble Long-Adapter Single-Strand Oligonucleotide (LASSO) probe libraries that enable the massively multiplexed capture of kilobase-sized DNA fragments for downstream long-read DNA sequencing or expression. This method uses short DNA oligonucleotides (pre-LASSO probes) and a plasmid vector that supplies the backbone of the mature LASSO probe through Cre-loxP intramolecular recombination. This strategy generates high-quality LASSO probe libraries in which ~46% of probes are correctly assembled. We performed next-generation sequencing (NGS) analysis of the PCR-amplified DNA circles obtained after LASSO capture of 3,087 E. coli ORFs ranging from 400 to 4,000 bp. The median enrichment of all targeted ORFs relative to untargeted ORFs was 30-fold; for ORFs up to 1 kb in size, the median enrichment of targeted ORFs reached 260-fold. Here, we show that LASSO probes obtained in this manner are able to capture full-length open reading frames from total human cDNA. Furthermore, we show that LASSO capture specificity and sensitivity are sufficient for target capture from a total human genomic DNA template. This technology can be used to prepare long-read sequencing libraries and for massively multiplexed cloning of human sequences.
1. Introduction
Advances in DNA sequencing have led to an exponential increase in the quantity of sequence data. Databases now contain the genomes of hundreds of plants and tens of thousands of microorganisms. Despite the availability of these data, there is still a gap in understanding the function of genes within a genome. Massively parallel technologies that enable the synthesis and cloning of long DNA sequences are thus important in linking sequence to function. [1]
The recent development of multiplexed functional assays allows for the rapid testing of thousands to millions of sequences across a wide array of biological functions.[2, 3, 4, 5, 6, 7, 8, 9] The DNA sequences of interest may be obtained by genome fragmentation,[10] mutagenesis of existing sequences,[11] or direct synthesis of oligonucleotides (oligos).[12] Direct oligo synthesis allows controlled hypotheses to be tested against one another without the constraints of natural variation or mutagenesis. However, individual oligos are generally shorter than 200 nucleotides (nt), limiting potential applications.[13] Gene synthesis from oligo libraries can extend these lengths,[14, 15] but the high cost of individual assembly and processing becomes prohibitive for large gene libraries.[16, 17] A number of alternative methods for multiplexed gene synthesis have demonstrated the assembly of hundreds to thousands of short fragments.[18, 19, 20] However, these methods are limited in maximum achievable gene length (<800 bp) and produce highly biased libraries with constraints on sequence homology.[37]
For high-throughput functional studies that involve natural DNA sequences of interest, a valuable alternative to de novo chemical DNA synthesis is the selection of DNA targets from a natural source. PCR has been used for 30 years to select a DNA sequence of interest from a DNA template.[21] However, traditional or multiplexed PCR is generally not feasible for massively parallel DNA target selection because of non-specific amplification caused by interactions between primers.[22, 23, 24] A different approach to primer design and DNA target capture that enables greater specificity is the use of molecular inversion probes (MIPs).[25] MIPs are short single-stranded DNA (ssDNA) molecules (~150 nt) that contain two annealing sites at the strand ends (the ligation and extension arms), which are complementary to the target sequence. Upon hybridization of the MIP to its target, the 5’ and 3’ DNA ends become adjacent and available for an intramolecular ligation reaction. To adapt this method for exon capture in combination with next-generation sequencing, a DNA polymerase can be used to ‘gap-fill’ between target-specific MIP sequences designed to flank a full or partial exon before ligase-driven circularization, thereby capturing a copy of the intervening sequence.[26, 27, 28] Uncircularized species are digested by exonucleases to reduce background, and circularized species are PCR amplified with primers directed at the common linker. MIPs have been shown to be capable of massively parallel enrichment of short genomic regions.[28, 29] However, MIPs are inefficient at capturing larger target sequences because of the short length of the linker region. Increasing the length of the MIP linker has been shown to allow the capture of longer targets.[30, 31, 32] The scalability of this long-linker method was limited, though, by the need for a separate PCR reaction for each individual probe.
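The gap-fill and ligation steps of MIP capture can be sketched as string operations. The Python example below is a simplified model under stated assumptions, not a published MIP design tool: the template is a single DNA strand written 5'→3', each arm must anneal with exact complementarity (real hybridization tolerates mismatches), the polymerase and ligase are assumed to be perfect, and the function `mip_capture` and all sequences are hypothetical. The ligation arm anneals on the 5' side of the gap and the extension arm on its 3' side, because the polymerase extends the extension arm's 3' end toward the template's 5' end.

```python
from typing import Optional

# Complement table for uppercase A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mip_capture(template: str, lig_arm: str, linker: str, ext_arm: str) -> Optional[str]:
    """Model capture by a hypothetical MIP written 5'-lig_arm + linker + ext_arm-3'.

    Returns the circularized product, read 5'->3' from the start of the
    ligation arm, or None if the arms cannot both anneal in the required
    orientation (exact matching only).
    """
    # The ligation arm anneals to its reverse complement on the template,
    # on the 5' side of the gap.
    lig_site = template.find(revcomp(lig_arm))
    if lig_site == -1:
        return None
    gap_start = lig_site + len(lig_arm)
    # The extension arm must anneal downstream (3' side) of the gap.
    ext_site = template.find(revcomp(ext_arm), gap_start)
    if ext_site == -1:
        return None
    gap = template[gap_start:ext_site]
    # Gap-fill: the polymerase extends the 3' end of the extension arm,
    # copying the gap as its reverse complement; ligase then joins that
    # new 3' end to the 5' end of the ligation arm, closing the circle.
    return lig_arm + linker + ext_arm + revcomp(gap)


# Toy example: the arm sites flank a 10-nt gap (ATGGATTACA) in the template.
template = "AAAAGCATTCATGGATTACACCGGTATTTT"
lig_arm = revcomp("GCATTC")   # anneals 5' of the gap
ext_arm = revcomp("CCGGTA")   # anneals 3' of the gap
linker = "ACGTACGT"           # stand-in for the common linker
circle = mip_capture(template, lig_arm, linker, ext_arm)
print(circle)  # GAATGCACGTACGTTACCGGTGTAATCCAT
```

Rotating the circle so that it starts right after the linker gives the reverse complement of the template from the ligation-arm site through the extension-arm site, that is, the captured copy of the target region together with both arms.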
To overcome the scalability limitation of producing longer MIPs, we developed a method that allows the production of thousands of targeted probes, essentially long-linker MIPs, in a single reaction tube.[33] The assembly of these Long-Adapter Single-Strand Oligonucleotides (LASSOs) was based on fusing a probe precursor (pre-LASSO, which contains the ligation and extension arms) with a conserved linker sequence (the long adapter) by PCR. The fusion PCR amplicon was then circularized by intramolecular ligation and subjected to inverse PCR, so that in the final configuration the LASSO annealing arms flanked the long-adapter sequence. Although this LASSO library was able to clone a nearly complete bacterial ORFeome, a later study showed that only ~10% of the probes in the library used for the capture were correctly assembled, while the rest were discordant (with arms originating from different probes).[34]
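The distinction between correctly assembled and discordant probes can be stated as a simple classification over sequenced arm pairs. The Python sketch below is a hypothetical illustration, not the analysis pipeline used in [34]: it assumes that each probe's ligation and extension arms have already been extracted from sequencing reads, that arm sequences are unique within the design, and that exact matching suffices. The names `classify_probes` and `concordant_fraction`, the design table, and the toy sequences are all invented for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical design table: probe id -> (ligation arm, extension arm).
Design = Dict[str, Tuple[str, str]]


def classify_probes(design: Design, observed: Iterable[Tuple[str, str]]) -> Counter:
    """Label each observed (ligation arm, extension arm) pair.

    concordant: both arms come from the same designed probe
    discordant: both arms are designed arms, but from different probes
    unassigned: at least one arm matches no designed arm exactly
    """
    lig_to_id = {lig: pid for pid, (lig, _) in design.items()}
    ext_to_id = {ext: pid for pid, (_, ext) in design.items()}
    counts = Counter()
    for lig, ext in observed:
        lig_id, ext_id = lig_to_id.get(lig), ext_to_id.get(ext)
        if lig_id is None or ext_id is None:
            counts["unassigned"] += 1
        elif lig_id == ext_id:
            counts["concordant"] += 1
        else:
            counts["discordant"] += 1
    return counts


def concordant_fraction(counts: Counter) -> float:
    """Fraction of all observed probes that are correctly assembled."""
    total = sum(counts.values())
    return counts["concordant"] / total if total else 0.0


design = {
    "orfA": ("AAAC", "GGGT"),
    "orfB": ("CCCA", "TTTG"),
    "orfC": ("GGGA", "AAAT"),
}
observed = [
    ("AAAC", "GGGT"),  # both arms from orfA: concordant
    ("CCCA", "TTTG"),  # both arms from orfB: concordant
    ("AAAC", "TTTG"),  # orfA ligation arm + orfB extension arm: discordant
    ("GGGA", "GGGT"),  # orfC ligation arm + orfA extension arm: discordant
    ("ACGT", "AAAT"),  # ligation arm not in the design: unassigned
]
counts = classify_probes(design, observed)
print(dict(counts), concordant_fraction(counts))
```

In practice, arms would be assigned with an aligner or an error-tolerant matcher rather than exact dictionary lookup; the exact-match tables here only make the concordant/discordant definition explicit.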
The purity (number of sequence-correct probes) and quality (percentage of correctly assembled probes) of mature LASSO libraries undoubtedly affect the capture efficiency of targeted genome regions. For highly complex eukaryotic genomes, including the human genome, high purity of a mature LASSO probe library is likely a stringent requirement. To address these issues, we developed a different, highly scalable approach to assembling LASSO probes based on a cloning and recombination strategy. The probe precursors (pre-LASSOs) are obtained as a pool of short (160-nt) DNA oligos, which is cloned into a custom plasmid, pLASSO, and transformed into E. coli. Site-specific recombination between two loxP sites oriented in the same direction in pLASSO excises a DNA mini-circle that contains the mature LASSO precursor already in its final configuration, thus avoiding the discordant probes formed by the previous method.
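At the sequence level, Cre-mediated excision between two directly repeated loxP sites can be sketched as below. This Python model is illustrative only: the plasmid layout, the stand-in insert, and the function `cre_excise` are hypothetical and do not reproduce the actual pLASSO map; the model ignores recombination kinetics and DNA topology; and it treats the circular plasmid as a linear string with no loxP site spanning its ends. The loxP sequence used is the canonical 34-bp site.

```python
from typing import Tuple

# Canonical 34-bp loxP site: 13-bp inverted repeat, 8-bp asymmetric spacer,
# 13-bp inverted repeat. The asymmetric spacer gives the site a direction.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(plasmid: str) -> Tuple[str, str]:
    """Model Cre excision between two directly repeated loxP sites.

    `plasmid` is a circular molecule written as a linear string, assumed
    not to have a loxP site spanning the end of the string. Returns
    (minicircle, remaining_plasmid); each product keeps one loxP copy.
    """
    first = plasmid.find(LOXP)
    second = plasmid.find(LOXP, first + len(LOXP)) if first != -1 else -1
    if second == -1:
        # str.find only matches the forward-strand sequence, so a site in the
        # opposite orientation is not found (that pairing would invert the
        # intervening segment rather than excise it).
        raise ValueError("expected two loxP sites in the same orientation")
    # The crossover happens inside the identical spacers, so at the sequence
    # level the excised circle is one loxP plus everything up to the next.
    minicircle = plasmid[first:second]
    remaining = plasmid[:first] + plasmid[second:]
    return minicircle, remaining


# Hypothetical layout: backbone, loxP, insert, loxP, backbone.
plasmid = "CCCCCC" + LOXP + "TTGACAGT" + LOXP + "GGGGGG"
minicircle, remaining = cre_excise(plasmid)
print(len(minicircle), len(remaining))  # 42 46
```

Because both products retain exactly one loxP copy and no sequence is lost, the two lengths always sum to the length of the input plasmid in this model.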
We found that excision of the mini-circle was enhanced when the relaxed (uncoiled) form of pLASSO, obtained by nicking, was used as the recombination substrate. This new assembly strategy produces LASSO libraries with a much higher fraction of correctly assembled probes, with a consequent improvement in capture efficiency, and paves the way for human applications of LASSO probes.