Abstract
In an attempt to bridge the widening gap between DNA sequence and
biological function, we developed a novel methodology to assemble
Long-Adapter Single-Strand Oligonucleotide (LASSO) probe libraries that
enable the massively multiplexed capture of kilobase-sized DNA
fragments for downstream long-read DNA sequencing or expression. This
method uses short DNA oligonucleotides (pre-LASSO probes) and a plasmid
vector that supplies the backbone of the mature LASSO probe through
Cre-loxP intramolecular recombination. This strategy generates
high-quality LASSO probe libraries (~46% correctly assembled probes). We
performed NGS analysis of the post-capture PCR amplification products of DNA
circles obtained from the LASSO capture of 3,087 E. coli ORFs ranging
from 400 to 4,000 bp. The median enrichment of all targeted ORFs versus
untargeted ORFs was 30-fold. For ORFs up to 1 kb in size, the median
enrichment of targeted ORFs was up to 260-fold. Here, we show that LASSO
probes obtained in this manner are able to capture full-length open
reading frames from total human cDNA. Furthermore, we show that the
LASSO capture specificity and sensitivity are sufficient for target
capture from a total human genomic DNA template. This technology can be
used for the preparation of long-read sequencing libraries and for
massively multiplexed cloning of human sequences.
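The enrichment figures above compare sequencing coverage of targeted and untargeted ORFs, but the exact metric is not defined in this section. The Python sketch below assumes one plausible definition: each targeted ORF's length-normalized read count is divided by the median count of the untargeted ORFs, and the median of those ratios is reported. The function name `median_fold_enrichment`, the normalization, and the toy numbers are all hypothetical, not the authors' pipeline.

```python
from statistics import median


def median_fold_enrichment(targeted: list[float], untargeted: list[float]) -> float:
    """Median fold-enrichment of targeted ORFs over the untargeted background.

    Assumption: both inputs are already length-normalized read counts
    (e.g. reads per kilobase); this is an illustrative choice only.
    """
    background = median(untargeted)
    if background <= 0:
        raise ValueError("untargeted background must be positive")
    # Fold-enrichment of each targeted ORF relative to the background median.
    ratios = [count / background for count in targeted]
    return median(ratios)


# Toy numbers for illustration only (not data from this study).
print(median_fold_enrichment([10.0, 40.0, 80.0], [1.0, 2.0, 4.0]))  # 20.0
```

Using medians rather than means keeps a handful of very efficiently or very poorly captured ORFs from dominating the summary value.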
1. Introduction
Advances in DNA sequencing have led to an exponential increase in the
quantity of sequence data. Databases now contain the genomes of hundreds
of plants and tens of thousands of microorganisms. Despite the
availability of these data, there is still a gap in understanding the
function of genes within a genome. Massively parallel technologies that
enable the synthesis and cloning of long DNA sequences are thus
important in linking sequence to function. [1]
The recent development of multiplexed functional assays allows for the
rapid testing of thousands to millions of sequences across a wide array
of biological functions. [2, 3, 4, 5, 6, 7, 8, 9] The DNA sequences of interest may be obtained by genome fragmentation [10], mutagenesis of existing sequences [11], or direct synthesis of oligonucleotides
(oligos). [12] Direct oligo synthesis allows for
testing of controlled hypotheses against one another without the
constraints of natural variation or mutagenesis. However, individual
oligos are generally shorter than 200 nucleotides (nt), limiting
potential applications. [13] Gene synthesis from
oligo libraries can be used to extend these lengths [14, 15], but the high cost of individual assembly
and processing becomes prohibitive for large gene libraries. [16, 17] A number of alternative methods for
multiplexed gene synthesis have demonstrated the assembly of hundreds to
thousands of short fragments. [18, 19, 20] However, these methods are limited in maximum achievable gene length
(<800 bp) and produce highly biased libraries with constraints
on sequence homology. [37]
For high-throughput functional studies involving natural DNA
sequences of interest, a valuable alternative to de novo chemical DNA
synthesis is the selection of DNA targets from a natural source. PCR
has been used for 30 years to select a DNA sequence of interest from a
DNA template. [21] However, traditional or
multiplexed PCR is generally not feasible for massively parallel DNA
target selection because of non-specific amplification caused by
interactions between primers. [22, 23, 24] A
different approach to primer design and DNA target capture to enable
greater specificity is the use of molecular inversion probes (MIPs). [25] MIPs are short single-stranded DNA (ssDNA)
molecules (~150 nt) that contain two annealing sites
at the strand ends (the ligation and the extension arms), which are
complementary to the target sequence. Upon hybridization of the MIP to
its target, the 5’ and 3’ DNA ends become adjacent and available for an
intramolecular ligation reaction. To adapt this method to perform exon
capture in combination with next-generation sequencing, a DNA polymerase
can be used to ‘gap-fill’ between target-specific MIP sequences designed
to flank a full or partial exon, before ligase-driven circularization,
thereby capturing a copy of the intervening sequence.[26, 27, 28] Uncircularized species are digested
by exonucleases to reduce background, and circularized species are PCR
amplified via primers directed at the common linker. It has been shown
that MIPs are capable of massively parallel enrichment of short genomic
regions. [28, 29] However, MIPs are inefficient at
capturing larger target sequences because of the short length of the
linker region. Increasing the length of the MIP linker has been shown
to allow the capture of longer targets. [30, 31,
32] The scalability of this long-linker method was limited, however,
because each individual probe required a separate PCR reaction.
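The gap-fill and circularization steps of MIP capture can be modeled at the sequence level. The Python sketch below is a simplified string model that assumes perfect hybridization and no mismatches; the function `mip_capture` and the toy sequences are hypothetical. The probe is written 5'→3' as ligation arm, linker, extension arm, so the extension arm binds target sequence on the 3' side of the gap, and the polymerase copies the gap back toward the 5' end of the ligation arm before ligase closes the circle.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def mip_capture(target: str, lig_arm: str, ext_arm: str, linker: str) -> Optional[str]:
    """Return the circularized MIP (read 5'->3' from the ligation arm), or None.

    Simplified model: the probe is 5'-[lig_arm][linker][ext_arm]-3' and both
    arms anneal perfectly to `target`, which is written 5'->3'.
    """
    site_lig = revcomp(lig_arm)  # target sequence bound by the ligation arm
    site_ext = revcomp(ext_arm)  # target sequence bound by the extension arm
    start = target.find(site_lig)
    if start < 0:
        return None
    gap_start = start + len(site_lig)
    end = target.find(site_ext, gap_start)
    if end < 0:
        return None  # arms do not flank a region in the expected orientation
    gap = target[gap_start:end]
    # Gap-fill: the polymerase extends the 3' end of the extension arm, so the
    # new strand is the reverse complement of the gap; ligase then seals it
    # to the 5' end of the ligation arm, closing the circle.
    return lig_arm + linker + ext_arm + revcomp(gap)


target = "GGG" + "ACGTAC" + "TTTTCCCCAAAA" + "GATCGA" + "GGG"
circle = mip_capture(target, revcomp("ACGTAC"), revcomp("GATCGA"), "CTGCAG")
print(circle)  # GTACGTCTGCAGTCGATCTTTTGGGGAAAA
```

Reading the circle's reverse complement recovers the captured target (both arm sites plus the intervening gap) contiguous with a copy of the linker, which is why primers against the common linker can amplify every circularized probe.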
To overcome the scalability limitation of producing longer MIPs, we
developed a method that allows the production of thousands of targeted
probes, essentially long-linker MIPs, in a single reaction
tube. [33] These
Long-Adapter Single-Strand Oligonucleotides (LASSOs) were assembled by
PCR fusion of a probe precursor (pre-LASSO, containing the ligation and
extension arms) with a conserved linker sequence (the long adapter).
The fusion PCR amplicon was subsequently circularized by intramolecular
ligation and subjected to inverse PCR, so that the LASSO annealing arms
flanked the long-adapter sequence in the final configuration.
Although this LASSO library was able to clone a nearly complete
bacterial ORFeome, in a later study we found that only ~10% of its
probes were correctly assembled, while the rest were discordant
(with arms originating from different probes). [34]
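Discordant probes can be identified by checking whether the two arms found in a sequenced probe were designed for the same target. The Python sketch below assumes each sequenced probe has already been reduced to an (extension arm, ligation arm) pair and that every arm sequence is unique within the design; the function `assembly_summary` and the toy design are hypothetical.

```python
from collections import Counter


def assembly_summary(
    design: dict[str, tuple[str, str]], observed: list[tuple[str, str]]
) -> dict[str, float]:
    """Fractions of correct, discordant, and unassigned probes.

    `design` maps probe id -> (extension arm, ligation arm);
    `observed` lists the arm pairs seen in sequenced probes.
    """
    if not observed:
        raise ValueError("no observed probes")
    # Map each arm sequence back to the probe it was designed for.
    ext_owner = {ext: pid for pid, (ext, _) in design.items()}
    lig_owner = {lig: pid for pid, (_, lig) in design.items()}
    counts = Counter()
    for ext, lig in observed:
        ext_id, lig_id = ext_owner.get(ext), lig_owner.get(lig)
        if ext_id is None or lig_id is None:
            counts["unassigned"] += 1  # arm not found in the design
        elif ext_id == lig_id:
            counts["correct"] += 1  # both arms come from the same probe
        else:
            counts["discordant"] += 1  # arms come from different probes
    total = len(observed)
    return {k: counts[k] / total for k in ("correct", "discordant", "unassigned")}


design = {"p1": ("AAAC", "GGGT"), "p2": ("CCCA", "TTTG")}
observed = [("AAAC", "GGGT"), ("AAAC", "TTTG"), ("CCCA", "GGGT"),
            ("CCCA", "TTTG"), ("ACGT", "GGGT")]
print(assembly_summary(design, observed))
# {'correct': 0.4, 'discordant': 0.4, 'unassigned': 0.2}
```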
The purity (proportion of sequence-correct probes) and quality
(percentage of correctly assembled probes) of mature LASSO libraries can
strongly affect the capture efficiency of targeted genome regions. For
highly complex eukaryotic genomes, including the human genome, high
purity of the mature LASSO probe library is likely a stringent
requirement. To address these issues, we developed a different, highly
scalable approach to assemble LASSO probes based on a cloning and
recombination strategy. The pre-LASSO probe precursors are obtained as a
pool of short (160 bp) DNA oligos, which is cloned into a custom
plasmid, pLASSO, and transformed into E. coli. Site-specific recombination
between two loxP sites oriented in the same direction in pLASSO excises
a DNA mini-circle that contains the mature LASSO precursor
already in its final configuration, thus avoiding the formation of the
discordant probes produced by the previous method.
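Excision between two directly repeated loxP sites can be modeled as cutting a circular sequence at the two sites, with each product retaining one loxP copy. The Python sketch below uses the canonical 34-bp loxP sequence; the function `cre_excise` and the toy plasmid are hypothetical and do not reproduce the actual pLASSO map. Because both loxP copies are identical, placing the crossover at the start of each site yields the same products as a crossover within the spacer.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # canonical 34-bp loxP site


def cre_excise(plasmid: str) -> tuple[str, str]:
    """Model Cre excision between two directly repeated loxP sites.

    `plasmid` is a linear representation of a circular molecule with both
    loxP sites in the forward orientation (inverted sites would instead
    lead to inversion, which this sketch does not model).
    Returns (mini_circle, backbone), each a linear representation of a circle.
    """
    first = plasmid.find(LOXP)
    if first < 0:
        raise ValueError("no loxP site found")
    second = plasmid.find(LOXP, first + len(LOXP))
    if second < 0:
        raise ValueError("a second loxP site in the same orientation is required")
    mini_circle = plasmid[first:second]            # one loxP plus the intervening segment
    backbone = plasmid[:first] + plasmid[second:]  # remaining plasmid with one loxP
    return mini_circle, backbone


plasmid = "GGGG" + LOXP + "CCCCTTTT" + LOXP + "AAAA"
mini, backbone = cre_excise(plasmid)
print(mini == LOXP + "CCCCTTTT", backbone == "GGGG" + LOXP + "AAAA")  # True True
```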
We found that mini-circle excision was enhanced when the relaxed
(uncoiled) form of pLASSO, obtained by nicking, was used as the
substrate for recombination. This novel assembly strategy produces LASSO
libraries with a much higher fraction of correctly assembled probes,
with a consequent improvement in capture efficiency, and paves the way
for LASSO probes in human applications.