Materials and
Methods
Plant Material &
Greenhouse
In January 2020, dormant vegetative cuttings were collected from 100
poplar trees spanning much of the latitudinal distribution of the
contact zone between Populus trichocarpa and P .balsamifera (Figure 1, Table S1). Cuttings were then transported
to Blacksburg, VA where they were rooted in a greenhouse maintained at
day and night temperatures of 24 and 15.5 °C, respectively, with no
supplemental lighting. Following vegetative flush of rooted cuttings,
leaf tissue was sampled for DNA extraction using the Qiagen plant DNeasy
kit (Qiagen Inc., Valencia, CA), with minor modifications that included
a phenol-chloroform extraction in place of a QIAshredder column,
approximately 100 mg of leaf was used to extract DNA. Samples with low
DNA concentration were subject to a secondary extraction using a
modified CTAB protocol . Quality and concentration of DNA were assessed
using a combination of NanoDrop and Qubit approaches, in addition to gel
electrophoresis. DNA from these samples was used to quantify telomere
length using both qPCR and WGS based on Illumina short-read sequencing.
Genomic libraries and sequencing
Genomic libraries were prepared using the Illumina KAPA HyperPrep kit at
the Duke University Center for Genomic and Computational Biology.
Briefly, the libraries were sequenced on an S4 flow cell in 2 x 150 bp
format on an IlluminaNovaSeq 6000 with 64 samples per lane. Quality
control and preprocessing of raw reads were done using FastQC and
Trimmomatic . Sequences were trimmed to remove adapters, short reads
(< 30 bp), and reads were discarded with < 30 bp of
overlap between forward and reverse reads with a maximum mismatch value
of 4. Following trimming, 37.33 ± 6.5 million reads were retained with
an average GC length of 35.6% ± 0.5 and an average read length of 141
bp (Table S1). Trimmed reads were used in the downstream analyses.
Telomere length from whole-genome
sequence
data
Telomere length was estimated from WGS data using three different
bioinformatic approaches; Computel v1.3 , K-seek , and TRIP . All three
programs estimate telomere length using reads from whole genome sequence
data. However, the three programs use different parameters to estimate
telomere length allowing for comparison.
Computel (v1.3) creates a telomeric reference using read length, in
addition to telomeric repeat length and pattern . Telomeric reads are
those reads that align to the telomere reference developed within the
program. At least two replicates of the telomere sequence pattern were
required to be considered a telomeric read. Once telomeric reads are
identified, Computel calculates the relative genome coverage for
individuals as a ratio of telomeric coverage to genome coverage.
Telomeric coverage is defined as the distribution of coverage per base
for the telomeric reference and genome coverage is the product of the
total number of reads and read length, divided by total genome size.
After estimating relative genome coverage, Computel estimates telomere
length by counting telomeric reads and normalizing counts relative
genome coverage. Computel then divides telomere length by the number of
chromosomes present in the species to obtain mean telomere length per
individual. For the purposes of our study, we modified Computel
parameters to consider Populus genome (P. trichocarpa v3,
NC_037285.1. ) parameters, including chromosome number (n=19), genome
size (434,290,000 bp), and telomere sequence pattern (TTTAGGG).
In addition to Computel, we used the bioinformatics pipeline K-seek to
identify and quantify reads that contained telomeric repeats. K-seek
uses a pattern-matching approach that does not consider genome coverage
in telomere estimates. K-seek identifies sequence repeats ranging
between 1 to 20 bp . Reads with a minimum repeat length of 50 bp per
repeat are considered to avoid counting short repeat sequences scattered
across the genome. After identifying and counting sequence repeats,
K-seek organizes the repeat sequences in alphabetical order and size (in
bp). To quantify telomeric repeat sequences, we used the canonical
telomere repeat sequence specific to plants, i.e., AAACCCT , which is
the reverse complement of TTTAGGG (Figure S2). To calculate mean
telomere length per individual in base pairs, we applied the following
equation:
Equation (1) Mean telomere length per individual\(=\ \frac{\text{FTL\ x\ RL}}{2\ x\ Chr}\)
Where, FTL is the frequency of telomeric repeats, RL is
the telomeric repeat length (7 base pairs for TTTAGGG), two in the
denominator accounts for the two ends of a chromosome and Chr is
the number of chromosomes in Populus (n=19).
Finally, we used the telomeric repeats identification pipeline (TRIP) to
estimate telomere length across Populus genotypes. Like K-seek,
TRIP uses a pattern-matching approach that does not include genome
coverage into telomere estimates. However, unlike K-Seek only reads with
a minimum of four repeat sequences are considered . TRIP searches for
repeat sequences of lengths ranging from 2 to 25 bp with no mismatches
permitted. Once repeat sequences are identified, TRIP reorders the
repeat sequences alphabetically and summarizes the repeat regions into
count tables. The frequency of the canonical telomeric repeat sequence
(AAACCCT) was used to estimate mean telomere length of an individual.
Telomere length was calculated using the same equation and parameters
applied with K-Seek. For subsequent analyses, all individuals were
included except one which exhibited low genome coverage (12.31x) from
the WGS data (Table S1).
A major difference between Computel and the two other bioinformatic
approaches is the inclusion of genome coverage into telomere length
estimations. Variability in genome coverage may impact telomere length
estimations. Therefore, we normalized telomere length estimates by
individual genome coverage using Equation 2. Telomere length values for
K-seek and TRIP were re-assessed because the two programs did not
previously consider genome coverage.
Equation (2) Corrected telomere length\(=\ \frac{\text{FTL\ x\ RL}}{2\ x\ Chr}/\ Genome\ coverage\)
Genome coverage was calculated as the product of the total number of
reads and read length, divided by total genome size for each individual.
In the case of Populus , we use a genome size of 434,290,000 bp
(P. trichocarpa v3, NC_037285.1, ).
Quantitative PCR (qPCR) telomere
length assay.
Quantitative polymerase chain reaction (qPCR) was used to quantify
relative telomere length (rTL) using a Mx3000P QPCR System , following
the methods of modified for Populus . qPCR quantifies the rTL for
each individual as the ratio of the telomere repeat copy number to a
single-copy control reference gene, relative to the reference sample .
Plant-specific telomere primers were developed to assay telomere length
alongside a housekeeping gene (glyceraldehyde-3-phosphate dehydrogenase;
GAPDH), which is a single-copy control reference gene that exhibits
limited sequence variability within the same individual.
qPCR reactions for the housekeeping GAPDH gene and telomere primers were
run in triplicate on separate plates. Each qPCR reaction contained 20 ng
of DNA per reaction in addition to 12.5 μl of SYBER Green Master Mix
(Quantabio), 0.25 μl forward and reverse primer, 6 μl ultrapure water,
and 6 μl of DNA. The qPCR cycle conditions for the GAPDH gene were 10
min at 95°C, followed by 33 cycles of 15 s at 95°C, 30 s at 59 °C and,
30 s at 72°C. qPCR conditions for the telomere region were 10 min at
95°C, followed by 20 cycles of 15 s at 95°C, 30 s at 58°C and 30 s at
72°C. Both GADPH and telomere assays were run on the same day, withPopulus samples randomly assigned to a plate and three technical
replicates per individual for each plate.
To calculate relative
telomere length (rTL), we used the following formula:\(2^{-\Delta\Delta Ct}\) , where\(\Delta\Delta Ct\ =\ \left(\text{Ct}^{\text{telomere}}\ -\ \text{Ct}^{\text{GAPDH}}\ \right)\ focal\ sample\ -\ \left(\text{Ct}^{\text{telomere}}\ -\text{Ct}^{\text{GAPDH}}\right)\text{\ reference\ sample\ }\).Ct values indicate the number of PCR cycles required for the
fluorescent signal to cross the threshold which is defined as the
transition from linear phase to exponential phase of amplification for
telomere and GAPDH reactions, respectively. Ct values are
inversely proportional to the starting amount of the target sequence,
samples with longer telomeres required fewer cycles to cross the
threshold and thus had lower Ct -values. To standardize rTL across
samples and plates, one reference Populus sample was run in
triplicate on every plate.
For each plate, a serial dilution of DNA for a single Populussample was prepared to calculate reaction efficiencies. To ensure that
the DNA samples fell within the standard curve, we ran a five-point
standard curve (40, 20, 10, 5, and 2.5 ng/mL of DNA) across all plates.
Each point in the standard curve was run in triplicate. The reaction
efficiencies for the telomere plates and GAPDH plates were within the
accepted range (i.e., 100 ± 15%), with an average of 93.54 ± 3.82 %
for the telomere plates and 88.78 ± 3.88 % for GAPDH plates.
Additionally, we included a negative control of 6 μl of water run in
triplicate on each plate, which did not produce Ct values.
Statistical analyses