I. INTRODUCTION
Many key questions in evolutionary and conservation biology can only be
addressed using genomic approaches and appropriate study species. Lake
Trout (Salvelinus namaycush) are a top predator in many lentic
ecosystems across northern North America and express exceptional levels
of ecotypic variation (Muir et al. 2014; Muir et al. 2016), making them
an ideal study species for
exploring the processes of ecological speciation and adaptive
diversification. The post-Pleistocene parallel evolution of diverse Lake
Trout ecotypes has been likened to the adaptive radiation of cichlid
species in the Great Lakes of east Africa (Muir et al. 2016); however,
the radiation of Lake Trout ecotypes appears to have occurred over a
relatively short evolutionary timescale (~8000 years;
Harris et al. 2015). At least three distinct Lake Trout
ecotypes (lean, siscowet, and humper) once existed throughout the
Laurentian Great Lakes (Hansen 1999), and anecdotal evidence suggests
that as many as 10 easily differentiable forms once existed in Lake
Superior (Goodier 1981). High levels of
ecotypic variation have also been documented in contemporary populations
across the species' range (Blackie et al. 2003; Zimmerman et al. 2006;
Hansen et al. 2012; Chavarie et al. 2015), with as many as five trophic
ecotypes being found in a single lake (Marin et al. 2016).
Lake Trout are also ancestrally autotetraploid, with the common ancestor
of all salmonids having undergone a whole genome duplication (WGD) event
roughly 60-100 million years ago (Crête-Lafrenière et al. 2012;
Macqueen and Johnston 2014). For this reason, salmonids have long been
considered ideal study species for understanding the evolutionary
consequences of WGD (Ohno 1970; Allendorf and Thorgaard 1984). Given the
high levels of ecotypic
diversity observed in Lake Trout, and the potential for WGD to
facilitate the evolution of novel phenotypes (Ohno 1970; Macqueen and
Johnston 2014; Van de Peer et al. 2017) and reproductive isolation
(Lynch and Force 2000), research
exploring the genetic basis for ecotypic differentiation and incipient
speciation in Lake Trout could provide important insights about the role
of relatively recent WGD events in adaptive radiations.
Furthermore, many Lake Trout populations, particularly those in the
Laurentian Great Lakes, have been severely reduced in abundance or
distribution, or extirpated, due to invasive species introductions and
overfishing (Smith 1968).
Following the basin-wide collapse of the lake whitefish (Coregonus
clupeaformis) commercial fishery in the Great Lakes during the early
20th century, fishing pressure was transferred to Lake Trout
populations, which partially contributed to population declines starting
in the 1930s (Hansen 1999). A novel predator, the sea lamprey
(Petromyzon marinus), also
invaded the Great Lakes during this time, leading to further increases
in adult Lake Trout mortality and functional extirpation from all lakes
except Lake Superior and a small, isolated population in Lake Huron
(Hansen 1999). The restoration program that followed focused largely on
reducing sea lamprey predation, reducing fishing pressure, creating
aquatic refuges, and stocking juvenile Lake Trout from a diverse
collection of domesticated strains originating from multiple source
populations (Krueger et al. 1983; Hansen 1999). Lake Trout populations
in Lake Superior rebounded
relatively quickly; however, the re-emergence of natural reproduction in
other lakes was hindered by high levels of lamprey predation on adult
Lake Trout (Pycha et al. 1980),
predation on juveniles by invasive alewife (Madenjian et al. 2008),
reduced juvenile survival caused by thiamine deficiency (Fitzsimmons et
al. 2010), and potentially reduced hatching success associated with PCB
contamination (Mac and Edsall 1991). Today, Lake Superior populations
remain relatively stable, and recruitment has been observed in lakes
Huron (Riley et al. 2007), Michigan (Hanson et al. 2013), and Ontario
(Lantry 2015). Recent research
suggests that domesticated strains used for reintroduction have variable
fitness in contemporary Great Lakes environments
(Scribner et al. 2018; Larson et al. 2021) and may be contributing
differentially to recent recruitment; however, the biological mechanisms
that underlie these differences in fitness and recruitment remain
unclear.
Genomic and transcriptomic approaches have been widely used to identify
loci associated with adaptive diversity and ecotypic divergence in
salmonids (Prince et al. 2017; Veale and Russello 2017; Willoughby et
al. 2018; Rougeux et al. 2019).
This work has been partially driven by the publication of high-quality
genome assemblies and linkage maps for numerous salmonid species
(Gagnaire et al. 2013; Lien et al. 2016; Christensen et al. 2018a;
Christensen et al. 2018b; Pearse et al. 2019; De-Kayne et al. 2020);
however, genomic resources are notably lacking for Lake Trout. An
annotated, chromosome-anchored genome assembly is arguably the most
valuable resource for advancing genomic
research on any species. A publicly available reference genome for Lake
Trout would eliminate many challenges associated with conducting
conservation-oriented genetic research aimed at restoring ecotypic
diversity and viable wild populations. Until recently, the assembly of
non-model eukaryotic genomes was prohibitively expensive,
computationally challenging, and required the collaborative efforts of
large genome consortia; however, the development of long-read
(‘third-generation’) sequencing technologies has lowered many of these
hurdles (Hotaling and Kelley 2020; Whibley et al. 2021).
Long-read sequencing data can be useful for scaffolding and filling gaps
in existing, fragmented short-read assemblies (English et al. 2012). A
number of assembly algorithms also seek to assemble contigs directly
from long-read sequencing data (Falcon, Chin et al. 2016; Canu, Koren et
al. 2017; wtdbg2, Ruan and Li 2020), and recent work suggests that this
approach can be highly effective for assembling chromosome-anchored
salmonid genomes when combined with additional scaffolding information
(De-Kayne et al. 2020; also see RefSeq: GCF_002021735.2).
Salmonid genomes are highly complex and relatively difficult to assemble
owing to ancestral autotetraploidy (Macqueen and Johnston 2014) and
high repeat content (Kajitani et al. 2014; Lien et al. 2016; De-Kayne et
al. 2020). Sequencing
low-diversity individuals from inbred lines or homozygous individuals
produced via chromosome set manipulations provides one route for
simplifying assembly in such species. Previous salmonid genome
assemblies have made use of doubled haploid individuals (Lien et al.
2016; Christensen et al. 2018b; Pearse et al. 2019) because these
individuals are theoretically homozygous at all loci (but see Lien et
al. 2016). However, the highly contiguous assembly produced by De-Kayne
et al. (2020) for European Whitefish (Coregonus sp. balchen) was
generated using data from an outbred, wild-caught individual.
Here we present a chromosome-anchored reference genome for a doubled
haploid Lake Trout that was assembled using Pacific Biosciences
long-read sequencing data and scaffolded using a high-density linkage
map (Smith et al. 2020) and genome-wide chromatin conformation capture
followed by massively parallel sequencing (Hi-C). We also produced a
number of complementary
resources including a custom repeat library, an interpolated
recombination map, and a set of publicly available gene annotations to
facilitate additional research on this important species.
Additionally, we identify Lake Trout homeologs resulting from the
salmonid-specific autotetraploidization event (Ss4R) and establish homologous
relationships with chromosomes from other salmonid species.