Figure 2: Strategies used to find optimal environmental conditions. (a) Schematic showing five environmental components that can be combined (1, present; 0, absent) to design an optimal environment that maximizes a target microbial community function. (b) Map between environmental composition and function. The diagram shows all 32 (2^5) combinations that can be constructed from five environmental components. Color gradient from white to purple indicates increasing function level. (c-f) Different strategies used to design the environment that maximizes a function of interest. (c) Genetic algorithms (GA) begin by empirically quantifying the function (‘fitness’) in a subset of environments selected at random. Environments are then ranked by decreasing function. The environments with the highest function values are used to design a new ‘population’ of environments. Top environments are carried over to the new population, and more variation is added by recombining and mutating those top environments. A new round of function quantification and selection of top environments then begins. The whole process can be repeated until the function no longer increases or for a predefined number of generations. (d) In fractional factorial design, a carefully selected subset of environments is used to quantify function. The results are then used to build regression models that predict the function of out-of-sample environments. (e) Full factorial design consists of quantifying function in every possible environmental combination, thereby revealing the full composition-function map. This method is restricted to environments made up of few components, as the number of environments increases exponentially with the number of environmental components.
(f) Patterns of global epistasis observed in genetics allow predicting the effect a mutation has on organismal fitness from the fitness of the background in which the mutation occurred (see (Diaz-Colunga et al., 2023; Johnson et al., 2023) and references therein). An ecological parallel to global epistasis has also been demonstrated, where the effect of a species addition on a community-level function is predictable from the function of the original community without the species (Diaz-Colunga et al., 2023; Ruiz et al., 2023; Sanchez et al., 2023). Here we propose that patterns akin to global epistasis may also allow us to predict the change in function induced by the addition of a new component to an environment. If such patterns exist, they would open up a new method for predicting function.
Genetic algorithms can find optimal environments that maximize desirable traits of clonal and multi-genotype communities. One approach to identifying optimal combinations of environmental factors is the use of genetic algorithms (GA) to explore their combinatorial functional landscape (Fig. 2c) (Kucharzyk et al., 2012; Pacheco et al., 2021; Vandecasteele et al., 2008). As an example, Kucharzyk et al. demonstrated the utility of a generational GA for identifying optimal environmental conditions for the degradation of perchlorate, both in enrichment communities and in pure cultures of Dechlorosoma sp. strains (Kucharzyk et al., 2012). In this study, the authors formed a multidimensional functional landscape including environmental variables such as the pH, salinity ([KCl]), buffering capacity ([NaH2PO4] and [NaHCO3]), concentration of electron donor (acetate) and electron acceptor (the perchlorate itself), and the concentrations of other microbial nutrients such as vitamins and trace minerals. Each environment was represented by a string (vector) of nine entries, each holding the value of one variable, and the fitness of each individual environment (i.e. its probability of being selected for reproduction in the next generation) was given by the perchlorate degradation rate, which was determined empirically. The strings that made up the next generation were derived by recombination (via uniform crossover with probability 0.5) from those that were selected. In the daughter strings, each environmental variable could also randomly “mutate”, i.e. increase or decrease by a magnitude equal to a pre-determined step size. The mutation rate was set so that, on average, one variable would change per individual per generation. The GA was applied for 11 generations, with remarkable success: the authors found environmental conditions that increased perchlorate degradation by over 16-fold for individual strains and over 5-fold for the consortia.
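The generational scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the fitness function `toy_degradation_rate`, the rescaling of every variable to the range 0 to 1, the clipping at the range limits, and the population size are all assumptions made for illustration, standing in for an empirical measurement and for the real units of each variable.

```python
import random

N_VARS = 9  # nine environmental variables, as in the study described above


def toy_degradation_rate(env):
    """Hypothetical stand-in for an empirical measurement (peaks when every variable is 0.7)."""
    return 1.0 / (1.0 + sum((x - 0.7) ** 2 for x in env))


def uniform_crossover(parent_a, parent_b, rng, p_swap=0.5):
    """Take each entry from parent_b with probability p_swap, otherwise from parent_a."""
    return [b if rng.random() < p_swap else a for a, b in zip(parent_a, parent_b)]


def mutate(env, step, rng):
    """Shift each variable up or down by `step` with probability 1/len(env),
    so that one variable is hit per individual on average."""
    rate = 1.0 / len(env)
    child = []
    for value in env:
        if rng.random() < rate:
            value += rng.choice((-step, step))
        child.append(min(max(value, 0.0), 1.0))  # assumption: clip to the allowed range
    return child


def next_generation(population, fitness, step, rng):
    """Pick parents with probability proportional to fitness, then recombine and mutate."""
    offspring = []
    for _ in range(len(population)):
        parent_a, parent_b = rng.choices(population, weights=fitness, k=2)
        offspring.append(mutate(uniform_crossover(parent_a, parent_b, rng), step, rng))
    return offspring


def run_ga(n_env=20, n_generations=11, step=0.1, seed=0):
    """Run the toy GA and return the final population and the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(N_VARS)] for _ in range(n_env)]
    best_per_generation = []
    for _ in range(n_generations):
        fitness = [toy_degradation_rate(env) for env in population]
        best_per_generation.append(max(fitness))
        population = next_generation(population, fitness, step, rng)
    return population, best_per_generation


if __name__ == "__main__":
    final_population, best = run_ga()
    print(f"best toy fitness, first vs last generation: {best[0]:.3f} vs {best[-1]:.3f}")
```

In an actual experiment, the call to `toy_degradation_rate` would be replaced by one round of culturing and measurement per generation, which is the costly step that the GA is meant to economize.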
This work followed previous attempts to use genetic algorithms to optimize community functions through environmental manipulations. In the earliest such study we are aware of, Vandecasteele et al. used a microbial community derived from human saliva as an inoculum, and built a genetic algorithm to optimize a collective function consisting of azo dye decoloration (Vandecasteele et al., 2008). Rather than manipulating the concentration of a set of resources, the environmental space explored in this work was the presence or absence of 10 different chemical supplements, including nutrients (i.e. glucose or glycerol) as well as various buffers, acids, and bases. Each combination of chemicals was added to a 12.5x dilution of a saliva sample, and the fitness of the environment was given by the amount of dye decoloration over 24 h of culture. Dye decoloration increased after 15 optimization steps, and the authors convincingly demonstrated that, besides the average increase in the metapopulation, the function of the best environments in each generation also responded to selection (an important benchmark in artificial community-level selection experiments (Chang et al., 2021)). A strength of this study is that the authors examined the ability of their approach to find a better ecosystem than the one they started with.
An exciting prospect for manipulating microbial community functions through their metabolic environment consists of combining evolutionary algorithms with genome-scale metabolic models. In pioneering work (Harcombe et al., 2014; Pacheco and Segrè, 2021), Pacheco and Segrè used dynamic flux balance analysis (Dukovski et al., 2021) to model the growth and other metabolic properties of in silico microbial communities. Using this computational platform, they were able to combinatorially generate thousands of different environments, each containing combinations of up to 20 different resources, which were then inoculated with the same microbial consortium consisting of 13 microorganisms. The same approach was then expanded to a larger combinatorial space of over 150 limiting carbon sources. Similar to previous work, the authors implemented a genetic algorithm in which a subset of environments was selected with a fitness score determined by their proximity to a community-level objective function. The selected environments were then used to generate the next generation through cross-over recombination and mutation (addition of new metabolites or removal of existing ones). Among the objective functions, the authors included compositional metrics such as community evenness and the abundance of target bacteria, as well as metabolic traits such as the secretion of particular metabolites or the degree of metabolic exchange among coexisting species (Pacheco and Segrè, 2021).
Altogether, these experiments support the feasibility and promise of evolutionary engineering approaches to explore the combinatorial space of environmental factors, in search of optimal culture conditions for microbial consortia. Importantly, these approaches do not require us to have a mechanistic understanding of environmental interactions. As we will see, however, bottom-up models that are built up from these interactions can be promising too.
Learning the landscape of environmental effects to infer optimal habitats (e.g. diets) for microbial communities. An alternative approach to finding optimal environments is to statistically learn the relationship between environment and community function for a given inoculum community. This is essentially the reverse problem to that of inferring the relationship between composition and function in a given environment (Eng and Borenstein, 2019; Sanchez et al., 2023). Because the two problems are so similar, they are plagued by the same difficulties (chiefly, the presence of interactions between components, which we reviewed in previous sections), and the approaches that have been followed to solve them are also similar (Eng and Borenstein, 2019). Because the number of possible combinations grows exponentially with the number of environmental factors, a full factorial assessment (Fig. 2e) has been challenging to execute experimentally. Most studies have instead focused on fractional factorial design (Fig. 2d), where a subset of all possible environmental combinations is used to train a statistical model (Chen et al., 2009; Jiménez et al., 2014; Kikot et al., 2010; Skonieczny and Yargeau, 2009; Zhou et al., 2023). Typically, these models consisted of linear regression on either the presence/absence or the magnitude of different environmental factors, with the occasional inclusion of interaction terms. Once a statistical model of the community function landscape is available, it can be used to predict out of sample and thus locate environments that optimize the target function.
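As a minimal sketch of this workflow, the code below fits an additive linear regression to a measured subset of presence/absence environments and then predicts function across all 32 combinations of five components. It does not reproduce any of the cited studies: the function names, the simulated effects, the noise level, and the random choice of 12 training environments are all hypothetical, and interaction terms are deliberately left out.

```python
import itertools

import numpy as np

# All 2^5 = 32 presence/absence environments for five components.
ALL_ENVS = np.array(list(itertools.product([0, 1], repeat=5)), dtype=float)


def design_matrix(envs):
    """Intercept column plus one presence/absence column per component (no interactions)."""
    envs = np.asarray(envs, dtype=float)
    return np.hstack([np.ones((envs.shape[0], 1)), envs])


def fit_additive_model(train_envs, train_function):
    """Least-squares fit of function = intercept + sum of per-component effects."""
    coef, *_ = np.linalg.lstsq(
        design_matrix(train_envs), np.asarray(train_function, dtype=float), rcond=None
    )
    return coef


def predict(coef, envs):
    """Predicted function for each environment, including those never measured."""
    return design_matrix(envs) @ coef


def best_predicted_environment(coef, envs=ALL_ENVS):
    """Environment with the highest predicted function."""
    return envs[np.argmax(predict(coef, envs))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_effects = np.array([0.5, -0.2, 0.3, -0.1, 0.4])  # hypothetical per-component effects
    measured = 1.0 + ALL_ENVS @ true_effects + rng.normal(0.0, 0.05, len(ALL_ENVS))
    train_idx = rng.choice(len(ALL_ENVS), size=12, replace=False)  # the "fraction" measured
    coef = fit_additive_model(ALL_ENVS[train_idx], measured[train_idx])
    print("predicted best environment:", best_predicted_environment(coef))
```

A random training subset is used here only for brevity; a designed fractional factorial would choose the measured environments so that the effects of interest can be estimated without confounding.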
This approach has been employed successfully by various authors. In the aforementioned work by Jiménez et al., for instance, the authors were able to successfully predict the effect of the concentrations of three input substrates on the methanogenic activity of an anaerobic digester, thus identifying an optimal operation point (Jiménez et al., 2014). The other studies employing this strategy also focused on small combinatorial spaces with up to six different variables. Ideally, one would like to extend the approach to larger combinatorial spaces, but as the dimensionality increases so does the number of potential interactions, and therefore the number of measurements one must make to estimate their effect. A potential approach to handling this combinatorial explosion of interactions is to, once again, draw inspiration from genetics (Sanchez et al., 2023). In genetics, it has been found that simple quantitative patterns often emerge from myriad microscopic interactions between genes, allowing us to predict the fitness effect of a mutation without having to first parameterize its pairwise and higher-order epistatic interactions (see (Diaz-Colunga et al., 2023; Johnson et al., 2023) and references therein). We have recently shown that this idea extends to ecological communities as well, so that the functional effect of a species often follows simple quantitative patterns that do not require one to parameterize all possible interactions with every other member of the consortium (Diaz-Colunga et al., 2023; Ruiz et al., 2023; Sanchez et al., 2023). Future work will have to determine whether these global epistasis-like patterns also describe the effects of environmental factors in different contexts (Fig. 2f).
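To make the proposal concrete, the sketch below shows one way such a pattern could be tested on a measured environment-function map: collect the change in function caused by adding a given component to each background environment, and regress that change on the function of the background. Whether a simple (here, linear) relationship holds for environmental components is the open question raised above, so this is an illustration of the test and not an established result. The dictionary layout and all function names are assumptions.

```python
import numpy as np


def addition_effects(function_map, component):
    """Collect (background function, change in function) pairs for adding `component`.

    `function_map` maps presence/absence tuples, e.g. (0, 1, 0), to measured function.
    Only backgrounds whose counterpart with the component added was also measured are used.
    """
    backgrounds, deltas = [], []
    for env, f_background in function_map.items():
        if env[component] == 0:
            with_component = env[:component] + (1,) + env[component + 1:]
            if with_component in function_map:
                backgrounds.append(f_background)
                deltas.append(function_map[with_component] - f_background)
    return np.array(backgrounds), np.array(deltas)


def fit_global_pattern(backgrounds, deltas):
    """Linear fit: delta = intercept + slope * background function."""
    slope, intercept = np.polyfit(backgrounds, deltas, deg=1)
    return slope, intercept


def predict_function_after_addition(f_background, slope, intercept):
    """Predicted function of a new environment once the component is added."""
    return f_background + intercept + slope * f_background


if __name__ == "__main__":
    # Hypothetical measurements for three components; values are made up for illustration.
    toy_map = {
        (0, 0, 0): 1.0, (1, 0, 0): 2.5,
        (0, 0, 1): 2.0, (1, 0, 1): 3.0,
        (0, 1, 0): 3.0, (1, 1, 0): 3.5,
        (0, 1, 1): 5.0, (1, 1, 1): 4.5,
    }
    b, d = addition_effects(toy_map, component=0)
    slope, intercept = fit_global_pattern(b, d)
    print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

If the fit is tight, the function of an unmeasured environment could be estimated from its background alone via `predict_function_after_addition`, without parameterizing individual interactions between components.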