2 STRATEGIES TO OBTAIN MINIMAL ANNOTATION FOR UNKNOWN PROTEINS OF B. SUBTILIS
As outlined above, some information is required to start the identification of unknown proteins. In recent years, such starting information has been obtained for many unknown proteins. A major breakthrough in this respect was the global transcriptome analysis of B. subtilis under 104 different conditions (Nicolas et al., 2012). Based on this analysis, 180 proteins could be newly categorized as sporulation proteins (Pedreira et al., 2022b).
Interaction studies might provide more insights into the function of unknown proteins, following the principle of “guilt by association”. Indeed, a large-scale, global interactome analysis with the minimal organism Mycoplasma pneumoniae provided important clues to the function of many proteins. As an example, the completely unknown protein MPN530 interacts with multiple subunits of the RNA polymerase, including the house-keeping sigma factor, suggesting that this protein plays a role in transcription (O’Reilly et al., 2020; Elfmann et al., 2022). A similar analysis has recently been performed for B. subtilis, and many so far unknown proteins were found to interact with well-characterized proteins or protein complexes, such as the ribosome (O’Reilly et al., 2023). Similarly, global approaches to identify protein-RNA or protein-metabolite complexes have been established; however, they have not yet been applied to B. subtilis (Chihara et al., 2022; Link et al., 2013). In addition to global analyses, protein-specific interaction studies can also provide important functional information about previously unknown proteins. This is the case for the c-di-AMP binding protein DarB, which was shown to interact with, and thereby control the activity of, the (p)ppGpp synthetase/hydrolase Rel and the pyruvate carboxylase PycA in B. subtilis (Krüger et al., 2021a; Krüger et al., 2022).
Phenotypic profiling is another way to get at least some initial information for unknown proteins. Since the experimental study of mutant phenotypes under a variety of experimental conditions often fails to provide clues (see above), global analyses of mutant properties might be an important way to obtain initial information on unknown proteins. For example, transcriptome or proteome profiling of the changes in a mutant, followed by a comparison to established catalogues, may yield functional insights. Such an approach has already been used successfully to identify the mode of action of unknown antibacterial compounds in B. subtilis (Senges et al., 2021; Senges et al., 2022). Proteomic profiling may prove to be especially useful for the functional analysis of unknown ribosome-interacting proteins, since extensive catalogued information on proteome changes under conditions of ribosome perturbation is available for B. subtilis (Senges et al., 2021).
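The comparison of a mutant profile to a catalogue of reference profiles can be sketched as follows. This is only an illustration under stated assumptions: the protein names (P1 to P4), the condition names, and all fold-change values are hypothetical toy data, and cosine similarity is chosen here as one simple similarity measure; it is not the method used in the cited studies.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity of two profiles, computed over their shared proteins."""
    shared = sorted(set(profile_a) & set(profile_b))
    if not shared:
        return 0.0
    dot = sum(profile_a[p] * profile_b[p] for p in shared)
    norm_a = math.sqrt(sum(profile_a[p] ** 2 for p in shared))
    norm_b = math.sqrt(sum(profile_b[p] ** 2 for p in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_catalogue(mutant_profile, catalogue):
    """Return (condition, similarity) pairs, most similar condition first."""
    scores = [
        (name, cosine_similarity(mutant_profile, reference))
        for name, reference in catalogue.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical catalogue: log2 fold changes per protein for two reference conditions.
catalogue = {
    "ribosome_perturbation": {"P1": 2.0, "P2": 1.5, "P3": -1.0, "P4": 0.0},
    "cell_wall_stress": {"P1": -0.5, "P2": 0.0, "P3": 2.0, "P4": 1.5},
}

# Hypothetical proteome profile of a mutant lacking an unknown protein.
mutant = {"P1": 1.8, "P2": 1.2, "P3": -0.8, "P4": 0.1}

ranking = rank_catalogue(mutant, catalogue)
for condition, score in ranking:
    print(f"{condition}: {score:.3f}")
```

In this toy example, the mutant profile resembles the ribosome perturbation reference far more closely than the cell wall stress reference, which would point to a translation-related function as a first working hypothesis.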
While the strategies described above all depend on a gene- or protein-based investigation, the analysis of poorly studied functions may also provide novel functional information. Although many areas of research have been exhaustively addressed in this model bacterium, some functions remain uncharted territory. Among these functions are RNA-binding proteins, the transport of amino acids, the control of metabolic homeostasis (see below), some activities in cell wall biosynthesis, and the detoxification of toxic metabolites (Reuss et al., 2016). The purposeful investigation of these fields, aimed at identifying the responsible proteins, indeed uncovers functional information for previously unknown proteins. This is the case for the undecaprenyl-phosphate transporter UptA, which was very recently identified in a well-designed transposon mutagenesis screen (Roney & Rudner, 2022). This was the last missing piece in the parts list of cell envelope biogenesis in B. subtilis. Similarly, several complementary approaches aimed at the identification of a protein responsible for serine uptake pointed to the so far unknown protein YbeC. Subsequently, YbeC was also shown to be the major glutamate transporter of B. subtilis. Accordingly, the protein was renamed AimA, for amino acid importer A (Klewing et al., 2020; Krüger et al., 2021b). The investigation of proteins involved in metabolite damage control revealed that the YqeK protein may act in the degradation of toxic side-products of the nicotinamide-nucleotide adenylyltransferase NadD by virtue of its versatile diphosphatase activity (Haas et al., 2022). Other examples of toxic by-products of metabolism are 4-phosphoerythronate, a by-product of erythrose-4-phosphate oxidation in the pentose phosphate pathway that inhibits the phosphogluconate dehydrogenase GndA, and 5-oxoproline, an unavoidable damage product formed spontaneously from glutamine. These harmful metabolites are detoxified by the GTPase CpgA, which moonlights in the dephosphorylation of 4-phosphoerythronate, and by the 5-oxoprolinase PxpABC, respectively (Sachla & Helmann, 2019; Niehaus et al., 2017). It has recently become obvious that the prevention of metabolite damage is very important for the viability of any living cell. The limited knowledge of these mechanisms is an important bottleneck in all genome reduction projects (Reuß et al., 2017).
A final prerequisite for the identification of unknown functions is the integration of all available information. Even small pieces of information that are not very useful by themselves can contribute to a deeper understanding if they are brought into an appropriate context. This annotation information can cover expression data, interaction data, gene regulation, the control of protein activities, localization data and many other types of data (see Fig. 1). An example of the value of data integration is one of the very few remaining unknown essential proteins, YlaN. The corresponding gene is essential under standard conditions (complex medium) but becomes dispensable if iron is added to the medium, suggesting a role in the control of iron homeostasis (Peters et al., 2016). In addition, two independent studies identified a physical interaction of the protein with the key regulator of iron homeostasis, Fur (de Jong et al., 2021; O’Reilly et al., 2023). Bringing this information together immediately supports the idea that the interaction is meaningful and that YlaN might control the activity of the Fur regulator. This hypothesis may then be validated in specifically designed experiments. Such an integration of all available information on the genes and proteins is provided in the database SubtiWiki, which is the major reference tool of the B. subtilis research community (Fig. 1; Pedreira et al., 2022). Other important online tools that help to develop hypotheses are the protein interaction and association database STRING (Szklarczyk et al., 2023), the FlaGs webserver, which allows the interrogation of conserved gene organization, often an indication of functional association between proteins (Saha et al., 2021), as well as the UniProt and COG databases, which provide information on protein functions that may have been identified in other, otherwise potentially overlooked organisms (Galperin et al., 2021; UniProt Consortium, 2023).
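The idea of data integration can be made concrete with a minimal sketch that collects evidence records per gene and checks how many independent sources support the same finding. The records simply restate the YlaN observations described above; the record layout and the function names are hypothetical and do not reflect the data model or interface of SubtiWiki or any other database.

```python
from collections import defaultdict

# Evidence records restating the YlaN observations from the text.
# The field names are an assumed, illustrative layout.
evidence = [
    {
        "gene": "ylaN",
        "type": "essentiality",
        "finding": "essential in complex medium; dispensable when iron is added",
        "source": "Peters et al., 2016",
    },
    {
        "gene": "ylaN",
        "type": "interaction",
        "finding": "physical interaction with Fur",
        "source": "de Jong et al., 2021",
    },
    {
        "gene": "ylaN",
        "type": "interaction",
        "finding": "physical interaction with Fur",
        "source": "O'Reilly et al., 2023",
    },
]


def integrate(records):
    """Group (finding, source) pairs by gene and by evidence type."""
    by_gene = defaultdict(lambda: defaultdict(list))
    for record in records:
        by_gene[record["gene"]][record["type"]].append(
            (record["finding"], record["source"])
        )
    return by_gene


def independent_support(by_gene, gene, evidence_type, finding):
    """Return the sorted list of distinct sources reporting a given finding."""
    return sorted(
        {src for found, src in by_gene[gene][evidence_type] if found == finding}
    )


summary = integrate(evidence)
fur_sources = independent_support(
    summary, "ylaN", "interaction", "physical interaction with Fur"
)
print(f"Fur interaction supported by {len(fur_sources)} sources: {fur_sources}")
```

Viewed side by side, the iron-dependent essentiality and the independently confirmed Fur interaction point in the same direction, which is exactly the kind of convergence that turns isolated observations into a testable hypothesis.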