2 STRATEGIES TO OBTAIN MINIMAL ANNOTATION FOR UNKNOWN PROTEINS OF B. SUBTILIS
As outlined above, some information is required to start the
identification of unknown proteins. In recent years, such starting
information has been obtained for many unknown proteins. A major
breakthrough in this respect was the global transcriptome analysis of
this organism under 104 different conditions (Nicolas et al., 2012).
Based on this analysis, 180 proteins could be newly categorized as
sporulation proteins (Pedreira et al., 2022b).
Interaction studies can provide further insights into the function of
unknown proteins, based on the principle of “guilt by association”.
Indeed, a large-scale global interactome analysis of the minimal
organism Mycoplasma pneumoniae provided important clues for the
function of many proteins. As an example, the completely unknown protein
MPN530 interacts with multiple subunits of the RNA polymerase, including
the housekeeping sigma factor, suggesting that this protein plays a
role in transcription (O’Reilly et al., 2020; Elfmann et al., 2022). A
similar attempt has recently also been performed for B. subtilis,
and many so far unknown proteins were found to interact with
well-characterized proteins or protein complexes, such as the ribosome
(O’Reilly et al., 2023). Similarly, global approaches to identify
protein-RNA or protein-metabolite complexes have been established;
however, they have not yet been applied to B. subtilis (Chihara
et al., 2022; Link et al., 2013). In addition to global analyses,
protein-specific interaction studies can also provide important
functional information about previously unknown proteins. This is the
case for the c-di-AMP-binding protein DarB, which was shown to interact
with, and thereby control the activity of, the (p)ppGpp synthetase/
hydrolase Rel and the pyruvate carboxylase PycA in B. subtilis (Krüger et al., 2021a; Krüger et al., 2022).
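The “guilt by association” reasoning can be made concrete with a small sketch: collect the interaction partners of an unknown protein and count which functional categories they carry. This is only an illustration under stated assumptions; the protein names (UnkX, rnap_alpha, rnap_beta, sigma_A, enzyme_Z), the interaction list and the annotations are hypothetical toy data loosely modelled on the MPN530 example, and simple vote counting is not the scoring method used in the cited interactome studies.

```python
from collections import Counter


def infer_function(protein, interactions, annotations):
    """Rank functional categories for `protein` by how many of its
    annotated interaction partners carry each category."""
    # Collect partners from an undirected list of interacting pairs.
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    # Each annotated partner casts one vote per category it belongs to;
    # unannotated partners contribute nothing.
    votes = Counter()
    for partner in partners:
        for category in annotations.get(partner, ()):
            votes[category] += 1
    return votes.most_common()


# Hypothetical toy data: an unknown protein "UnkX" that co-purifies with
# several RNA polymerase components, similar to the case of MPN530.
interactions = [
    ("UnkX", "rnap_alpha"),
    ("UnkX", "rnap_beta"),
    ("sigma_A", "UnkX"),
    ("UnkX", "enzyme_Z"),
]
annotations = {
    "rnap_alpha": ["transcription"],
    "rnap_beta": ["transcription"],
    "sigma_A": ["transcription"],
    "enzyme_Z": ["metabolism"],
}

print(infer_function("UnkX", interactions, annotations))
```

With these toy inputs, "transcription" receives three votes and "metabolism" one, pointing towards a role in transcription. Real analyses would additionally weight interactions by confidence and control for highly connected hub proteins.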
Phenotypic profiling is another way to obtain at least some initial
information on unknown proteins. Since testing mutants for phenotypes
under a variety of experimental conditions often fails to provide clues
(see above), global analyses of mutant properties may be a more
productive route. For example, profiling the transcriptome or proteome
changes in a mutant and comparing them with established catalogues of
reference responses may yield functional insights. Such an approach has
already been used successfully to elucidate the mode of action of
uncharacterized antibacterial compounds in B. subtilis (Senges et al., 2021; Senges et al., 2022). Proteomic profiling may
prove especially useful for the functional analysis of unknown
ribosome-interacting proteins, since an extensive catalogue of proteome
changes under conditions of ribosome perturbation is available for
B. subtilis (Senges et al., 2021).
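Comparing a mutant profile with a catalogue of reference responses can be sketched as a similarity ranking. In the minimal example below, we assume that each profile is a vector of log2 fold changes for the same set of marker proteins and that Pearson correlation is an adequate similarity measure. The signature names and all numbers are hypothetical toy values, and correlation is just one simple choice, not necessarily the procedure used by Senges et al.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sx * sy)


def rank_signatures(mutant_profile, catalogue):
    """Rank reference signatures by correlation with the mutant profile,
    most similar first."""
    scores = {name: pearson(mutant_profile, ref) for name, ref in catalogue.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical catalogue: log2 fold changes of five marker proteins
# under three reference perturbations (toy values, not real data).
catalogue = {
    "ribosome_perturbation": [2.0, 1.5, -1.0, 0.0, -0.5],
    "envelope_stress": [-0.5, 0.0, 2.0, 1.5, 0.0],
    "oxidative_stress": [0.0, -1.0, 0.0, 0.5, 2.0],
}
# Hypothetical profile of a mutant lacking an unknown protein.
mutant = [1.8, 1.2, -0.8, 0.2, -0.4]

for name, r in rank_signatures(mutant, catalogue):
    print(f"{name}: r = {r:.2f}")
```

With these toy values, the ribosome-perturbation signature ranks first, which would suggest testing whether the unknown protein interacts with the ribosome. With real data, many more marker proteins and a statistical assessment of the scores would be required.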
While the strategies described above all depend on gene- or
protein-based investigations, the analysis of poorly studied functions
may also provide novel functional information. Although many areas of
research have been exhaustively addressed in this model bacterium, some
functions still remain uncharted territory. Among these understudied
areas are RNA-binding proteins, the transport of amino acids, the
control of metabolic homeostasis (see below), some activities in cell
wall biosynthesis, and the detoxification of toxic metabolites
(Reuss et al., 2016). Targeted investigation of these fields aimed at
identifying the responsible proteins has indeed uncovered functional
information for previously unknown proteins. This is the case for the
undecaprenyl-phosphate transporter UptA, which was very recently
identified in a well-designed transposon mutagenesis screen (Roney &
Rudner, 2022). This was the last missing piece in the parts list of cell
envelope biogenesis in B. subtilis. Similarly, several
complementary approaches aimed at the identification of a protein
responsible for serine uptake pointed to the so far unknown protein
YbeC. Subsequently, YbeC was also shown to be the major glutamate
transporter of B. subtilis. Accordingly, the protein was renamed
AimA for amino acid importer A (Klewing et al., 2020; Krüger et al.,
2021b). The investigation of proteins involved in metabolite damage
control revealed that the YqeK protein may act in the degradation of
toxic side-products of the nicotinamide-nucleotide adenylyltransferase
NadD by virtue of its versatile diphosphatase activity (Haas et al.,
2022). Other examples of toxic by-products of metabolism are
4-phosphoerythronate, a by-product of erythrose-4-phosphate oxidation in
the pentose phosphate pathway that inhibits the phosphogluconate
dehydrogenase GndA, and 5-oxoproline, an unavoidable damage product
formed spontaneously from glutamine. These harmful metabolites are
detoxified by the GTPase CpgA, which moonlights in dephosphorylation of
4-phosphoerythronate, and by the 5-oxoprolinase PxpABC, respectively
(Sachla & Helmann, 2019; Niehaus et al., 2017). It has recently become
clear that the prevention of metabolite damage is crucial for the
viability of any living cell. The limited knowledge of these
mechanisms is an important bottleneck in all genome reduction projects
(Reuß et al., 2017).
A final prerequisite for the identification of unknown functions is the
integration of all available information. Even small pieces of
information that may not be very useful by themselves can lead to a
deeper understanding when placed in an appropriate context.
This annotation information can cover expression data, interaction data,
gene regulation, the control of protein activities, localization data
and many other types of data (see Fig. 1). An example of the value of
data integration is YlaN, one of the very few remaining essential
proteins of unknown function. The ylaN gene is essential under standard conditions
(complex medium) but becomes dispensable if iron is added to the medium,
suggesting a role in the control of iron homeostasis (Peters et al.,
2016). In addition, two independent studies identified a physical
interaction of the protein with the key regulator of iron homeostasis,
Fur (de Jong et al., 2021; O’Reilly et al., 2023). Bringing this
information together immediately supports the idea that the interaction
is meaningful and that YlaN might control the activity of the Fur
regulator. This hypothesis may then be validated in specifically
designed experiments. Such an integration of all available information
on the genes and proteins is provided by the database SubtiWiki,
which is the major reference tool of the B. subtilis research
community (Fig. 1; Pedreira et al., 2022). Other important online tools
that help in developing hypotheses are the protein interaction and
association database STRING (Szklarczyk et al., 2023), the FlaGs
webserver, which allows users to interrogate conserved gene organization,
often an indication of functional association between proteins (Saha
et al., 2021), as well as the UniProt and COG databases, which provide
information on protein functions that may have been identified in other
organisms but might otherwise be overlooked (Galperin et al., 2021;
UniProt Consortium, 2023).