3 A SET OF HIGHLY EXPRESSED UNKNOWN PROTEINS IN B. SUBTILIS
A recent proteomic study with B. subtilis revealed that essential proteins are highly overrepresented in the proteome (Reuß et al., 2017). Although they account for only 6% of all B. subtilis genes, the essential genes use 57% of the translation capacity. This reflects the importance of these proteins for the cell. In contrast, unknown and poorly studied proteins, which correspond to 25% of the genes, use only 3% of the translation capacity under standard conditions. It is very likely that many of the unknown proteins are needed only under very special conditions. This special-purpose demand, combined with the low expression, also explains why the function of such proteins has never been identified. However, there is a set of 41 proteins of unknown function that are highly expressed under most conditions (Table 1). Among these are the two essential proteins YlaN and YneF. It is tempting to speculate that these highly expressed unknown proteins are important for the physiology of B. subtilis even under standard conditions. Interestingly, there are three pairs of paralogous proteins on the list (YqhY/YloU, YabR/YugI, YtxH/YhaH). Moreover, several of the proteins have been detected in a recent global in vivo interaction study with B. subtilis (see Table 1). The complete set of highly expressed unknown B. subtilis proteins can be easily accessed in the database SubtiWiki (http://www.subtiwiki.uni-goettingen.de/v4/category?id=SW.6.7; Pedreira et al., 2022b; see Fig. 1B).
Of the 41 proteins, several have or may have RNA-binding activity, among them the most strongly expressed and highly conserved unknown protein YqeY. This protein contains a domain also present in the GatB subunit of the glutamyl-tRNA amidotransferase, suggesting that YqeY might also have tRNA amino acid amidase activity. The YtpR protein possesses a tRNA-binding domain that is also present in the beta subunit of the phenylalanyl-tRNA synthetase. YtpR physically interacts with GatB (O’Reilly et al., 2023), suggesting that it might facilitate the interaction between the Glu-loaded tRNA and the glutamyl-tRNA amidotransferase GatABC. The paralogous YabR and YugI proteins possess an RNA-binding S1 domain. Both proteins interact with the small subunit of the ribosome (O’Reilly et al., 2023), indicating their involvement in translation. This may also be the case for YrzB, which physically interacts with multiple ribosomal proteins (O’Reilly et al., 2023). The YlxR and KhpA proteins were found to bind RNA in Clostridioides difficile (Lamm-Schmidt et al., 2021). Finally, the YlbN protein is conserved in most bacteria and plant chloroplasts. The orthologous chloroplast and E. coli proteins are required for the accumulation of 23S rRNA (Yang et al., 2016), even though the molecular activity of the protein remains unknown.
Several of the highly expressed unknown proteins are likely involved in the control of metabolism. The YjlC protein is encoded in an operon with the NADH dehydrogenase, and the two proteins interact physically (O’Reilly et al., 2023) (see Fig. 1A). It is tempting to speculate that YjlC somehow controls the activity of NADH dehydrogenase. Indeed, both proteins are required for genetic transformation (Koo et al., 2017), indicating that they perform a joint function. The YlaN protein interacts with Fur, the key regulator of iron homeostasis (O’Reilly et al., 2023), and the normally essential ylaN gene becomes dispensable at high iron concentrations (Peters et al., 2016). Thus, YlaN may control iron homeostasis via Fur. The paralogous YqhY and YloU proteins are encoded in operons with genes required for complementary aspects of fatty acid acquisition, either biosynthesis or fatty acid phosphorylation. The yqhY gene is quasi-essential, and the cells respond to its inactivation with the accumulation of suppressor mutations in the subunits of acetyl-CoA carboxylase (Tödter et al., 2017). Thus, these two proteins may control different aspects of lipid biosynthesis. The value of initial protein-protein interaction information is demonstrated by the YneR protein, which interacts with the PdhA and PdhB subunits of pyruvate dehydrogenase. Targeted experimental studies revealed that YneR acts as an inhibitor of pyruvate dehydrogenase activity. The prediction of the YneR-PdhA-PdhB complex structure using artificial intelligence indicated that YneR protrudes into the substrate binding site of pyruvate dehydrogenase, thus suggesting a mechanism for inhibition. Indeed, site-directed mutagenesis based on the predicted complex structure verified this mechanism (O’Reilly et al., 2023). This example shows the power of association analyses.
Another interesting group of highly expressed unknown proteins consists of rather small proteins in the range of 47 to 54 amino acids that are encoded in the 5’ regions of highly expressed genes and that are associated with an RNA element that is transcribed in the same orientation. Moreover, the occurrence of these proteins is limited to B. subtilis and very close relatives in the genus Bacillus (see Table 1). All these features are reminiscent of regulatory elements that are involved in mechanisms similar to attenuation. Indeed, the BmrB leader peptide of the bmrCD operon shares all of these properties with respect to protein size, linkage to an RNA element, and occurrence only in B. subtilis (Reilman et al., 2014).