1 INTRODUCTION
Several decades of biochemical research have resulted in the elucidation
of functions of a large number of proteins. Yet, we are still far from a
comprehensive understanding of the role of all proteins of any single
organism. This is illustrated by the artificial minimal organism Mycoplasma mycoides JCVI-syn3A. With only 452 protein-coding
genes, this organism currently defines the lower limit of the protein
complement for an independently viable bacterium (Hutchison et al.,
2016; Breuer et al., 2019). Yet, about one third of these proteins still
have no known function, indicating major gaps in our knowledge even for
the simplest cells (Hutchison et al., 2016; Pedreira et al., 2022a).
These gaps are caused by a focus on a limited set of intensively studied
proteins for which more and more knowledge accumulates. By contrast,
the biological function of a significant number of proteins
remains poorly understood or entirely unknown.
Recently, it has been proposed to close this knowledge gap by
launching the Understudied Proteins Initiative (Kustatscher et al.,
2022a, b). However, the functional analysis of unknown proteins can be
highly laborious and poorly rewarding. This is exemplified by a
European/Japanese initiative to functionally identify all unknown proteins of the
Gram-positive model bacterium Bacillus subtilis after the
completion of the genome sequence. While tremendous human and financial
resources went into this project, functions were identified for
no more than a handful of previously unknown proteins. This results from the
repeated application of a defined set of phenotypic analyses under a
defined set of conditions. Typically, these phenotypes and conditions
have been studied intensively before, so that only little new knowledge
can be expected. In conclusion, at least a minimal amount of
molecular/functional annotation is required as the starting point to
identify the function of so far unknown proteins. This minimal knowledge
could cover expression over a wide range of conditions, the similarity
of phenotypes to known phenotypes, or the association of unknown
proteins with proteins of known function, with RNA or other
biomolecules. Finally, the identification of gaps in our knowledge and
the goal-driven investigation of these functions may help to unravel the
functions of poorly studied proteins. Another prerequisite for closing
the annotation gap is the integration of all available information in
intuitively accessible databases.
We are interested in the model organism B. subtilis. As a model
organism for differentiation and workhorse in biotechnology, B.
subtilis is one of the most intensively studied organisms. However, the
function of about 25% of the proteins (about 1,000 proteins) encoded by
the B. subtilis genome is still unknown or only very poorly
understood (Pedreira et al., 2022). In this review, we give an overview
of the strategies used to obtain an initial minimal annotation for so far
unknown proteins, we present a set of highly expressed unknown proteins
that should be studied with highest priority, and we define and discuss
fields of research that still have many open questions, i.e.,
RNA-binding proteins, amino acid transport, and the control of metabolic
homeostasis.