Curating existing annotations
The BITACORA workflow has three main steps (Fig. 1). The first step consists of identifying all putative homologs of the FPDB sequences from the focal gene family that are already present in the input GFF file, and curating their gene models (referred to hereinafter as b-curated (bitacora-curated) gene models or proteins). Specifically, the pipeline launches BLASTP and HMMER searches (Altschul, 1997; Eddy, 2011) against the proteins predicted from the features in the input GFF, using the FPDB protein sequences and HMM profiles as queries. The resulting alignments are filtered for quality: BLASTP hits are retained if they cover at least two-thirds of the length of the query sequence or include at least 80% of the complete protein used as a subject. The results from both searches are then combined into a single integrated result for each protein (gene model). Next, BITACORA trims the original models based on these combined results (retaining only the aligned sequence) and reports the new gene coordinates (b-curated models) in an updated GFF (uGFF), thereby fixing, for example, all chimeric annotations. In addition, the proteins encoded by these b-curated models are incorporated into the FPDB (updated FPDB or uFPDB) for use in an additional search round.
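The filtering and combination logic described above can be illustrated with a minimal Python sketch. This is not the BITACORA implementation: the function names (`passes_blastp_filter`, `merge_intervals`), the use of 1-based inclusive protein coordinates, and the choice to merge overlapping or directly adjacent hits are all assumptions made for illustration. Only the two coverage thresholds (two-thirds of the query, 80% of the subject) are taken from the text.

```python
from typing import List, Tuple

# Hypothetical representation: 1-based, inclusive coordinates on the
# annotated protein (the subject of the searches).
Interval = Tuple[int, int]


def passes_blastp_filter(query_aln_len: int, query_len: int,
                         subject_aln_len: int, subject_len: int) -> bool:
    """Retain a BLASTP hit if the alignment covers at least two-thirds of
    the query or at least 80% of the subject protein."""
    # Integer arithmetic avoids floating-point issues at the thresholds.
    covers_query = 3 * query_aln_len >= 2 * query_len
    covers_subject = 5 * subject_aln_len >= 4 * subject_len
    return covers_query or covers_subject


def merge_intervals(hits: List[Interval]) -> List[Interval]:
    """Combine BLASTP and HMMER hit regions on one protein into
    non-overlapping regions (assumption: overlapping or directly
    adjacent hits are merged)."""
    merged: List[Interval] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1] + 1:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Made-up example: one annotated protein with hits in two distant regions,
# as might happen for a chimeric gene model.
blastp_hits = [(10, 120)]
hmmer_hits = [(100, 180), (400, 520)]
regions = merge_intervals(blastp_hits + hmmer_hits)
print(regions)  # [(10, 180), (400, 520)]
```

In this sketch each merged region stands for the aligned sequence that would be retained when trimming the original model. Converting these protein coordinates back to genomic coordinates for the uGFF requires the exon structure of the gene model and is not shown here.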