3.2.4 File Manipulation
Various file formats have been introduced with the development of
different DNA/RNA sequencing technologies. Although many biological file
formats are used to store and manipulate NGS data, FASTA/Q files are the
most commonly encountered in the bioinformatics community. This is due
to their flexibility: FASTA/Q
files can be read, mapped and indexed by several different software
packages to generate SAM/BAM, GFF/GTF, VCF, and other formats. Using a fai
index file in conjunction with a FASTA/Q file containing reference sequences
enables efficient access to arbitrary regions within those reference
sequences and extraction of subsequences from the indexed reference
(Danecek et al., 2021; Quinlan & Hall, 2010).
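The indexed access described above can be illustrated with a minimal Python sketch. It is not the implementation used by Samtools or easyfm; it only mirrors the idea behind a fai index, which records for each sequence its length, the byte offset of its first base, the number of bases per line and the number of bytes per line. The function names build_fai and fetch are hypothetical, and the sketch assumes that all sequence lines of a record (except the last) have the same length.

```python
def build_fai(fasta_path):
    """Return {name: (length, offset, line_bases, line_width)} for a FASTA file.

    Assumption: every sequence line of a record, except possibly the last,
    has the same number of bases (the same requirement a fai index has).
    """
    index = {}
    name = None
    with open(fasta_path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Sequence name is the first whitespace-delimited token.
                name = line[1:].split()[0].decode()
                # handle.tell() is now the byte offset of the first base.
                index[name] = [0, handle.tell(), 0, 0]
            elif name is not None:
                bases = len(line.rstrip(b"\r\n"))
                if bases == 0:
                    continue  # ignore blank lines
                entry = index[name]
                if entry[2] == 0:
                    # First sequence line defines bases per line / bytes per line.
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def fetch(fasta_path, index, name, start, end):
    """Return bases [start, end) of sequence `name` (0-based, end-exclusive)."""
    length, offset, line_bases, line_width = index[name]
    start, end = max(0, start), min(end, length)
    if start >= end:
        return ""

    def byte_pos(i):
        # Full lines before base i, plus the position within its own line.
        return offset + (i // line_bases) * line_width + i % line_bases

    with open(fasta_path, "rb") as handle:
        handle.seek(byte_pos(start))
        raw = handle.read(byte_pos(end - 1) - byte_pos(start) + 1)
    # Drop the line terminators that fall inside the requested span.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()
```

Because fetch seeks directly to the computed byte position, the cost of retrieving a region depends on the size of the region rather than on the size of the reference file, which is the property that makes indexed FASTA access efficient.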
As with the other modules, existing tools such as the web-based Galaxy
(homepage: https://galaxyproject.org, main public server:
https://usegalaxy.org, Australia: https://usegalaxy.org.au/)
and the command-line tools Samtools and BCFtools (Danecek et al., 2021) and
BEDTools (Quinlan & Hall, 2010) offer a range of NGS data file
manipulation capabilities, but their usage can be challenging for
biologists due to limited programming literacy and internet
dependence. To enhance and extend flexibility and convenience, we
present easyfm, a free, single GUI for NGS file manipulation
(mainly for FASTA files) (Figure 5). Because users can control everything
with simple mouse clicks on a desktop, the tools available in easyfm
offer a convenient way to teach bioinformatics/data
analysis and to quickly analyse results without being hampered by
command-line tools and HPC Secure Shell (SSH) connections.
Users can import any FASTA/Q file, index it, and extract an indexed ID
with its sequence by double-clicking, matching a Prefix ID and selecting a
provided text file (Figure 5A). A FASTQ file can also be converted to
a FASTA file, and a given FASTA file can have its orientation changed via
Reverse Complement and Reverse (Figure 5B and 5C). For wider applicability,
easyfm File Manipulation also allows users to easily manipulate
(including filtering [IDs, features and strand] and extracting
sequence regions) and consolidate data from GFF and GTF files if the
corresponding reference genome/transcriptome sequences are present
(Figure 5D). To enhance user-friendliness, users can extract a given
sequence as a FASTA file with extra flanking regions in both directions
by entering the desired flank length as a number. Along with
existing tools (Danecek et al., 2021; Quinlan & Hall, 2010), easyfm
File Manipulation will provide a stable and modular
platform for manipulating sequence data and files to ensure high
reproducibility standards in the NGS era.
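The operations described in this section (FASTQ-to-FASTA conversion, Reverse, Reverse Complement, and feature extraction with flanking regions) can be sketched in a few lines of Python. This is an illustrative sketch only and does not reproduce the easyfm source code. All function names are hypothetical, the FASTQ reader assumes strict four-line records, and the choice to reverse-complement features on the minus strand is an assumption rather than documented easyfm behaviour.

```python
# Complement table for DNA with N as the ambiguity code (assumption: no
# other IUPAC codes are present in the input).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse(seq):
    """Return the sequence read in the opposite direction."""
    return seq[::-1]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fastq_to_fasta(fastq_lines):
    """Yield (read_id, sequence) pairs from four-line FASTQ records."""
    lines = [line.rstrip("\r\n") for line in fastq_lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        # The '+' separator and quality lines are discarded.
        yield header[1:], seq


def extract_feature(genome, gff_line, flank=0):
    """Return (header, sequence) for one GFF/GTF line.

    `genome` maps sequence IDs to sequences; `flank` is the number of extra
    bases added on each side, clipped at the sequence ends.
    """
    cols = gff_line.rstrip("\n").split("\t")
    seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
    chrom = genome[seqid]
    # GFF/GTF coordinates are 1-based and inclusive; Python slices are
    # 0-based and end-exclusive.
    lo = max(0, start - 1 - flank)
    hi = min(len(chrom), end + flank)
    seq = chrom[lo:hi]
    if strand == "-":
        # Assumption: minus-strand features are reported 5' to 3'.
        seq = reverse_complement(seq)
    return f"{seqid}:{lo + 1}-{hi}({strand})", seq
```

The header returned by extract_feature records the 1-based coordinates actually extracted, so a flank that was clipped at a sequence end remains visible in the output.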