RNA-seq data analysis
For the RNA-seq datasets (GSE147507, GSE123938, GSE3963412), count data were uploaded to the online MeV software and normalised using its DESeq tool; for GSE150316, DESeq-normalised data were already available. Differential expression of genes between the control and target groups was analysed using the Limma pipeline. The Limma output contained the list of genes that were statistically significantly differentially expressed (p<=0.05). This gene list was uploaded to the online NetworkAnalyst software, and the output containing the significantly enriched pathways (p<=0.05) was downloaded (the list of pathways for each dataset is given below), along with the genes implicated in each pathway. Furthermore, for each enriched pathway we examined the expression pattern of every member gene, and the upregulation or downregulation of the pathway was inferred from the direction of change of its key enzymes, regulatory proteins and neighbouring genes, together with published studies.
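To illustrate the normalisation step described above, the following is a minimal sketch of the median-of-ratios size-factor method on which DESeq normalisation is based. It is not the MeV implementation, whose internal details are not described here; the function names and the toy counts are hypothetical and serve only to show the calculation.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    Returns one size factor per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        # Genes with a zero count in any sample have a geometric mean of
        # zero and are left out of the size-factor estimate.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The size factor of a sample is the median ratio of its counts to
    # the per-gene geometric mean across samples.
    return [math.exp(median(r)) for r in log_ratios]


def normalise(counts):
    """Divide every raw count by the size factor of its sample."""
    factors = size_factors(counts)
    return {
        gene: [c / s for c, s in zip(row, factors)]
        for gene, row in counts.items()
    }


# Hypothetical toy data: sample 2 is sequenced twice as deeply as sample 1.
toy_counts = {
    "GENE_A": [10, 20],
    "GENE_B": [100, 200],
    "GENE_C": [5, 10],
    "GENE_D": [0, 3],  # excluded from the size-factor estimate
}
factors = size_factors(toy_counts)
normalised = normalise(toy_counts)
```

In this toy example the two samples differ only in sequencing depth, so after normalisation GENE_A, GENE_B and GENE_C have equal values in both samples. The sketch assumes that at least one gene has non-zero counts in every sample.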
For the GSA id PRJCA002326 dataset, only raw read data were available, so the FASTQ reads were mapped to hg38 using STAR (v2.27.2b) to create sample-wise BAM files. The BAM files were then processed using the Rsamtools, Rsubread and GenomicAlignments R packages to create the count table. The count data were subsequently analysed using the MeV and NetworkAnalyst software as described above.
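As an illustration of the alignment step, the sketch below assembles a STAR command line for one sample. The options actually used for this dataset are not stated, so the thread count, the choice of coordinate-sorted BAM output, the index directory and the file names are all assumptions made for the example; only the command is built here, it is not executed.

```python
import shlex


def build_star_command(genome_dir, fastq_files, out_prefix, threads=8):
    """Return a STAR alignment command as a list of arguments.

    genome_dir:  directory holding a pre-built STAR index (assumed hg38).
    fastq_files: one file (single-end) or two files (paired-end).
    out_prefix:  prefix for the output files of this sample.
    """
    if not 1 <= len(fastq_files) <= 2:
        raise ValueError("expected one or two FASTQ files per sample")
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *fastq_files,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]
    # Gzipped input has to be decompressed on the fly.
    if all(f.endswith(".gz") for f in fastq_files):
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


# Hypothetical sample and index names, for illustration only.
star_cmd = build_star_command(
    genome_dir="star_index_hg38",
    fastq_files=["sample1_R1.fastq.gz", "sample1_R2.fastq.gz"],
    out_prefix="sample1_",
)
print(shlex.join(star_cmd))
```

Returning the command as a list rather than a single string means it can be passed directly to `subprocess.run` without shell quoting problems; `shlex.join` is used only to print a copyable form.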