Population structure analysis
Principal component analysis (PCA) was performed using PLINK version 2.0. First, only biallelic SNPs were retained, and linkage disequilibrium (LD) pruning was applied with PLINK to the VCF file encompassing all variants in the core genome, followed by PCA using the first 20 principal components. PCA results were plotted in R using the ggplot2 library. Starting from the LD-pruned dataset, admixture analysis was performed with ADMIXTURE version 1.3.0. To determine the optimal number of populations, ADMIXTURE was run for K-values (i.e., numbers of populations) from 2 to 50 with 10-fold cross-validation, and the K-value with the lowest cross-validation error was selected. For phylogenetic analysis, the VCF file was first converted to PHYLIP format using the vcf2phylip.py script. Trees were then constructed using RAxML under the GTR+G evolutionary model with 100 bootstrap replicates, with P. knowlesi defined as the outgroup. The phylogenetic tree was visualized using the ggtree library in R. Nucleotide diversity was calculated in 500-bp windows sliding across the genome, over all LD-pruned SNPs of the core genome, using VCFtools. The multiplicity of infection was calculated using the getFws command as implemented in the moimix package in R.
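The K-selection step can be illustrated with a minimal Python sketch. It assumes that the standard output of each ADMIXTURE run (with cross-validation enabled) was saved and contains a summary line of the form "CV error (K=3): 0.55310". The function names and the example values below are hypothetical and serve only as illustration; they are not results from this study, and this is not necessarily how the selection was scripted here.

```python
import re

# Assumed format of the ADMIXTURE cross-validation summary line,
# e.g. "CV error (K=3): 0.55310".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv_errors(log_text):
    """Return a dict {K: cv_error} for every CV summary line in the text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K-value with the lowest cross-validation error."""
    if not cv_errors:
        raise ValueError("no CV error lines found")
    return min(cv_errors, key=cv_errors.get)


# Made-up log excerpt for illustration only (not data from the study).
example_log = """\
CV error (K=2): 0.60120
CV error (K=3): 0.55310
CV error (K=4): 0.57890
"""

errors = parse_cv_errors(example_log)
print("Lowest CV error at K =", best_k(errors))
```

In practice the logs from all runs (K = 2 to 50) would be concatenated or read in a loop before calling parse_cv_errors; plotting the error against K is a useful check that the minimum is not a shallow plateau.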