Phylogenetic analysis
To place the isolates in a general phylogenetic context, we compared
their MIRU-VNTR types against those of more than 8,000 isolates in
global MIRU-VNTR datasets and Beijing-specific databases
(Merker et al., 2015; Mestre et al., 2011).
For the phylogenetic analysis, WGS reads were aligned to the
M. tuberculosis H37Rv reference genome (NC_000962.3) with Bowtie 2
(Langmead & Salzberg, 2012). Variants were called with SAMtools
v.0.1.18 and FreeBayes v.1.1.0 (Koboldt et al., 2012; Marth, 2012).
For FreeBayes, only SNPs with a minimum mapping quality of 20, a
minimum coverage of 10 and an alternate allele fraction of at least
0.9 were retained.
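
The three FreeBayes thresholds can be illustrated with a minimal sketch. It assumes the filters are applied to already-called SNPs; the source does not state whether they were set at calling time or applied afterwards. The `SnpCall` record and its field names are hypothetical and are not part of FreeBayes or of the original pipeline.

```python
from dataclasses import dataclass

# Thresholds taken from the text.
MIN_MAPPING_QUALITY = 20
MIN_COVERAGE = 10
MIN_ALT_FRACTION = 0.9


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical per-site summary of one SNP call (illustrative only)."""
    position: int           # 1-based coordinate on the reference
    mapping_quality: float  # mapping quality of the supporting reads
    depth: int              # total reads covering the site
    alt_reads: int          # reads supporting the alternate allele


def passes_filters(call: SnpCall) -> bool:
    """Return True if the call meets all three thresholds."""
    if call.depth < MIN_COVERAGE:
        return False
    if call.mapping_quality < MIN_MAPPING_QUALITY:
        return False
    return call.alt_reads / call.depth >= MIN_ALT_FRACTION


def filter_calls(calls):
    """Keep only the calls that pass every threshold, preserving order."""
    return [call for call in calls if passes_filters(call)]
```

A call is kept only if it passes all three checks. Coverage is tested first, so the alternate fraction is never computed for a site with zero depth.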
SNPs from our WGS data were compared against global NGS collections
of more than 9,000 downloaded MTB genomes (Shitikov et al., 2017).
For a more specific analysis within the Beijing lineage, we compared
the WGS data against 200 strains corresponding to the L2.2.5/Asian
African 3 subgroup (Supplementary Table). The phylogenetic tree was
built with RAxML v8.2.11 (Stamatakis, 2014) under a GTR-CAT model with
ascertainment bias correction, using all SNPs that remained after
excluding repetitive and mobile elements, PE/PPE/PE_PGRS genes,
drug resistance-associated genes, and artifact SNPs linked to indels.
A subset of genomes of the L2.2.3/Asia Ancestral sublineage was taken
as an outgroup. A smaller subset (the clade including the Panama and
Vietnamese isolates) was used to align and identify the 858 SNP
positions.
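
The exclusion step described for tree building can be sketched as an interval mask: SNP positions that fall inside any excluded region are dropped before the alignment is assembled. This is only an illustration under assumed inputs. The function names are hypothetical, and the coordinates of the excluded regions are not given in the text, so they must be supplied by the caller.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or adjacent closed intervals given as (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_excluded(position, merged, starts):
    """Return True if position lies inside one of the merged intervals."""
    index = bisect_right(starts, position) - 1
    return index >= 0 and merged[index][0] <= position <= merged[index][1]


def keep_snps(positions, excluded_regions):
    """Drop SNP positions located in any excluded region, preserving order."""
    merged = merge_intervals(excluded_regions)
    starts = [start for start, _ in merged]
    return [p for p in positions if not is_excluded(p, merged, starts)]
```

Merging the regions first lets each position be checked with a single binary search, which matters when thousands of genomes contribute candidate SNPs.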
Bayesian analysis of the molecular sequences was performed with BEAST,
using three independent chains of 100 million iterations (substitution
model, GTR; molecular clock, UCLD; population model, GMRF Skyride). The
three chains converged after about two million iterations, which were
considered burn-in and removed. The chains were then combined into a
single dataset that was used to construct the maximum clade credibility
tree.
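
Removing the burn-in and pooling the chains can be sketched as follows. This assumes each chain is available as a list of `(state, sample)` pairs, where `state` is the MCMC iteration number. That representation and the function name are hypothetical. In practice this step is usually done with the LogCombiner and TreeAnnotator utilities distributed with BEAST, although the text does not say which tools were used.

```python
# Burn-in length taken from the text (about two million iterations).
BURN_IN_STATES = 2_000_000


def combine_chains(chains, burn_in=BURN_IN_STATES):
    """Discard samples logged before the burn-in state and pool the rest.

    chains: iterable of chains, each a list of (state, sample) pairs.
    Returns the post-burn-in samples in chain order.
    """
    combined = []
    for chain in chains:
        combined.extend(sample for state, sample in chain if state >= burn_in)
    return combined
```

Whether the sample logged exactly at the burn-in state is kept is a convention; this sketch keeps it.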