Sequencing analysis of NTD samples
The WGS data were processed with standard pipelines following the Broad Institute’s GATK Best Practices (Van der Auwera et al., 2013). Reads were aligned to the hg38 reference supplied in the GATK Bundle using BWA (Li & Durbin, 2009). Variant calling was performed with GATK4 (Poplin et al., 2018), and joint genotyping was carried out on the whole cohort, followed by Variant Quality Score Recalibration (VQSR). Sample-level quality control followed standard practices (sequencing metrics, per-sample missing rate, and level of heterozygosity) and was used to check for DNA contamination and identify outliers; samples of poor quality were removed. Per-variant quality was also assessed: only variants with “PASS” in the FILTER column were retained, and these were annotated with Ensembl Variant Effect Predictor (VEP) v.95 (McLaren et al., 2016). The gnomAD database v2.1.1 (https://gnomad.broadinstitute.org/) was used as a reference to classify each variant as novel (allele frequency (AF) = 0) or rare (AF < 0.001). The pathogenic effect of all missense variants was predicted with the online program SIFT (Sorting Intolerant From Tolerant; https://sift.bii.a-star.edu.sg), with all parameters left at the software’s default settings. The localization of the variants within protein domains was assessed using UniProt (http://www.uniprot.org/). Gene lollipop structure was plotted using the Lollipops program (Jay & Brouwer, 2016).
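The per-variant filtering and frequency classification described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the record layout and the field names `filter` and `gnomad_af` are hypothetical, and treating a variant that is absent from gnomAD (no AF value) as AF = 0 is an assumption. Only the “PASS” requirement and the thresholds (AF = 0 for novel, AF < 0.001 for rare) come from the text.

```python
from typing import Dict, List, Optional

RARE_AF_THRESHOLD = 0.001  # rare: AF < 0.001 (threshold stated in the text)


def classify_by_gnomad_af(af: Optional[float]) -> str:
    """Label a variant as 'novel', 'rare', or 'common' from its gnomAD AF.

    Assumption: a missing AF (variant absent from gnomAD) is treated as 0.
    """
    if af is None or af == 0:
        return "novel"
    if af < RARE_AF_THRESHOLD:
        return "rare"
    return "common"


def select_candidates(records: List[Dict]) -> List[Dict]:
    """Keep PASS variants that are novel or rare, adding an 'af_class' label.

    Each record is a plain dict with hypothetical keys 'filter' and
    'gnomad_af'; a real workflow would read these from an annotated VCF.
    """
    selected = []
    for record in records:
        if record.get("filter") != "PASS":
            continue  # drop variants that failed per-variant filtering
        label = classify_by_gnomad_af(record.get("gnomad_af"))
        if label != "common":
            selected.append({**record, "af_class": label})
    return selected


# Hypothetical example records (not data from this study).
example_records = [
    {"id": "var1", "filter": "PASS", "gnomad_af": None},
    {"id": "var2", "filter": "PASS", "gnomad_af": 0.0004},
    {"id": "var3", "filter": "PASS", "gnomad_af": 0.02},
    {"id": "var4", "filter": "VQSRTrancheSNP99.90to100.00", "gnomad_af": 0.0},
]

candidates = select_candidates(example_records)
```

In this sketch a variant with AF exactly equal to 0.001 is labelled common, because the rare class is defined by a strict inequality.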
All eight CIC variants passed visual inspection of the BAM files in the Integrative Genomics Viewer (IGV) and were then validated by Sanger sequencing. The variant lollipop plot was generated using the Lollipops software (Rothenberg et al., 2004).