Haplotype analysis and genetic association study
We used in silico data generated from a previously reported fine-mapping GWAS on genetic predictors of penicillin allergy, which was genotyped on the Illumina Immunochip array (Illumina Inc., CA, USA), an array that covers the HLA loci.11 From the full dataset, we extracted the genotypic data for the genetic variants located in the HLA-DRB3 locus and its vicinity. The HLA-DRB3 locus is not referenced according to the GRCh38/hg38 build, but rather according to the Homo sapiens chromosome 6 genomic contig, GRCh38 reference assembly alternate locus group ALT_REF_LOCI_220,21 with the following coordinates: hg38 chr6_GL000251v2_alt:3,934,009-3,947,126. Since the genomic positions of the Illumina Immunochip array were initially reported according to the NCBI36 build, we used the LiftOver tool22 from the UCSC Genome Browser database to convert the genomic position of the HLA-DRB3 locus from the GRCh38 reference assembly to the NCBI36 build (chr6:32,571,675-32,584,792). Because of the complex structure and high level of linkage disequilibrium (LD) in the HLA locus,23 we considered all the genetic variants located in the intergenic region between the HLA-DRA and HLA-DRB5 genes, a region that includes the HLA-DRB3 gene.
Sample quality-control measures included sample call rate (>90%), overall heterozygosity, and relatedness testing. We assessed cryptic relatedness using identity-by-descent analysis. Genetic variants were removed from the primary analysis if they had a call rate <90%, a significant departure from Hardy-Weinberg equilibrium (exact HWE P < 10^-4 among controls), or a minor allele frequency <5%. We performed the genetic association analysis according to the allelic model. We completed the haplotype association analysis using a moving window with a fixed width of 4 markers.
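The variant-level filters described above can be illustrated with a minimal Python sketch. It is not the SNP & Variation Suite implementation; the function names `hwe_exact_p` and `passes_variant_qc` and the genotype-count input layout are assumptions made for illustration. The exact HWE P-value is computed here as the sum of the probabilities of all heterozygote counts that are no more likely than the observed one, conditional on the observed allele counts, which is one common definition of the exact test.

```python
from math import exp, lgamma, log


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact HWE P-value from genotype counts (hypothetical helper).

    Sums the probabilities of every heterozygote count that is no more
    likely than the observed count, given the observed allele counts.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_prob(het):
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2) + lgamma(n + 1)
                - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    p_obs = exp(log_prob(n_ab))
    # Valid heterozygote counts share the parity of the allele counts.
    hets = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = (exp(log_prob(h)) for h in hets)
    # Small relative tolerance so the observed count itself is always included.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


def passes_variant_qc(all_counts, control_counts, n_missing,
                      min_call_rate=0.90, min_maf=0.05, min_hwe_p=1e-4):
    """Apply the three variant filters stated in the text.

    all_counts / control_counts: (n_aa, n_ab, n_bb) among called samples.
    n_missing: number of samples without a genotype call.
    """
    called = sum(all_counts)
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq_a = (2 * all_counts[0] + all_counts[1]) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    hwe_p = hwe_exact_p(*control_counts)  # HWE is tested among controls only
    return call_rate >= min_call_rate and maf >= min_maf and hwe_p >= min_hwe_p
```

For example, `passes_variant_qc((25, 50, 25), (12, 26, 12), 0)` keeps the variant, whereas a variant with 20 missing calls out of 120 samples is dropped on call rate alone.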
We performed pairwise LD analysis on all adjacent pairs of genetic variants using a matrix output for both the expectation-maximization (EM) algorithm and the composite-haplotype method.24,25 We used D' values in the LD plots. We estimated haplotype frequencies using the EM algorithm with a maximum of 50 EM iterations and an EM convergence tolerance of 0.0001.26 We compared haplotype frequencies using the chi-squared test and reported the corresponding OR, the 95% confidence interval, and the associated P-value for each haplotype. Given the exploratory nature of our analysis, we considered a genomic region as potentially relevant if it encompassed genetic variants that were significantly associated with the risk of delayed hypersensitivity to penicillins in both per-variant and per-haplotype association analyses. All statistical analyses were performed using the SNP & Variation Suite (Golden Helix, Inc., Bozeman, MT, USA).
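To make the EM step concrete, the following Python sketch estimates haplotype frequencies for a single pair of biallelic variants and derives D' from them. It is a simplified two-marker illustration, not the SNP & Variation Suite procedure: the function names, the 3 x 3 genotype-count input, the uniform starting frequencies, and the convergence criterion (largest change in any haplotype frequency) are all assumptions. Only the iteration cap of 50 and the tolerance of 0.0001 come from the text.

```python
def em_haplotype_freqs(g, max_iter=50, tol=1e-4):
    """Two-locus EM haplotype frequencies (illustrative sketch).

    g[i][j] = number of samples with i copies of allele A at locus 1
    and j copies of allele B at locus 2 (i, j in 0..2).
    """
    total = 2 * sum(sum(row) for row in g)  # number of chromosomes
    if total == 0:
        raise ValueError("no genotyped samples")
    n_dh = g[1][1]  # double heterozygotes: phase is ambiguous
    # Haplotype counts that are unambiguous from the genotypes.
    known = {
        "AB": 2 * g[2][2] + g[2][1] + g[1][2],
        "Ab": 2 * g[2][0] + g[2][1] + g[1][0],
        "aB": 2 * g[0][2] + g[1][2] + g[0][1],
        "ab": 2 * g[0][0] + g[1][0] + g[0][1],
    }
    freqs = {h: 0.25 for h in known}  # assumed uniform start
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is AB/ab.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected counts.
        new = {
            "AB": (known["AB"] + n_dh * w) / total,
            "ab": (known["ab"] + n_dh * w) / total,
            "Ab": (known["Ab"] + n_dh * (1 - w)) / total,
            "aB": (known["aB"] + n_dh * (1 - w)) / total,
        }
        delta = max(abs(new[h] - freqs[h]) for h in new)
        freqs = new
        if delta < tol:
            break
    return freqs


def d_prime(freqs):
    """|D| scaled by its maximum given the allele frequencies."""
    p_a = freqs["AB"] + freqs["Ab"]
    p_b = freqs["AB"] + freqs["aB"]
    d = freqs["AB"] - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0


# Usage: 30 ab/ab, 40 double heterozygotes, 30 AB/AB samples.
counts = [[30, 0, 0], [0, 40, 0], [0, 0, 30]]
estimated = em_haplotype_freqs(counts)
ld = d_prime(estimated)
```

In this toy dataset every unambiguous sample carries only AB or ab haplotypes, so the EM resolves the double heterozygotes almost entirely as AB/ab and D' approaches 1. A 4-marker window, as used in the haplotype association analysis, follows the same idea with 16 possible haplotypes instead of 4.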