Genotyping, imputation and quality control
The UK Biobank cohort comprises about 500,000 participants, of whom
487,409 have genetic data. Genotyping, quality control and
imputation were performed by the UK Biobank 1518. In brief, genotyping used the Affymetrix UK BiLEVE Axiom and the
Affymetrix UK Biobank Axiom arrays (Santa Clara, CA, USA), which share
over 95% of their marker content. Imputation was
carried out with IMPUTE4 in chunks of approximately 50,000 imputed markers,
each with a 250 kb buffer region. The imputation reference panels were the
1000 Genomes phase 3 dataset, the merged UK10K and 1000 Genomes phase 3
reference panels, and the Haplotype Reference Consortium (HRC) data.
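The chunking scheme can be pictured with a short Python sketch. It assumes only a sorted list of base-pair positions on one chromosome and a buffer applied on both sides of each chunk; `Chunk` and `make_chunks` are hypothetical names, and the code illustrates the idea of a fixed-size core padded by a 250 kb buffer rather than reproducing IMPUTE4's own chunking logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    core_start: int  # first base pair whose imputed genotypes are kept
    core_end: int    # last base pair whose imputed genotypes are kept
    buf_start: int   # start of the padded window given to the imputation engine
    buf_end: int     # end of the padded window given to the imputation engine


def make_chunks(positions: List[int], markers_per_chunk: int = 50_000,
                buffer_bp: int = 250_000) -> List[Chunk]:
    """Split the marker positions of one chromosome into consecutive cores of
    `markers_per_chunk` markers, each padded by `buffer_bp` on both sides
    (hypothetical helper; buffers on both sides are an assumption)."""
    pos = sorted(positions)
    chunks: List[Chunk] = []
    for i in range(0, len(pos), markers_per_chunk):
        core = pos[i:i + markers_per_chunk]
        chunks.append(Chunk(core_start=core[0], core_end=core[-1],
                            buf_start=max(1, core[0] - buffer_bp),
                            buf_end=core[-1] + buffer_bp))
    return chunks


# 100,000 markers spaced 10 bp apart yield two chunks of 50,000 markers each.
for c in make_chunks(list(range(1, 1_000_001, 10))):
    print(c)
```

Adjacent padded windows overlap, so markers near a chunk boundary are imputed with flanking haplotype context on both sides; only the core of each chunk would then be retained. This is the usual rationale for buffer regions, given here as background rather than as a documented detail of the UK Biobank pipeline.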
Quality control (QC) consisted of two parts, sample-based QC and
marker-based QC, both performed using PLINK v1.9 and R v3.3.1.
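The two QC stages can be sketched in plain Python on a small genotype matrix (one row per sample, one column per marker, coded as 0/1/2 copies of the alternate allele, with `None` for a missing call). The function names and thresholds (2% sample missingness, ±3 SD heterozygosity, 5% marker missingness, HWE p < 1e-6) are hypothetical illustrations, not the UK Biobank criteria. The heterozygosity check is a simplified raw rate, and the HWE check uses a 1-df chi-square test for brevity, whereas PLINK's `--hwe` filter uses an exact test.

```python
import math
from statistics import mean, stdev
from typing import List, Optional

Genotype = Optional[int]  # 0/1/2 copies of the alternate allele; None = missing call


def sample_qc(geno: List[List[Genotype]], max_missing: float = 0.02,
              het_sd: float = 3.0) -> List[bool]:
    """Sample-based QC: pass samples with a missing rate <= max_missing and a
    heterozygosity rate within het_sd standard deviations of the cohort mean."""
    miss, het = [], []
    for row in geno:  # one row per sample
        called = [g for g in row if g is not None]
        miss.append(1 - len(called) / len(row))
        het.append(sum(g == 1 for g in called) / len(called) if called else None)
    observed = [h for h in het if h is not None]
    if not observed:
        return [False] * len(geno)
    mu = mean(observed)
    sd = stdev(observed) if len(observed) > 1 else 0.0
    return [m <= max_missing and h is not None and abs(h - mu) <= het_sd * sd
            for m, h in zip(miss, het)]


def hwe_chi2_pvalue(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium for one marker."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: the test is undefined, so do not flag it
    observed = (n_hom_ref, n_het, n_hom_alt)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df


def marker_qc(geno: List[List[Genotype]], max_missing: float = 0.05,
              min_hwe_p: float = 1e-6) -> List[bool]:
    """Marker-based QC: pass markers with a missing rate <= max_missing and an
    HWE p-value >= min_hwe_p."""
    passed = []
    for j in range(len(geno[0])):  # one column per marker
        called = [row[j] for row in geno if row[j] is not None]
        miss = 1 - len(called) / len(geno)
        ok = (miss <= max_missing and len(called) > 0
              and hwe_chi2_pvalue(called.count(0), called.count(1),
                                  called.count(2)) >= min_hwe_p)
        passed.append(ok)
    return passed


geno = [
    [0, 1, 2, 1, 0, 1, 2, 1, 0, 1],
    [1, 1, 2, 1, 0, 1, 2, 1, 0, 1],
    [0, 1, 2, 1, 0, 1, 2, 1, 0, 0],
    [0, None, None, None, 0, 1, 2, 1, 0, 1],  # 30% missing calls
]
print(sample_qc(geno))  # [True, True, True, False]
print(marker_qc(geno))  # markers 2-4 fail on missingness; the rest pass
```

The batch, plate, sex, array and replicate-discordance tests used in the marker-based QC are not sketched here; they compare genotype or allele frequencies across experimental factors rather than within one marker's genotype counts.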
KING software was used to identify unrelated individuals. KING implements a
rapid relationship-inference algorithm that is robust to the presence of
unknown population substructure 19. In the sample-based QC, missing rate and
heterozygosity were used to identify poor-quality samples. In the
marker-based QC, statistical tests for batch effects, plate effects,
Hardy–Weinberg equilibrium, sex effects, array effects, and discordance
across control replicates were performed to identify poor-quality
markers; these tests check that genotype calls are consistent across
experimental factors. Detailed information on the genotyping, imputation and
quality control can be found in the published study 18.
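KING's relationship inference rests on a kinship estimator built from counts of genotypes that are heterozygous in both samples and of opposite homozygotes (AA in one sample, aa in the other). The Python sketch below implements a simple form of the KING-robust estimator as we understand it, φ = (N_Aa,Aa − 2·N_AA,aa) / (N_Aa(i) + N_Aa(j)), and then greedily keeps samples whose kinship with every already-kept sample is at most 0.0442, the conventional KING boundary between third-degree relatives and unrelated pairs. The function names, the greedy selection and the cutoff are illustrative assumptions; the KING software and the UK Biobank pipeline may differ in detail.

```python
from typing import List, Optional

Genotype = Optional[int]  # 0/1/2 copies of the alternate allele; None = missing call


def king_robust_kinship(gi: List[Genotype], gj: List[Genotype]) -> float:
    """Kinship between samples i and j from markers called in both:
    (N_het,het - 2 * N_opposite_hom) / (N_het(i) + N_het(j))."""
    het_het = opp_hom = het_i = het_j = 0
    for a, b in zip(gi, gj):
        if a is None or b is None:
            continue  # skip markers missing in either sample
        het_het += int(a == 1 and b == 1)
        opp_hom += int({a, b} == {0, 2})  # AA in one sample, aa in the other
        het_i += int(a == 1)
        het_j += int(b == 1)
    if het_i + het_j == 0:
        raise ValueError("no heterozygous genotypes in either sample")
    return (het_het - 2 * opp_hom) / (het_i + het_j)


def unrelated_subset(geno: List[List[Genotype]], cutoff: float = 0.0442) -> List[int]:
    """Greedily keep samples whose kinship with every kept sample is <= cutoff
    (hypothetical selection rule)."""
    kept: List[int] = []
    for i, row in enumerate(geno):
        if all(king_robust_kinship(row, geno[k]) <= cutoff for k in kept):
            kept.append(i)
    return kept


g = [0, 1, 2, 1, 1, 0, 2, 0]
h = [2, 1, 0, 0, 2, 2, 0, 1]
print(king_robust_kinship(g, g))   # 0.5, as expected for duplicates or MZ twins
print(unrelated_subset([g, g, h]))  # [0, 2]: the duplicate of sample 0 is dropped
```

The greedy pass keeps the first member of each related pair it meets, so the retained set depends on sample order and is not necessarily the largest possible set of unrelated individuals.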