Genotyping, imputation and quality control
The UK Biobank cohort comprises about 500,000 participants, of whom 487,409 have genetic data. Genotyping, quality control and imputation were performed by UK Biobank 15,18. In brief, genotyping used the Affymetrix UK BiLEVE Axiom and Affymetrix UK Biobank Axiom arrays (Santa Clara, CA, USA), which share over 95% of their marker content. Imputation was carried out with IMPUTE4 in chunks of approximately 50,000 imputed markers, each with a 250 kb buffer region. The imputation reference panels included the 1000 Genomes phase 3 dataset, the merged UK10K and 1000 Genomes phase 3 reference panels, and the Haplotype Reference Consortium (HRC) data.

Quality control (QC) consisted of two parts, sample-based QC and marker-based QC, both performed using PLINK v1.9 and R v3.3.1. In sample-based QC, missing rate and heterozygosity were used to identify poor-quality samples. In marker-based QC, statistical tests for batch effects, plate effects, departures from Hardy–Weinberg equilibrium, sex effects, array effects and discordance across control replicates were used to identify poor-quality markers by checking their consistency across experimental factors. KING software was used to identify unrelated subjects; it implements a rapid relationship-inference algorithm that remains valid in the presence of unknown population substructure 19. Details of the genotyping, imputation and quality control can be found in the published study 18.
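To make the sample-based QC step concrete, the sketch below computes each sample's missing-call rate and heterozygosity rate and flags outliers. It is a generic illustration, not the UK Biobank procedure: the genotype coding (0/1/2 alternate-allele counts, `None` for a missing call), the function name `sample_qc`, and both thresholds (`max_missing=0.05`, `het_sd=3.0`) are hypothetical choices made here for illustration.

```python
from statistics import mean, pstdev


def sample_qc(genotypes, max_missing=0.05, het_sd=3.0):
    """Flag poor-quality samples by missing rate and heterozygosity.

    genotypes: dict mapping sample ID -> non-empty list of calls coded
    0/1/2 (alternate-allele count) or None for a missing call.
    Thresholds are illustrative assumptions, not UK Biobank's values.
    """
    missing, het = {}, {}
    for sid, calls in genotypes.items():
        called = [g for g in calls if g is not None]
        # Fraction of markers with no genotype call
        missing[sid] = 1 - len(called) / len(calls)
        # Heterozygosity rate among successfully called markers
        het[sid] = sum(g == 1 for g in called) / len(called) if called else 0.0

    # Heterozygosity outliers: more than het_sd standard deviations from the mean
    mu, sd = mean(het.values()), pstdev(het.values())
    flagged = set()
    for sid in genotypes:
        if missing[sid] > max_missing:
            flagged.add(sid)
        elif sd > 0 and abs(het[sid] - mu) > het_sd * sd:
            flagged.add(sid)
    return flagged, missing, het
```

Excess heterozygosity often points to sample contamination and a deficit to inbreeding or genotyping problems; in practice the thresholds are set from the observed distributions rather than fixed in advance as they are here.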
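One of the marker-based checks is a test for departure from Hardy–Weinberg equilibrium. The sketch below shows the textbook 1-degree-of-freedom chi-square version computed from a marker's genotype counts. The source does not state which HWE test UK Biobank used. Software such as PLINK typically applies an exact test instead, which is better behaved for rare variants. The function name `hwe_chisq` is hypothetical.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: counts of the AA, AB and BB genotypes at one marker
    (assumes at least one genotyped individual).
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected classes (monomorphic markers)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    pval = math.erfc(math.sqrt(stat / 2))
    return stat, pval
```

For example, counts of 25/50/25 match HWE exactly (statistic 0, p-value 1), whereas 50/0/50 at the same allele frequency gives a statistic of 100 and a vanishingly small p-value. A marker would then be excluded if its p-value falls below a chosen cutoff.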
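The relatedness step relies on KING. As we understand it, one form of KING's robust kinship estimator uses only counts of shared heterozygous calls and opposite homozygous calls: $\hat\phi_{ij} = (N_{Aa,Aa} - 2N_{AA,aa}) / (N^{(i)}_{Aa} + N^{(j)}_{Aa})$. The sketch below implements that formula for one pair of individuals. KING's own implementation may differ, for example in how it treats pairs from different populations, and the source does not state which KING options were used. The function name `king_kinship` and the 0/1/2 genotype coding are hypothetical.

```python
def king_kinship(g_i, g_j):
    """KING-style kinship estimate for one pair of individuals.

    g_i, g_j: genotype lists coded 0/1/2 (None = missing), same marker order.
    Computes (N_Aa,Aa - 2 * N_AA,aa) / (N_Aa(i) + N_Aa(j)) over markers
    called in both individuals; returns NaN if neither is heterozygous anywhere.
    """
    het_het = opp_hom = het_i = het_j = 0
    for a, b in zip(g_i, g_j):
        if a is None or b is None:
            continue  # use only markers genotyped in both individuals
        het_i += a == 1
        het_j += b == 1
        het_het += a == 1 and b == 1   # both heterozygous
        opp_hom += {a, b} == {0, 2}    # opposite homozygotes (AA vs aa)
    denom = het_i + het_j
    return (het_het - 2 * opp_hom) / denom if denom else float("nan")
```

Identical genotypes (duplicates or monozygotic twins) give 0.5, and unrelated individuals from the same population are expected to give values near 0. An unrelated subset can then be formed by removing one member of each pair above a chosen kinship cutoff.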