Conditional GANs for Biomarker Distribution Simulations for Under-represented Groups
Dataset: The conditional GAN analyses used the same 14-biomarker diabetes-relevant panel and the same training and testing procedures as the preceding High-Dimensional Biomarker Joint Distribution Simulations section.
Data Pre-processing: The race/ethnicity variable was derived from the RIDRETH1 variable in the NHANES datasets. The Non-Hispanic Black group was categorized as Black; the Mexican American and Other Hispanic groups as Hispanic; the Non-Hispanic White group as White; and the Other Race - Including Multi-Racial group as Other.
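The recoding can be sketched in Python as follows. This is a minimal sketch that assumes the standard NHANES RIDRETH1 codes: 1 = Mexican American, 2 = Other Hispanic, 3 = Non-Hispanic White, 4 = Non-Hispanic Black, 5 = Other Race - Including Multi-Racial. The dictionary and function names are hypothetical. Check the codes against the codebook for each NHANES cycle used.

```python
# Hypothetical mapping from NHANES RIDRETH1 codes (assumed standard coding)
# to the four derived race/ethnicity groups described above.
RIDRETH1_TO_GROUP = {
    1: "Hispanic",  # Mexican American
    2: "Hispanic",  # Other Hispanic
    3: "White",     # Non-Hispanic White
    4: "Black",     # Non-Hispanic Black
    5: "Other",     # Other Race - Including Multi-Racial
}


def recode_race(ridreth1_values):
    """Map RIDRETH1 codes (int or float, e.g. 4.0) to derived group labels.

    Raises KeyError for any code outside 1-5, so unexpected values surface early.
    """
    return [RIDRETH1_TO_GROUP[int(code)] for code in ridreth1_values]


print(recode_race([1, 2, 3, 4, 5]))
```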
GAN Architecture: The generator and discriminator architectures were identical to those used for the High-Dimensional Biomarker Panel Joint Distribution Simulations. However, the derived race/ethnicity categories were one-hot encoded and concatenated with the biomarker input.
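The conditioning step can be sketched in plain Python as follows. The category order in GROUPS, the helper names, and the two-value example rows are illustrative assumptions. The actual panel has 14 biomarkers, and the section does not state the column order.

```python
# Assumed (hypothetical) order of the derived race/ethnicity categories.
GROUPS = ["Black", "Hispanic", "White", "Other"]


def one_hot(label, categories=GROUPS):
    """Return a one-hot vector (list of floats) for `label` over `categories`."""
    vec = [0.0] * len(categories)
    vec[categories.index(label)] = 1.0
    return vec


def condition_rows(biomarker_rows, labels):
    """Concatenate each row's biomarker values with its one-hot group vector."""
    return [list(row) + one_hot(label) for row, label in zip(biomarker_rows, labels)]


# Illustrative 2-biomarker rows (the real panel has 14 columns).
rows = [[5.4, 98.0], [6.1, 110.0]]
print(condition_rows(rows, ["White", "Hispanic"]))
```

In a standard conditional GAN, the same one-hot vector is also concatenated with the generator's noise input, so that samples can be drawn for a chosen group. The section does not describe the generator side explicitly, so treat this as an assumption.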
The model was trained for 1000 epochs with a batch size of 300 and five discriminator steps per generator step.
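The update schedule implied by these settings can be sketched as a loop skeleton. The sketch assumes that "five discriminator steps" means five discriminator updates per generator update on each batch, as in WGAN-style training. discriminator_step and generator_step are hypothetical placeholders for the real optimizer updates. The training-set size n_samples is not given in the section, so the value below is illustrative.

```python
import math


def train_schedule(n_samples, epochs=1000, batch_size=300, n_critic=5,
                   discriminator_step=None, generator_step=None):
    """Run (or just count) alternating GAN updates.

    Assumption: each batch triggers n_critic discriminator updates followed by
    one generator update. The step callbacks are hypothetical stand-ins for the
    real optimizer updates. Returns (discriminator_updates, generator_updates).
    """
    batches_per_epoch = math.ceil(n_samples / batch_size)
    d_updates = g_updates = 0
    for epoch in range(epochs):
        for batch in range(batches_per_epoch):
            for _ in range(n_critic):
                if discriminator_step is not None:
                    discriminator_step(epoch, batch)
                d_updates += 1
            if generator_step is not None:
                generator_step(epoch, batch)
            g_updates += 1
    return d_updates, g_updates


# Hypothetical training-set size of 3,000 rows -> 10 batches per epoch.
print(train_schedule(3000))  # (50000, 10000)
```

Some implementations instead draw a fresh batch for every discriminator update. Under that scheme the number of batches consumed per epoch differs, so match the schedule to the code actually used.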
Data Analysis: The high-dimensional distributions were visualized using t-SNE and UMAP, and the univariate distributions of the GAN-generated data were compared with those of the test data using box plots.
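The box-plot comparison can be made concrete by computing, for a single biomarker, the statistics a box plot draws for the test data and for the GAN-generated data. This sketch assumes Tukey-style whiskers at 1.5 times the interquartile range (a common plotting default) and inclusive quartile interpolation. The helper names and example values are hypothetical, and the t-SNE/UMAP embeddings are not reproduced here.

```python
import statistics


def box_stats(values):
    """Quartiles, Tukey whiskers (1.5 x IQR) and outliers for one biomarker."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }


def compare_groups(test_values, gan_values):
    """Side-by-side box statistics for test vs. GAN-generated values."""
    return {"test": box_stats(test_values), "gan": box_stats(gan_values)}


# Illustrative values only, not study data.
summary = compare_groups([1, 2, 3, 4, 5, 100], [1.5, 2.5, 3.0, 4.0, 5.5])
print(summary["test"]["outliers"])  # [100]
```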