High-Dimensional Biomarker Panel Joint Distribution Simulations
We evaluated whether GANs could generate the joint distribution of multiple biomarkers, using the 14 diabetes-relevant biomarkers.
Because the 14-dimensional joint distribution cannot be visualized directly, we used three dimensionality-reduction approaches, t-SNE (Figure 3A), UMAP (Figure 3B), and PCA (Figure 3C), to generate 2-dimensional projections of the test and GAN-generated distributions. The projected GAN-generated data (teal circles) were well dispersed within the test data distribution (salmon circles) for all three approaches. This indicates that GANs are a promising approach for generating high-dimensional biomarker distributions.
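As an illustration of the projection step, the following is a minimal sketch of a PCA projection in which the principal axes are fitted on the test data and both data sets are projected onto them. It is not the code used for Figure 3C: the function name `pca_project`, the choice to fit on the test data only, and the synthetic arrays standing in for the 14 scaled biomarkers are all assumptions made for the example.

```python
import numpy as np

def pca_project(reference, other, n_components=2):
    """Fit PCA on `reference` and project both arrays onto its leading axes."""
    mean = reference.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    axes = vt[:n_components].T
    # Both data sets are centered with the reference mean so they share one frame.
    return (reference - mean) @ axes, (other - mean) @ axes

rng = np.random.default_rng(0)
# Synthetic stand-ins (assumption): 500 samples x 14 scaled biomarkers each.
test_data = rng.normal(size=(500, 14))
gan_data = rng.normal(size=(500, 14))

test_2d, gan_2d = pca_project(test_data, gan_data)
```

Plotting `test_2d` and `gan_2d` on the same axes then gives an overlay of the kind described for the PCA panel. t-SNE and UMAP are nonlinear and would need their own implementations.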
To further assess the performance of GANs, we visualized the univariate and bivariate marginal distributions of the high-dimensional joint distribution (Figure 3D) using pairs panel plots, which show the univariate densities along the diagonal, the bivariate scatter plots in the lower triangular region, and the Spearman correlation coefficients in the upper triangular region. Figure 3D shows the pairs panel plots for scaled log-transformed levels of seven biomarkers: urine albumin, urine creatinine, fasting glucose, insulin, body mass index, glycohemoglobin, and triglyceride. The univariate densities (diagonal of Figure 3D) of the GAN-generated data overlapped extensively with the test data densities for all seven biomarkers, and the individual density curves were difficult to distinguish. The bivariate scatter plots also overlapped extensively, and the GAN-generated data points were evenly dispersed among the test data points in all 21 bivariate plots in Figure 3D.
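The upper-triangle Spearman coefficients and the count of 21 bivariate plots can be sketched as follows. This is a hedged example rather than the analysis code: `spearman_matrix` is a hypothetical helper, the rank computation assumes no tied values, the data are synthetic, and the summary `max_gap` is only one possible way to compare the two correlation matrices.

```python
import numpy as np
from itertools import combinations

def spearman_matrix(data):
    """Spearman correlations between columns (assumes no tied values)."""
    # Double argsort converts each column to 0-based ranks.
    ranks = data.argsort(axis=0).argsort(axis=0).astype(float)
    # Spearman's rho is the Pearson correlation of the ranks.
    return np.corrcoef(ranks, rowvar=False)

names = ["urine albumin", "urine creatinine", "fasting glucose", "insulin",
         "body mass index", "glycohemoglobin", "triglyceride"]

rng = np.random.default_rng(1)
# Synthetic stand-ins (assumption) for scaled log-transformed biomarker levels.
test_data = rng.normal(size=(400, len(names)))
gan_data = rng.normal(size=(400, len(names)))

rho_test = spearman_matrix(test_data)
rho_gan = spearman_matrix(gan_data)

# Seven biomarkers give 7 * 6 / 2 = 21 unordered pairs, one per bivariate panel.
pairs = list(combinations(range(len(names)), 2))
# Largest absolute difference in Spearman's rho across the 21 pairs.
max_gap = max(abs(rho_test[i, j] - rho_gan[i, j]) for i, j in pairs)
```

A small `max_gap` would indicate that the pairwise rank correlations of the generated data track those of the test data.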
These results show that GAN-generated distributions can be useful for modeling systems of clinical biomarkers.