Figure legends
Figure 1. Item response model for MDS-UPDRS Part III. Left: Item scores are related to the underlying severity, which mirrors the sum of scores, through Item Characteristic Curves (ICCs). Upper right: The position and steepness of the ICCs reflect an item’s difficulty and its ability to differentiate patient severity, respectively. The blue, pink, green and red curves describe the probabilities of a score of at least 1, 2, 3 and 4, respectively. Lower right: The blue, pink, green, red and yellow Category Characteristic Curves (CCCs) describe the probabilities of a score of 0, 1, 2, 3 and 4, respectively.
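The relationship between the two sets of curves in Figure 1 can be illustrated with a minimal sketch. It assumes a logistic graded-response form, which the legend itself does not specify; the function names and parameter values are hypothetical and are not estimates from this work. Each cumulative curve gives the probability of a score of at least k, and each category curve is the difference between adjacent cumulative curves.

```python
import math


def cumulative_probs(theta, discrimination, thresholds):
    """P(score >= k | theta) for k = 1..K, assuming a logistic form.

    `discrimination` sets the steepness of the curves and `thresholds`
    (one per score boundary, in increasing order) set their positions.
    """
    return [
        1.0 / (1.0 + math.exp(-discrimination * (theta - b)))
        for b in thresholds
    ]


def category_probs(theta, discrimination, thresholds):
    """P(score == k | theta) for k = 0..K.

    Pads the cumulative curves with P(score >= 0) = 1 and
    P(score >= K + 1) = 0, then takes adjacent differences.
    """
    cum = [1.0] + cumulative_probs(theta, discrimination, thresholds) + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical item with five categories (scores 0 to 4).
# These values are illustrative assumptions only.
example_discrimination = 1.5
example_thresholds = [-1.0, 0.0, 1.0, 2.0]

example_probs = category_probs(0.5, example_discrimination, example_thresholds)
```

In this sketch a larger discrimination steepens every curve, and shifting the thresholds to the right corresponds to a more difficult item, one that needs higher severity before higher scores become likely.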
Figure 2. Data pattern and model evaluation. The pattern of the observed Sum-of-Scores data (upper left) was reproduced by the modeled Symptom Severity (upper right). Model-estimated Category Characteristic Curves (lines) reflected the distribution of observed categories (circles) for each item over the range of symptom severity (lower left). The proportions of the simulated scores were compared with those of the observed scores (lower right).
Figure 3. Item informativeness. Item information over the whole spectrum of symptom severity shows that some items are far more informative than others. The color-coded areas represent the items, ordered from bottom to top by decreasing information.
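As a rough illustration of how item information can vary with severity, the sketch below computes the Fisher information of one ordered-categorical item. It assumes the same logistic graded-response form as a working hypothesis; the legend does not state how information was calculated, and the function name and parameter values are hypothetical.

```python
import math


def item_information(theta, discrimination, thresholds):
    """Fisher information of one ordered-categorical item at severity theta.

    Uses the general polytomous formula: the sum over categories of
    (dP_k/dtheta)^2 / P_k, with a logistic cumulative curve assumed for
    each score boundary.
    """
    # Cumulative curves padded with P(score >= 0) = 1 and P(score >= K+1) = 0.
    cum = [1.0]
    cum += [
        1.0 / (1.0 + math.exp(-discrimination * (theta - b)))
        for b in thresholds
    ]
    cum.append(0.0)

    # Derivative of each logistic cumulative curve with respect to theta.
    # The padded boundary values 1.0 and 0.0 give a slope of zero.
    slopes = [discrimination * p * (1.0 - p) for p in cum]

    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]         # category probability
        dp_k = slopes[k] - slopes[k + 1]  # its derivative
        if p_k > 0.0:
            info += dp_k ** 2 / p_k
    return info


# Hypothetical comparison of a steep and a shallow item (illustrative values).
steep_item = item_information(0.0, 2.0, [-1.0, 0.0, 1.0, 2.0])
shallow_item = item_information(0.0, 0.5, [-1.0, 0.0, 1.0, 2.0])
```

Under these assumptions, an item with steeper curves contributes more information near its thresholds, which is one way some items could come to dominate a stacked information plot.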
Figure 4. Trial probability of success. Upper: The probability of trial success in detecting a hypothetical drug’s ability to slow disease progression was higher when data were analyzed using Symptom Severity (brown) than when using the Sum of Scores (green); solid and dashed lines reflect analyses including all items and only non-tremor items, respectively. Lower: Comparison of the power for detecting a drug effect (green: 0.1; blue: 0.5) with the overall probability of trial success (brown) for detecting a range of potential drug effects in a one-year trial.
Figure 5. Visual predictive check for the longitudinal item-response model. The time course of the distribution of the observed sum of scores was well reproduced by the longitudinal IRT model (dots: observations; green lines: 5%, 50% and 95% quantiles of the observations; red line: predicted time course of the sum of scores for a typical patient; bands: 95% confidence intervals of the corresponding model-simulated quantiles).
Figure 6. The model accurately simulated the time course of the observed proportion of each score for each item. The lines are the observed proportions of scores 0 to 4. The bands are the 95% confidence intervals of the model simulations.