Kotov et al. (2017) and Forbush et al. (2018) tested and replicated a joint hierarchical factor model with the EPSI and IDAS-II in samples of participants with a DSM-5 ED. In the sample used for the present analysis (from Forbush et al. 2018) recovery outcomes were assessed twice after an initial baseline session: (a) at a 6-month follow-up, and (b) a 12-month follow-up.
Dr. Song used the multidimensional IRT implemented in the CAT app to estimate 3 Theta values for the EPSI in this sample. The goal of this analysis is to evaluate whether the EPSI Theta parameters are useful in predicting recovery outcomes. We plan on conducting further analyses after estimating Theta values for the IDAS-II, as well as to perform these analyses on an additional sample.
The data consist of \(N = 247\) participants, and the outcomes of interest span 4 categories (1 = Poor, 2 = Intermediate, 3 = Good, 4 = Excellent). The categories are based on criteria related to BMI and the number of binge-eating \((BE)\) and compensatory episodes \((CE)\) that occurred in the past 3 weeks. Specifically:
Given the imbalanced nature of the sample, I tested different models treating the outcomes as either polytomous or binary (collapsing the Intermediate, Good, and Excellent categories into a ‘Marginal’ category). It is important to note that more than half the data \((\sim 57\%)\) is missing for the 12-month follow-up. I focus on the polytomous models in the first section, and the binomial models in the second section. In each section, I start by presenting the model(s) fit with listwise deletion, and then re-evaluate the models after imputing missing values. I take the following approach in each analysis:
The 4 outcome categories along with their empirical frequencies at the 6-month and 12-month assessments are displayed below.
Outcome | Frequency | Percent | Frequency | Percent |
---|---|---|---|---|
Poor | 22 | 8.9% | 82 | 33.2% |
Intermediate | 132 | 53.4% | 19 | 7.7% |
Good | 6 | 2.4% | 1 | 0.4% |
Excellent | 9 | 3.6% | 4 | 1.6% |
Missing | 78 | 31.6% | 141 | 57.1% |
The categories are highly imbalanced at each time point and have a high degree of missingness. For each timepoint I use two separate analyses—a proportional odds logistic regression (designed for ordinal outcomes), and a multinomial regression. I will begin by focusing on the 6-month outcome, and then will present the same analyses for the 12-month outcome.
##### MODELS TESTED:
M1.0 <- polr(Outcome_6mo_4cat ~ 1) # Intercept-only
M1.1 <- polr(Outcome_6mo_4cat ~ Theta_1 + Theta_2 + Theta3) # Hypothesized
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## M1.0 166 247.7890
## M1.1 163 242.1718 1 vs 2 3 5.61722 0.1317933
The likelihood ratio test shows that the hypothesized model does not significantly improve prediction over the intercept model, \(\chi^2 (3,\,N = 169) = 5.62,\ p = .13\). The coefficients for the hypothesized model are below.
## Call:
## MASS::polr(formula = Outcome_6mo_4cat ~ Theta_1 + Theta_2 + Theta_3,
## data = out4)
##
## Coefficients:
## Value Std. Error t value P value
## Theta_1 -0.2761 0.2744 -1.0061 0.31585
## Theta_2 -0.1309 0.2919 -0.4483 0.65451
## Theta_3 -0.3903 0.2068 -1.8868 0.06096
##
## Intercepts:
## Value Std. Error t value P value
## Poor|Inter -2.2704 0.4777 -4.7527 0.0000
## Inter|Good 2.1176 0.4708 4.4977 0.0000
## Good|Excel 2.6790 0.5123 5.2297 0.0000
##
## Residual Deviance: 242.1718
## AIC: 254.1718
## (78 observations deleted due to missingness)
##### MODELS TESTED:
M2.0 <- multinom(Outcome_6mo_4cat ~ 1) # Intercept-only
M2.1 <- multinom(Outcome_6mo_4cat ~ Theta_1 + Theta_2 + Theta_3) # Hypothesized
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## M2.0 504 247.7890
## M2.1 495 226.1111 1 vs 2 9 21.6779 0.009957828
The LRT showed a significant improvement in prediction for the hypothesized model over the intercept model, \(\chi^2 (9,\,N = 169) = 21.68,\ p = .01\). As shown below, the only significant coefficients are those for Theta_1 and Theta_3 when comparing the odds of having a ‘Good’ vs. ‘Poor’ outcome. Higher values of either Theta are associated with significantly lower odds of having a ‘Good’ vs. a ‘Poor’ outcome.
## Call:
## nnet::multinom(formula = Outcome_6mo_4cat ~ Theta_1 + Theta_2 +
## Theta_3, data = out4)
##
## Coefficients:
## (Intercept) Theta_1 Theta_2 Theta_3
## Inter 1.1985228 0.3142500 -0.5235061 -0.05253081
## Good -0.9425940 -2.1185792 -0.6357543 -1.89772162
## Excel -0.3720806 -0.2205307 0.1693505 -0.73846985
##
## Std. Errors:
## (Intercept) Theta_1 Theta_2 Theta_3
## Inter 0.563319 0.3843816 0.3847212 0.2605811
## Good 1.066920 0.8473508 0.7936430 0.8593512
## Excel 0.914865 0.6692868 0.6745361 0.4998456
##
## Residual Deviance: 226.1111
## AIC: 250.1111
## P values:
## (Intercept) Theta_1 Theta_2 Theta_3
## Inter 0.03336949 0.41361588 0.1735954 0.84023645
## Good 0.37698143 0.01241097 0.4230979 0.02722204
## Excel 0.68422431 0.74177701 0.8017660 0.13956957
In this model, the ordered nature of the outcome is ignored, and ‘Poor’ is treated as the reference group. So the intercepts represent the odds of being in the associated category over being in the ‘Poor’ category. The coefficients for the predictors then represent how those odds are affected by the different Theta estimates.
Finally, we can compare the proportional odds model with the multinomial model in order to determine which better describes the data. Given that the true outcome is ordered, we should expect the proportional odds model to fit as well as or better than the more complex multinomial model.
Model | LL | df | Chisq | df_delta | Pvalue | |
---|---|---|---|---|---|---|
polr | M1.1 | -121.09 | 6 | |||
multinom | M2.1 | -113.06 | 12 | 16.06 | 6 | 0.013 |
The multinomial model fits significantly better than the proportional odds, \(\chi^2 (6,\,N = 169) = 16.06,\ p = .013\). This means that the data violate the proportional odds assumption, which we would expect to be true given the ordered nature of the categories.
There are \(N = 169\) complete cases for the 6-month outcome, and \(N = 247\) possible cases. So, I imputed 100 complete datasets using the mice
package (multivariate imputation via chained equations). I then fit the above models to each imputed dataset. Lastly, I average the results and investigate how the models that were fit across imputations perform against their respective intercept models.
First, we can visualize how well the imputed data distributions match up with the original data distribution.
Overall, these look pretty good. The imputed data are more balanced than the original sample, although still reflect the general pattern observed. Next, I fit the proportional odds model, as well as the multinomial model—with both the hypothesized and intercept-only models—for each imputed dataset. I then conducted a LRT for each imputed dataset and collected the \(p\)-values to construct distributions associated with the tests for each model.
For the proportional odds model (Left), the mean of the \(p\)-values is .099, and the median is .079. Given that these statistics are based on 100 imputed datasets, this indicates that the Theta values may have marginally better predictive accuracy than the intercept (null) model. Visual analysis of the distribution, however, indicates that this is not a substantial effect.
For the multinomial model (Right), we see stronger support for the predictive power of the hypothesized model. The mean of the \(p\)-values here is .019, and the median is .006. These results indicate that the multinomial version of the hypothesized model appears to outperform the null model. Visual analysis also shows that these tests clearly lean in favor of the hypothesized model.
The mice
package does not support pooling results for the proportional odds model. However, it does support this for multinomial models. So, lets look at the pooled coefficients:
## Multinomial model: Pooled coefficients
## estimate std.error statistic df p.value
## Inter:(Intercept) 0.97721742 0.4329942 2.25688316 185.23543 0.02502117 *
## Inter:Theta_1 0.12310609 0.2883759 0.42689457 153.35180 0.66988348
## Inter:Theta_2 -0.40790905 0.3059978 -1.33304586 204.38641 0.18392821
## Inter:Theta_3 -0.14096220 0.2019404 -0.69803851 214.90256 0.48590771
## Good:(Intercept) 0.10747849 0.6901096 0.15574119 157.41378 0.87638322
## Good:Theta_1 -1.26065645 0.5625478 -2.24097651 119.01244 0.02605023 *
## Good:Theta_2 -0.01276160 0.5657221 -0.02255807 167.10008 0.98202372
## Good:Theta_3 -0.72273995 0.4037870 -1.78990386 166.44542 0.07487725
## Excel:(Intercept) -0.22940098 0.6280801 -0.36524158 141.99838 0.71529029
## Excel:Theta_1 0.06904336 0.4534485 0.15226286 97.12986 0.87912247
## Excel:Theta_2 -0.10395184 0.4299801 -0.24175964 178.86249 0.80919695
## Excel:Theta_3 -0.46946998 0.2996187 -1.56689142 189.56528 0.11861135
I’m not clear on how to obtain a log-likelihood value for pooled coefficients, and so instead see this more as a way of getting relatively stable coefficients in situations where the model has already been selected and the data are imputed in a way that reflects the observed distribution of the outcome.
We can see that the effect of Theta_1
on the odds of being in the ‘Good’ vs. ‘Poor’ category is still significant \((b = -1.26,\ SE = .56,\ p = .026)\), while the effect of Theta_3
is now marginal \((b = -.72,\ SE = .40,\ p = .075)\).
##### MODELS TESTED:
M3.0 <- polr(Outcome_12mo_4cat ~ 1) # Intercept-only
M3.1 <- polr(Outcome_12mo_4cat ~ Theta_1 + Theta_2 + Theta_3) # Hypothesized
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## M3.0 103 142.9681
## M3.1 100 140.1748 1 vs 2 3 2.793269 0.4246092
The LRT showed no significant improvement in model fit for the hypothesized model over the intercept model, \(\chi^2 (3,\,N = 106) = 2.79,\ p = .42\). Coefficients for the hypothesized model are displayed below.
## Call:
## MASS::polr(formula = Outcome_12mo_4cat ~ Theta_1 + Theta_2 +
## Theta_3, data = out4)
##
## Coefficients:
## Value Std. Error t value P value
## Theta_1 -0.1969 0.3201 -0.61524 0.5398
## Theta_2 -0.5140 0.3880 -1.32491 0.1882
## Theta_3 -0.0149 0.2888 -0.05158 0.9590
##
## Intercepts:
## Value Std. Error t value P value
## Poor|Inter 1.5374 0.5656 2.7181 0.0077
## Inter|Good 3.3377 0.6928 4.8175 0.0000
## Good|Excel 3.5756 0.7299 4.8987 0.0000
##
## Residual Deviance: 140.1748
## AIC: 152.1748
## (141 observations deleted due to missingness)
##### MODELS TESTED:
M4.0 <- multinom(Outcome_12mo_4cat ~ 1) # Intercept-only
M4.1 <- multinom(Outcome_12mo_4cat ~ Theta_1 + Theta_2 + Theta_3) # Hypothesized
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## M4.0 315 142.9681
## M4.1 306 133.9391 1 vs 2 9 9.029035 0.4345983
No significant improvement of the hypothesized model over the intercept-only model, \(\chi^2 (9,\,N = 106) = 9.03,\ p = .43\). Model coefficients displayed below.
## Call:
## nnet::multinom(formula = Outcome_12mo_4cat ~ Theta_1 + Theta_2 +
## Theta_3, data = out4)
##
## Coefficients:
## (Intercept) Theta_1 Theta_2 Theta_3
## Inter -1.388111 -0.1613010 -0.1047764 -0.1497858
## Good -5.122783 -1.6962427 -0.5057093 -3.2037231
## Excel -5.153626 -0.1613949 -1.6678931 0.7241565
##
## Std. Errors:
## (Intercept) Theta_1 Theta_2 Theta_3
## Inter 0.6061414 0.3563913 0.4281538 0.3260344
## Good 2.8298118 1.9744594 1.4683185 2.6090278
## Excel 1.6311602 0.7023295 0.8192311 0.6130741
##
## Residual Deviance: 133.9391
## AIC: 157.9391
## P values:
## (Intercept) Theta_1 Theta_2 Theta_3
## Inter 0.022016771 0.6508403 0.80667584 0.6459347
## Good 0.070250657 0.3902896 0.73053506 0.2194704
## Excel 0.001580481 0.8182477 0.04175788 0.2375276
Because neither model performed significantly better than the intercept models, we will not bother investigating them further.
With the 12-month outcome, more than half of the data are missing. In general, imputing these data and interpreting the results is inappropriate. However, I have done this here simply for the sake of completeness.
This clearly shows that the imputations poorly represent the original sample. We even see evidence that the imputation procedure oscillated between two relatively stable—yet distinct—solutions. This is almost certainly due to the fact that more than half the data \((\sim 57\%)\) are missing for this measure, and thus precludes interpretation of any models fit to these data. Nevertheless, below we see the distributions of \(p\)-values associated with the LRTs conducted for each type of model.
Although this seems to provide some support for the idea that the EPSI Thetas have some predictive accuracy for the 12-month outcome (at least for the multinomial model), we must keep in mind the fact that:
The main point of interest here is that we do seem to be observing that the proportional odds assumption is typically violated in these data, as the multinomial model generally appears to outperfom the alternative.
Although we really should not interpret the models for the 12-month outcome, here are the pooled coefficients for the multinomial models fit to the imputed data.
## Multinomial model: Pooled coefficients
## estimate std.error statistic df p.value
## Inter:(Intercept) -0.19589823 0.3813621 -0.5136803 194.0271 0.6080507906
## Inter:Theta_1 -0.44706916 0.2529781 -1.7672245 195.3270 0.0787379115
## Inter:Theta_2 0.29415931 0.2934737 1.0023361 183.2178 0.3174100872
## Inter:Theta_3 0.03794014 0.1954688 0.1940982 197.1280 0.8462988982
## Good:(Intercept) -0.90785691 0.5361215 -1.6933791 143.1977 0.0919629023
## Good:Theta_1 0.10147487 0.3325106 0.3051779 140.5557 0.7605526810
## Good:Theta_2 1.05656178 0.4475622 2.3607039 136.9616 0.0192164204 *
## Good:Theta_3 -0.08089795 0.3100929 -0.2608829 123.3224 0.7944551332
## Excel:(Intercept) -1.94889999 0.5717767 -3.4084985 141.6545 0.0007919806 ***
## Excel:Theta_1 0.07661097 0.2798667 0.2737409 175.7985 0.7845703398
## Excel:Theta_2 -0.29267175 0.3880430 -0.7542251 139.8452 0.4516140961
## Excel:Theta_3 0.48171837 0.2596626 1.8551701 149.8833 0.0650650782
Given the imbalanced distribution of classes at each measurement occasion, we may benefit from investigating these variables after collapsing the responses into two categories: 0 = Poor, and 1 = Marginal (i.e., Intermediate + Good + Excellent). The following table shows the frequencies for this coding scheme:
Outcome | Frequency | Percent | Frequency | Percent |
---|---|---|---|---|
Poor | 22 | 8.9% | 82 | 33.2% |
Marginal | 147 | 59.5% | 24 | 9.7% |
Missing | 78 | 31.6% | 141 | 57.1% |
In each section, I take the same basic approach as I did for the polytomous models, except that here I only test a logistic regression model given that we are using a binary outcome.
## Analysis of Deviance Table
##
## Model 1: Outcome_6mo_2cat ~ 1
## Model 2: Outcome_6mo_2cat ~ Theta_1 + Theta_2 + Theta_3
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 168 130.71
## 2 165 127.92 3 2.7886 0.4254
The LRT shows no significant improvement in prediction for the hypothesized model over the intercept model, \(\chi^2 (3,\,N = 169) = 2.79,\ p = .43\). Coefficients for the hypothesized model are displayed below.
##
## Call:
## glm(formula = Outcome_6mo_2cat ~ Theta_1 + Theta_2 + Theta_3,
## family = binomial, data = out2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2387 0.4278 0.4967 0.5608 0.8565
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.4787 0.5366 2.756 0.00586 **
## Theta_1 0.1979 0.3636 0.544 0.58616
## Theta_2 -0.4757 0.3797 -1.253 0.21031
## Theta_3 -0.1378 0.2536 -0.543 0.58691
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.71 on 168 degrees of freedom
## Residual deviance: 127.92 on 165 degrees of freedom
## (78 observations deleted due to missingness)
## AIC: 135.92
##
## Number of Fisher Scoring iterations: 5
The mice
package was again used to create 100 imputations of the 6-month outcome.
As in the polytomous analysis with the 6-month outcome, we see that the imputed data seem to effectively represent the original data for the outcome. Similar to the previous section, next I fit pairs of models to each imputed dataset (intercept-only vs. hypothesized), and collected the \(p\)-values from the LRT associated with each dataset.
The mean of the \(p\)-values is .482, and the median is .439. These results support the conclusion that the Theta values do not significantly improve predictive accuracy over the null (intercept-only) model. The pooled model coefficients are displayed below.
## Logistic model: Pooled coefficients
## estimate std.error statistic df p.value
## (Intercept) 0.848356730 0.4169638 2.03460527 117.91573 0.04376876 *
## Theta_1 0.094836632 0.2782679 0.34081052 98.30709 0.73375454
## Theta_2 -0.254793843 0.2938960 -0.86695246 140.71491 0.38744399
## Theta_3 0.008030057 0.2059813 0.03898441 134.54999 0.96895808
## Analysis of Deviance Table
##
## Model 1: Outcome_12mo_2cat ~ 1
## Model 2: Outcome_12mo_2cat ~ Theta_1 + Theta_2 + Theta_3
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 105 113.40
## 2 102 111.31 3 2.0903 0.5539
The LRT shows no significant improvement in prediction for the hypothesized model, \(\chi^2 (3,\,N = 106) = 2.09,\ p = .55\). Coefficients for the hypothesized model are displayed below.
##
## Call:
## glm(formula = Outcome_12mo_2cat ~ Theta_1 + Theta_2 + Theta_3,
## family = binomial, data = out2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9827 -0.7420 -0.6532 -0.5039 1.9402
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.43797 0.56428 -2.548 0.0108 *
## Theta_1 -0.19164 0.32482 -0.590 0.5552
## Theta_2 -0.40193 0.37682 -1.067 0.2861
## Theta_3 -0.04901 0.28995 -0.169 0.8658
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 113.40 on 105 degrees of freedom
## Residual deviance: 111.31 on 102 degrees of freedom
## (141 observations deleted due to missingness)
## AIC: 119.31
##
## Number of Fisher Scoring iterations: 4
The mice
package was used to impute 100 datasets with complete cases of the 12-month outcome.
As with the polytomous coding, we see that the imputed distributions differ greatly from the observed distribution. In fact, the mode in the imputed data is the minimum in the observed data. This means that we cannot interpret the foregoing results as a valid analysis of this outcome.
The mean \(p\)-value for the LRTs is .353, and the median is .303. Once again, we see no evidence for the Theta values being significant predictors of the 12-month outcome. The pooled logistic regression coefficients are displayed below.
## Logistic model: Pooled coefficients
## estimate std.error statistic df p.value
## (Intercept) -0.32060588 0.4715931 -0.6798359 67.98063 0.4983908
## Theta_1 -0.12603912 0.3227456 -0.3905215 55.45056 0.6970936
## Theta_2 0.05027161 0.3053555 0.1646331 88.10392 0.8696099
## Theta_3 0.08322651 0.2178946 0.3819578 84.44816 0.7034124
Overall, analyses do not suggest that the EPSI Thetas are significant predictors of recovery outcomes in the current sample. However, it is important to note that the substantial class imbalance is a clear violation of model assumptions and thus challenges the validity and reliability of the results. In general, though:
Forbush, Kelsie T, Po-Yi Chen, Kelsey E Hagan, Danielle AN Chapa, Sara R Gould, Nicholas R Eaton, and Robert F Krueger. 2018. “A New Approach to Eating-Disorder Classification: Using Empirical Methods to Delineate Diagnostic Dimensions and Inform Care.” International Journal of Eating Disorders 51: 710–21.
Kotov, Roman, Robert F Krueger, David Watson, Thomas M Achenbach, Robert R Althoff, R Michael Bagby, Timothy A Brown, et al. 2017. “The Hierarchical Taxonomy of Psychopathology (Hitop): A Dimensional Alternative to Traditional Nosologies.” Journal of Abnormal Psychology 126: 454–77.