Background

Kotov et al. (2017) and Forbush et al. (2018) tested and replicated a joint hierarchical factor model with the EPSI and IDAS-II in samples of participants with a DSM-5 ED. In the sample used for the present analysis (from Forbush et al. 2018) recovery outcomes were assessed twice after an initial baseline session: (a) at a 6-month follow-up, and (b) a 12-month follow-up.

Dr. Song used the multidimensional IRT implemented in the CAT app to estimate 3 Theta values for the EPSI in this sample. The goal of this analysis is to evaluate whether the EPSI Theta parameters are useful in predicting recovery outcomes. We plan on conducting further analyses after estimating Theta values for the IDAS-II, as well as to perform these analyses on an additional sample.

The data consist of \(N = 247\) participants, and the outcomes of interest span 4 categories (1 = Poor, 2 = Intermediate, 3 = Good, 4 = Excellent). The categories are based on criteria related to BMI and the number of binge-eating \((BE)\) and compensatory episodes \((CE)\) that occurred in the past 3 weeks. Specifically:

Given the imbalanced nature of the sample, I tested different models treating the outcomes as either polytomous or binary (collapsing the Intermediate, Good, and Excellent categories into a ‘Marginal’ category). It is important to note that more than half the data \((\sim 57\%)\) is missing for the 12-month follow-up. I focus on the polytomous models in the first section, and the binomial models in the second section. In each section, I start by presenting the model(s) fit with listwise deletion, and then re-evaluate the models after imputing missing values. I take the following approach in each analysis:

  1. I begin by fitting the hypothesized model (where the Theta values predict the outcome), then use a likelihood ratio test (LRT) to determine whether this model outperforms an intercept-only model.
  2. I display the coefficients for the hypothesized model, along with their associated standard errors and probability values. Interpretation is only provided if the LRT indicates that the hypothesized model predicts the outcome significantly better than the intercept model.


Polytomous Outcome

The 4 outcome categories along with their empirical frequencies at the 6-month and 12-month assessments are displayed below.


6 month
12 month
Outcome Frequency Percent Frequency Percent
Poor 22 8.9% 82 33.2%
Intermediate 132 53.4% 19 7.7%
Good 6 2.4% 1 0.4%
Excellent 9 3.6% 4 1.6%
Missing 78 31.6% 141 57.1%

The categories are highly imbalanced at each time point and have a high degree of missingness. For each timepoint I use two separate analyses—a proportional odds logistic regression (designed for ordinal outcomes), and a multinomial regression. I will begin by focusing on the 6-month outcome, and then will present the same analyses for the 12-month outcome.

6-month follow up

Listwise deletion

Proportional odds logistic regression

##### MODELS TESTED:
M1.0 <- polr(Outcome_6mo_4cat ~ 1) # Intercept-only
M1.1 <- polr(Outcome_6mo_4cat ~ Theta_1 + Theta_2 + Theta3) # Hypothesized
##      Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## M1.0       166   247.7890                                
## M1.1       163   242.1718 1 vs 2     3  5.61722 0.1317933

The likelihood ratio test shows that the hypothesized model does not significantly improve prediction over the intercept model, \(\chi^2 (3,\,N = 169) = 5.62,\ p = .13\). The coefficients for the hypothesized model are below.

## Call:
## MASS::polr(formula = Outcome_6mo_4cat ~ Theta_1 + Theta_2 + Theta_3, 
##     data = out4)
## 
## Coefficients:
##           Value Std. Error t value P value
## Theta_1 -0.2761     0.2744 -1.0061 0.31585
## Theta_2 -0.1309     0.2919 -0.4483 0.65451
## Theta_3 -0.3903     0.2068 -1.8868 0.06096
## 
## Intercepts:
##            Value   Std. Error t value P value
## Poor|Inter -2.2704  0.4777    -4.7527  0.0000
## Inter|Good  2.1176  0.4708     4.4977  0.0000
## Good|Excel  2.6790  0.5123     5.2297  0.0000
## 
## Residual Deviance: 242.1718 
## AIC: 254.1718 
## (78 observations deleted due to missingness)


Multinomial regression

##### MODELS TESTED:
M2.0 <- multinom(Outcome_6mo_4cat ~ 1) # Intercept-only
M2.1 <- multinom(Outcome_6mo_4cat ~ Theta_1 + Theta_2 + Theta_3) # Hypothesized
##      Resid. df Resid. Dev   Test    Df LR stat.     Pr(Chi)
## M2.0       504   247.7890                                  
## M2.1       495   226.1111 1 vs 2     9  21.6779 0.009957828

The LRT showed a significant improvement in prediction for the hypothesized model over the intercept model, \(\chi^2 (9,\,N = 169) = 21.68,\ p = .01\). As shown below, the only significant coefficients are those for Theta_1 and Theta_3 when comparing the odds of having a ‘Good’ vs. ‘Poor’ outcome. Higher values of either Theta are associated with significantly lower odds of having a ‘Good’ vs. a ‘Poor’ outcome.

## Call:
## nnet::multinom(formula = Outcome_6mo_4cat ~ Theta_1 + Theta_2 + 
##     Theta_3, data = out4)
## 
## Coefficients:
##       (Intercept)    Theta_1    Theta_2     Theta_3
## Inter   1.1985228  0.3142500 -0.5235061 -0.05253081
## Good   -0.9425940 -2.1185792 -0.6357543 -1.89772162
## Excel  -0.3720806 -0.2205307  0.1693505 -0.73846985
## 
## Std. Errors:
##       (Intercept)   Theta_1   Theta_2   Theta_3
## Inter    0.563319 0.3843816 0.3847212 0.2605811
## Good     1.066920 0.8473508 0.7936430 0.8593512
## Excel    0.914865 0.6692868 0.6745361 0.4998456
## 
## Residual Deviance: 226.1111 
## AIC: 250.1111
## P values:
##       (Intercept)    Theta_1   Theta_2    Theta_3
## Inter  0.03336949 0.41361588 0.1735954 0.84023645
## Good   0.37698143 0.01241097 0.4230979 0.02722204
## Excel  0.68422431 0.74177701 0.8017660 0.13956957

In this model, the ordered nature of the outcome is ignored, and ‘Poor’ is treated as the reference group. So the intercepts represent the odds of being in the associated category over being in the ‘Poor’ category. The coefficients for the predictors then represent how those odds are affected by the different Theta estimates.

Finally, we can compare the proportional odds model with the multinomial model in order to determine which better describes the data. Given that the true outcome is ordered, we should expect the proportional odds model to fit as well as or better than the more complex multinomial model.

Model LL df Chisq df_delta Pvalue
polr M1.1 -121.09 6
multinom M2.1 -113.06 12 16.06 6 0.013

The multinomial model fits significantly better than the proportional odds, \(\chi^2 (6,\,N = 169) = 16.06,\ p = .013\). This means that the data violate the proportional odds assumption, which we would expect to be true given the ordered nature of the categories.

Multiple imputation

There are \(N = 169\) complete cases for the 6-month outcome, and \(N = 247\) possible cases. So, I imputed 100 complete datasets using the mice package (multivariate imputation via chained equations). I then fit the above models to each imputed dataset. Lastly, I average the results and investigate how the models that were fit across imputations perform against their respective intercept models.

First, we can visualize how well the imputed data distributions match up with the original data distribution.

Overall, these look pretty good. The imputed data are more balanced than the original sample, although still reflect the general pattern observed. Next, I fit the proportional odds model, as well as the multinomial model—with both the hypothesized and intercept-only models—for each imputed dataset. I then conducted a LRT for each imputed dataset and collected the \(p\)-values to construct distributions associated with the tests for each model.

For the proportional odds model (Left), the mean of the \(p\)-values is .099, and the median is .079. Given that these statistics are based on 100 imputed datasets, this indicates that the Theta values may have marginally better predictive accuracy than the intercept (null) model. Visual analysis of the distribution, however, indicates that this is not a substantial effect.

For the multinomial model (Right), we see stronger support for the predictive power of the hypothesized model. The mean of the \(p\)-values here is .019, and the median is .006. These results indicate that the multinomial version of the hypothesized model appears to outperform the null model. Visual analysis also shows that these tests clearly lean in favor of the hypothesized model.

The mice package does not support pooling results for the proportional odds model. However, it does support this for multinomial models. So, lets look at the pooled coefficients:

## Multinomial model: Pooled coefficients
##                      estimate std.error   statistic        df    p.value    
## Inter:(Intercept)  0.97721742 0.4329942  2.25688316 185.23543 0.02502117 *  
## Inter:Theta_1      0.12310609 0.2883759  0.42689457 153.35180 0.66988348    
## Inter:Theta_2     -0.40790905 0.3059978 -1.33304586 204.38641 0.18392821    
## Inter:Theta_3     -0.14096220 0.2019404 -0.69803851 214.90256 0.48590771    
## Good:(Intercept)   0.10747849 0.6901096  0.15574119 157.41378 0.87638322    
## Good:Theta_1      -1.26065645 0.5625478 -2.24097651 119.01244 0.02605023 *  
## Good:Theta_2      -0.01276160 0.5657221 -0.02255807 167.10008 0.98202372    
## Good:Theta_3      -0.72273995 0.4037870 -1.78990386 166.44542 0.07487725    
## Excel:(Intercept) -0.22940098 0.6280801 -0.36524158 141.99838 0.71529029    
## Excel:Theta_1      0.06904336 0.4534485  0.15226286  97.12986 0.87912247    
## Excel:Theta_2     -0.10395184 0.4299801 -0.24175964 178.86249 0.80919695    
## Excel:Theta_3     -0.46946998 0.2996187 -1.56689142 189.56528 0.11861135

I’m not clear on how to obtain a log-likelihood value for pooled coefficients, and so instead see this more as a way of getting relatively stable coefficients in situations where the model has already been selected and the data are imputed in a way that reflects the observed distribution of the outcome.

We can see that the effect of Theta_1 on the odds of being in the ‘Good’ vs. ‘Poor’ category is still significant \((b = -1.26,\ SE = .56,\ p = .026)\), while the effect of Theta_3 is now marginal \((b = -.72,\ SE = .40,\ p = .075)\).

Conclusion

  • For the 6-month outcome, the imputations appear to effectively represent the observed distribution.
  • The analyses show that the multinomial model outperforms the null model and should be preferred to proportional odds.
  • This is not ideal, however, as it would be more appropriate (as well as more parsimonious) to assume proportional odds given the ordinal nature of the data and how the categories are defined.
  • The only significant pairwise comparison(s) for this model occur for a minimally-represented outcome (i.e., ‘Good’; \(n = 6\)).
  • These results are likely a direct consequence of the class imbalance observed in the data.


12-month follow up

Listwise deletion

Proportional odds logistic regression

##### MODELS TESTED:
M3.0 <- polr(Outcome_12mo_4cat ~ 1) # Intercept-only
M3.1 <- polr(Outcome_12mo_4cat ~ Theta_1 + Theta_2 + Theta_3) # Hypothesized
##      Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## M3.0       103   142.9681                                
## M3.1       100   140.1748 1 vs 2     3 2.793269 0.4246092

The LRT showed no significant improvement in model fit for the hypothesized model over the intercept model, \(\chi^2 (3,\,N = 106) = 2.79,\ p = .42\). Coefficients for the hypothesized model are displayed below.

## Call:
## MASS::polr(formula = Outcome_12mo_4cat ~ Theta_1 + Theta_2 + 
##     Theta_3, data = out4)
## 
## Coefficients:
##           Value Std. Error  t value P value
## Theta_1 -0.1969     0.3201 -0.61524  0.5398
## Theta_2 -0.5140     0.3880 -1.32491  0.1882
## Theta_3 -0.0149     0.2888 -0.05158  0.9590
## 
## Intercepts:
##            Value   Std. Error t value P value
## Poor|Inter  1.5374  0.5656     2.7181  0.0077
## Inter|Good  3.3377  0.6928     4.8175  0.0000
## Good|Excel  3.5756  0.7299     4.8987  0.0000
## 
## Residual Deviance: 140.1748 
## AIC: 152.1748 
## (141 observations deleted due to missingness)


Multinomial regression

##### MODELS TESTED:
M4.0 <- multinom(Outcome_12mo_4cat ~ 1) # Intercept-only 
M4.1 <- multinom(Outcome_12mo_4cat ~ Theta_1 + Theta_2 + Theta_3) # Hypothesized
##      Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## M4.0       315   142.9681                                
## M4.1       306   133.9391 1 vs 2     9 9.029035 0.4345983

No significant improvement of the hypothesized model over the intercept-only model, \(\chi^2 (9,\,N = 106) = 9.03,\ p = .43\). Model coefficients displayed below.

## Call:
## nnet::multinom(formula = Outcome_12mo_4cat ~ Theta_1 + Theta_2 + 
##     Theta_3, data = out4)
## 
## Coefficients:
##       (Intercept)    Theta_1    Theta_2    Theta_3
## Inter   -1.388111 -0.1613010 -0.1047764 -0.1497858
## Good    -5.122783 -1.6962427 -0.5057093 -3.2037231
## Excel   -5.153626 -0.1613949 -1.6678931  0.7241565
## 
## Std. Errors:
##       (Intercept)   Theta_1   Theta_2   Theta_3
## Inter   0.6061414 0.3563913 0.4281538 0.3260344
## Good    2.8298118 1.9744594 1.4683185 2.6090278
## Excel   1.6311602 0.7023295 0.8192311 0.6130741
## 
## Residual Deviance: 133.9391 
## AIC: 157.9391
## P values:
##       (Intercept)   Theta_1    Theta_2   Theta_3
## Inter 0.022016771 0.6508403 0.80667584 0.6459347
## Good  0.070250657 0.3902896 0.73053506 0.2194704
## Excel 0.001580481 0.8182477 0.04175788 0.2375276

Because neither model performed significantly better than the intercept models, we will not bother investigating them further.


Multiple imputation

With the 12-month outcome, more than half of the data are missing. In general, imputing these data and interpreting the results is inappropriate. However, I have done this here simply for the sake of completeness.

This clearly shows that the imputations poorly represent the original sample. We even see evidence that the imputation procedure oscillated between two relatively stable—yet distinct—solutions. This is almost certainly due to the fact that more than half the data \((\sim 57\%)\) are missing for this measure, and thus precludes interpretation of any models fit to these data. Nevertheless, below we see the distributions of \(p\)-values associated with the LRTs conducted for each type of model.

  • For the proportional odds model, the mean of the \(p\)-values is .225, and the median is .192.
  • For the multinomial model, the mean of the \(p\)-values is .025, and the median is .004.

Although this seems to provide some support for the idea that the EPSI Thetas have some predictive accuracy for the 12-month outcome (at least for the multinomial model), we must keep in mind the fact that:

  1. This result only came after imputing \(\sim 57\%\) of the data.
  2. The imputed distributions differ from the observed distribution.
  3. This result is quite different from what we found with listwise deletion.

The main point of interest here is that we do seem to be observing that the proportional odds assumption is typically violated in these data, as the multinomial model generally appears to outperfom the alternative.

Although we really should not interpret the models for the 12-month outcome, here are the pooled coefficients for the multinomial models fit to the imputed data.

## Multinomial model: Pooled coefficients
##                      estimate std.error  statistic       df      p.value    
## Inter:(Intercept) -0.19589823 0.3813621 -0.5136803 194.0271 0.6080507906    
## Inter:Theta_1     -0.44706916 0.2529781 -1.7672245 195.3270 0.0787379115    
## Inter:Theta_2      0.29415931 0.2934737  1.0023361 183.2178 0.3174100872    
## Inter:Theta_3      0.03794014 0.1954688  0.1940982 197.1280 0.8462988982    
## Good:(Intercept)  -0.90785691 0.5361215 -1.6933791 143.1977 0.0919629023    
## Good:Theta_1       0.10147487 0.3325106  0.3051779 140.5557 0.7605526810    
## Good:Theta_2       1.05656178 0.4475622  2.3607039 136.9616 0.0192164204 *  
## Good:Theta_3      -0.08089795 0.3100929 -0.2608829 123.3224 0.7944551332    
## Excel:(Intercept) -1.94889999 0.5717767 -3.4084985 141.6545 0.0007919806 ***
## Excel:Theta_1      0.07661097 0.2798667  0.2737409 175.7985 0.7845703398    
## Excel:Theta_2     -0.29267175 0.3880430 -0.7542251 139.8452 0.4516140961    
## Excel:Theta_3      0.48171837 0.2596626  1.8551701 149.8833 0.0650650782

Conclusion

  • Imputing data for the 12-month outcome is inappropriate given the high degree of missingness.
  • We can see that this invalid after comparing the imputed distributions with that observed in the data.
  • In the models where listwise deletion was employed, no differences between the null and hypothesized models were found.
  • This variable likely contains too many missing values and is too heaviliy imbalanced to use for further analysis.


Binarized Outcome

Given the imbalanced distribution of classes at each measurement occasion, we may benefit from investigating these variables after collapsing the responses into two categories: 0 = Poor, and 1 = Marginal (i.e., Intermediate + Good + Excellent). The following table shows the frequencies for this coding scheme:



6 month
12 month
Outcome Frequency Percent Frequency Percent
Poor 22 8.9% 82 33.2%
Marginal 147 59.5% 24 9.7%
Missing 78 31.6% 141 57.1%

In each section, I take the same basic approach as I did for the polytomous models, except that here I only test a logistic regression model given that we are using a binary outcome.

6-month follow up

Listwise deletion

Logistic regression

## Analysis of Deviance Table
## 
## Model 1: Outcome_6mo_2cat ~ 1
## Model 2: Outcome_6mo_2cat ~ Theta_1 + Theta_2 + Theta_3
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1       168     130.71                     
## 2       165     127.92  3   2.7886   0.4254

The LRT shows no significant improvement in prediction for the hypothesized model over the intercept model, \(\chi^2 (3,\,N = 169) = 2.79,\ p = .43\). Coefficients for the hypothesized model are displayed below.

## 
## Call:
## glm(formula = Outcome_6mo_2cat ~ Theta_1 + Theta_2 + Theta_3, 
##     family = binomial, data = out2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2387   0.4278   0.4967   0.5608   0.8565  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   1.4787     0.5366   2.756  0.00586 **
## Theta_1       0.1979     0.3636   0.544  0.58616   
## Theta_2      -0.4757     0.3797  -1.253  0.21031   
## Theta_3      -0.1378     0.2536  -0.543  0.58691   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 130.71  on 168  degrees of freedom
## Residual deviance: 127.92  on 165  degrees of freedom
##   (78 observations deleted due to missingness)
## AIC: 135.92
## 
## Number of Fisher Scoring iterations: 5


Multiple imputation

The mice package was again used to create 100 imputations of the 6-month outcome.

As in the polytomous analysis with the 6-month outcome, we see that the imputed data seem to effectively represent the original data for the outcome. Similar to the previous section, next I fit pairs of models to each imputed dataset (intercept-only vs. hypothesized), and collected the \(p\)-values from the LRT associated with each dataset.

The mean of the \(p\)-values is .482, and the median is .439. These results support the conclusion that the Theta values do not significantly improve predictive accuracy over the null (intercept-only) model. The pooled model coefficients are displayed below.

## Logistic model: Pooled coefficients
##                 estimate std.error   statistic        df    p.value    
## (Intercept)  0.848356730 0.4169638  2.03460527 117.91573 0.04376876 *  
## Theta_1      0.094836632 0.2782679  0.34081052  98.30709 0.73375454    
## Theta_2     -0.254793843 0.2938960 -0.86695246 140.71491 0.38744399    
## Theta_3      0.008030057 0.2059813  0.03898441 134.54999 0.96895808

Conclusion

  • The imputed data for the 6-month outcome seemed to represent the observed distribution.
  • The analyses did not reveal any notable predictive effect of the EPSI Theta values.
  • The class imbalance in the data may challenge the validity of the results.


12-month follow up

Listwise deletion

Logistic regression

## Analysis of Deviance Table
## 
## Model 1: Outcome_12mo_2cat ~ 1
## Model 2: Outcome_12mo_2cat ~ Theta_1 + Theta_2 + Theta_3
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1       105     113.40                     
## 2       102     111.31  3   2.0903   0.5539

The LRT shows no significant improvement in prediction for the hypothesized model, \(\chi^2 (3,\,N = 106) = 2.09,\ p = .55\). Coefficients for the hypothesized model are displayed below.

## 
## Call:
## glm(formula = Outcome_12mo_2cat ~ Theta_1 + Theta_2 + Theta_3, 
##     family = binomial, data = out2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9827  -0.7420  -0.6532  -0.5039   1.9402  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -1.43797    0.56428  -2.548   0.0108 *
## Theta_1     -0.19164    0.32482  -0.590   0.5552  
## Theta_2     -0.40193    0.37682  -1.067   0.2861  
## Theta_3     -0.04901    0.28995  -0.169   0.8658  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 113.40  on 105  degrees of freedom
## Residual deviance: 111.31  on 102  degrees of freedom
##   (141 observations deleted due to missingness)
## AIC: 119.31
## 
## Number of Fisher Scoring iterations: 4


Multiple imputation

The mice package was used to impute 100 datasets with complete cases of the 12-month outcome.

As with the polytomous coding, we see that the imputed distributions differ greatly from the observed distribution. In fact, the mode in the imputed data is the minimum in the observed data. This means that we cannot interpret the foregoing results as a valid analysis of this outcome.

The mean \(p\)-value for the LRTs is .353, and the median is .303. Once again, we see no evidence for the Theta values being significant predictors of the 12-month outcome. The pooled logistic regression coefficients are displayed below.

## Logistic model: Pooled coefficients
##                estimate std.error  statistic       df   p.value  
## (Intercept) -0.32060588 0.4715931 -0.6798359 67.98063 0.4983908  
## Theta_1     -0.12603912 0.3227456 -0.3905215 55.45056 0.6970936  
## Theta_2      0.05027161 0.3053555  0.1646331 88.10392 0.8696099  
## Theta_3      0.08322651 0.2178946  0.3819578 84.44816 0.7034124

Conclusion

  • Once again, the high number of missing values precludes valid imputation for the 12-month outcome.
  • No evidence was found to support the predictive effect of the EPSI Thetas on the binarized 12-month outcome.


Summary

Overall, analyses do not suggest that the EPSI Thetas are significant predictors of recovery outcomes in the current sample. However, it is important to note that the substantial class imbalance is a clear violation of model assumptions and thus challenges the validity and reliability of the results. In general, though:


Next steps

  1. As soon as we have the Theta estimates for the IDAS-II, I will extend this same analysis to include those variables.
  2. Additionally, I will test models that include interactions between the Thetas from each scale to determine whether the scales have a combined effect on predicting the outcomes.
  3. I am investigating methods for dealing with class imbalance, and hope to use leave-pairs-out cross-validation to address this issue and obtain more reliable parameter estimates.
  4. Kelsie mentioned that she has access to another dataset containing similar measures. We plan to conduct the same analysis on this sample, as well as potentially combine the samples to improve both power and representation of the different outcome categories.


References

Forbush, Kelsie T, Po-Yi Chen, Kelsey E Hagan, Danielle AN Chapa, Sara R Gould, Nicholas R Eaton, and Robert F Krueger. 2018. “A New Approach to Eating-Disorder Classification: Using Empirical Methods to Delineate Diagnostic Dimensions and Inform Care.” International Journal of Eating Disorders 51: 710–21.

Kotov, Roman, Robert F Krueger, David Watson, Thomas M Achenbach, Robert R Althoff, R Michael Bagby, Timothy A Brown, et al. 2017. “The Hierarchical Taxonomy of Psychopathology (Hitop): A Dimensional Alternative to Traditional Nosologies.” Journal of Abnormal Psychology 126: 454–77.