Multivariate Data Assumptions for Structural Equation Modeling


In structural equation modeling, it is important that the data contain no outliers or influential respondents, as these cases exert a disproportionate influence on the regressions among variables. This is tested using (A) Cook's distance. Causal modeling also assumes that the independent variables are not highly correlated with one another, i.e., no (B) multicollinearity. This assumption is tested using Variance Inflation Factor (VIF) values.

Purpose: To check for outliers, influential respondents, and multicollinearity in the dataset.
Dataset to use: Imputed dataset (Dataset_C)

A. Cook’s Distance

  1. On the imputed SPSS dataset, go to Analyze — Regression — Linear.
  2. Put the independent and dependent variables accordingly.
  3. Click the Save button -- check Cook’s -- click Continue -- then OK.
    • Disregard the output. Switch to the Data View of the dataset and check that a new variable, COO_1, has been added.
  4. Graph your output by going to Graphs -- then Chart builder.
    • Use a simple Scatter plot.
    • Assign the COO_1 variable to the Y-axis and the respondent ID to the X-axis -- click OK.
  5. In the Output window, look at the graph (plotting of distances).
    • If a Cook's distance value is greater than 1.00, removing that influential respondent is justifiable.
    • Interpretation:
      • The bigger the number, the bigger the influence that respondent has on the regressions among variables.
      • They may artificially strengthen or weaken the regressions you observe.
      • They pull the regression line away from the line that best fits the remaining cases.
    • How to remove:
      • Locate that influential respondent in the dataset under the COO_1 variable.
      • Sort descending to identify the highest value.
  6. Run this analysis with one dependent variable at a time (if you have more than one dependent variable).
  7. Take note of the new sample size, if you removed cases.
  8. Save as a new SPSS file (Dataset_C_Cook's).
Identifying outliers using Cook's distance in SPSS
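For readers who want to verify the same statistic outside SPSS, the steps above can be sketched in code. This is a minimal illustration assuming a Python/NumPy environment; `cooks_distance` is a hypothetical helper written for this post, not an SPSS or NumPy function:

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for every case in an OLS regression.

    X: (n, k) array of independent variables (no intercept column).
    y: (n,) dependent variable. Returns an (n,) array of distances.
    """
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])        # add intercept column
    p = Xd.shape[1]                              # number of estimated parameters
    H = Xd @ np.linalg.pinv(Xd.T @ Xd) @ Xd.T    # hat (projection) matrix
    h = np.diag(H)                               # leverage of each case
    resid = y - H @ y                            # OLS residuals
    s2 = resid @ resid / (n - p)                 # residual variance estimate
    # Standard formula: D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2
    return resid**2 / (p * s2) * h / (1 - h) ** 2

# Illustration: one deliberately distorted respondent dominates the plot,
# mirroring what the COO_1 scatter plot would show in SPSS.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=50)
y[10] += 25.0                                    # inject an influential case
d = cooks_distance(X, y)
print("most influential case:", d.argmax())
```

Sorting `d` in descending order corresponds to step 5's "sort descending" advice: the largest values identify the respondents worth inspecting first.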


B. Multicollinearity

  1. In the imputed dataset, go to Analyze — Regression — Linear.
  2. Put the dependent and independent variables accordingly.
  3. Under the Statistics button, check Collinearity diagnostics -- click Continue -- then OK.
  4. Examine the results.
    • VIFs are ideally less than 3.00 (less than 10 is acceptable).
    • Tolerance values are ideally greater than 0.10.
  5. Run it with one dependent variable at a time.
  6. What if there is a multicollinearity problem?
    • It means your independent variables are overlapping in the portion of variance they explain in the dependent variable.
    • In other words, they are somewhat redundant.
    • To address this, drop one of them or consider second-order factors.
Detecting multicollinearity in SPSS
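The VIF and tolerance values that SPSS reports can also be reproduced by hand: the VIF for an independent variable is 1 / (1 − R²), where R² comes from regressing that variable on all the other independent variables, and tolerance is the reciprocal of VIF. A minimal NumPy sketch (the `vif` helper is illustrative, written for this post):

```python
import numpy as np

def vif(X):
    """VIF for each column of X, the (n, k) matrix of independent variables.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all the other columns. Tolerance is simply 1 / VIF.
    """
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]                              # treat column j as the outcome
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        tss = (y - y.mean()) @ (y - y.mean())    # total sum of squares
        r2 = 1.0 - (resid @ resid) / tss
        out[j] = 1.0 / (1.0 - r2)
    return out

# Illustration: a third predictor built from the first two is redundant,
# so its VIF shoots far past the 3.00 guideline.
rng = np.random.default_rng(1)
a, b = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack([a, b, a + b + 0.05 * rng.normal(size=200)])
print(np.round(vif(X), 2))
```

The inflated third value illustrates the redundancy described in step 6: the third variable explains almost no variance beyond what the first two already cover, so dropping it (or folding it into a second-order factor) resolves the problem.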

In summary, it is important to check for outliers, influential respondents, and multicollinearity in the dataset, since influential cases distort the regressions among variables. In addition, multicollinearity reduces the precision of the estimated coefficients, which weakens the statistical power of your regression model. Hope this helps.

Big thanks to Dr. James Gaskin for helping me learn about this topic. These steps are based on his YouTube video SEM Series (2016) 6. Multivariate Assumptions, together with other videos on his YouTube channel.
