Correlation matrices and Factor Analysis (ok, PCA too)

The issue of statistical assumptions is extremely important in Psychology, as it is a "common secret" that we frequently violate those assumptions; obviously, we should not. One way we do so is under Principal Component Analysis or under Exploratory Factor Analysis modeling (and even under Confirmatory Factor Analysis). It is very well known that psychological data are, most of the time, ordinal (at the item level). For example, Likert-type scales yield numbers which are merely labels-in-order, with unequal distances between steps. This makes such data susceptible to errors when correlations are computed, as a basic assumption of Pearson's r statistic is clearly violated (the data must be genuinely numeric for Pearson's r to apply). It is also very well known that Factor Analysis (and PCA as well) are methods devised for mathematical purposes, and in Mathematics, numbers prevail. EFA and PCA assume numbers, but in Psychology we seem to forget that fact. Another pitfall, which clearly compounds the problem (it is an inflation parameter, not a causal one), is that most statistical packages do not safeguard against the improper use of ordinal data in factor-analytic models: the software will not stop the user when "numbers" are used instead of actual numbers; it will just go on computing Pearson's r as if those labels were actually numbers. And they are not. Turning to personal experience, when I discussed these issues a few years back with a Dutch colleague, he replied, "why bother? just push the SPSS button, get the solution, and don't make a fuss about it". Well, I guess it is time we "make some fuss" about this important issue.

If we accept that we need to do something about the above, there is a straightforward solution which is acceptable, albeit under conditions. The condition is that, from a methodological perspective and from the very beginning, we should refrain from using Likert-type scales to assess our item data (for example, via an Anxiety assessment Scale); instead, we should opt for binary responses (No-Yes, Absence-Presence, False-True, Never-Always, Negative-Positive, Low-High, Inexistent-Existent, etc., and "in numbers", 0 - 1). In this case, we can get "round the problem": Phi coefficients are the appropriate statistic, and since Phi coefficients are numerically identical to Pearson's r statistics, we can just "push the button" safely (the software will think it computes and analyzes Pearson's r, but in reality the correct Phi coefficient is employed instead). Binary measures are much better than Likert-type ones in many ways (I will not discuss that here); of course, there are caveats as well, such as the fact that Phi coefficients are largely affected by the number of extreme cases. Still, Phi coefficients might resolve problematic situations such as the above from the very beginning, that is, without having to remedy them at a second stage. Even so, most of the psychological constructs assessed in research are the aggregates of "summated rating scales" (or of subsets of items which yield the dimensions supported by the Scale authors, who have also computed those dimensions-factors on Pearson's r indices -should they?). The point here is that although the aggregate is indeed a number (or can be considered one), the items creating this aggregate are not numbers in any sense.
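The numerical identity of Phi and Pearson's r on binary data is easy to verify directly. The following is a minimal sketch with simulated (purely hypothetical) binary item responses; it computes Phi from the 2x2 contingency table and Pearson's r from the same 0/1 vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical binary items (0 = No, 1 = Yes) from 200 respondents;
# y agrees with x roughly 70% of the time
x = rng.integers(0, 2, 200)
y = np.where(rng.random(200) < 0.7, x, 1 - x)

# Phi coefficient from the 2x2 contingency table
n11 = np.sum((x == 1) & (y == 1))
n10 = np.sum((x == 1) & (y == 0))
n01 = np.sum((x == 0) & (y == 1))
n00 = np.sum((x == 0) & (y == 0))
phi = (n11 * n00 - n10 * n01) / np.sqrt(
    float((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
)

# Pearson's r computed naively on the same 0/1 vectors
r = np.corrcoef(x, y)[0, 1]
print(phi, r)  # the two values coincide up to floating-point rounding
```

This is exactly why "pushing the button" is safe for binary items: whatever the software labels Pearson's r is, term by term, the Phi coefficient.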

We have proposed a statistical way (fully described in Mylonas et al. 2012, in Zajenkowska, Mylonas et al. 2014, and elsewhere)[1] to resolve the issue without having to resort to binary-type assessment for each item (or when we are unable to do so). The method has repeatedly delivered much better and much more stable factor structures for several Likert-type datasets, with the most exceptional paradigm being the STAR and STARGR modeling (Mylonas et al., 2017, 2023)[2]. The method is quite straightforward and fairly simple to apply: for the same Likert-type item dataset we compute Pearson's r indices, as if the data were numbers (and they may indeed be, under certain conditions -we cannot say unless we test for it); we then compute Spearman's rho or, even better, Kendall's Tau_b coefficients (the non-parametric approach) to avoid bias in these correlation estimates. We then compare the two matrices (r vs. Tau_b) using the procedure explained in Mylonas et al. 2012, in Zajenkowska, Mylonas et al. 2014, and elsewhere, so as to decide whether the Pearson's r indices are safe to use in the analysis or not. The factor structures for each condition have been repeatedly compared for various datasets, and the verdict is in favour of the Tau_b use. The method itself is not our own; it was proposed many years ago (following Winer, 1971)[3] as a method to compare for magnitude two Pearson's r correlation matrices across two independent groups via a Fisher z transformation. Our extension is that we can use it for the purposes described above and compare two correlation matrices within the same sample but across different (parametric vs. non-parametric) correlation indices.
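The computational skeleton of this comparison can be sketched as follows. This is only the general idea, not the exact decision procedure (which is the one described in Mylonas et al., 2012): both matrices are computed for the same items, each off-diagonal pair of coefficients is Fisher-z transformed, and a Winer-style z statistic for the difference is formed; the simulated Likert data and the function name are my own illustration.

```python
import numpy as np
from scipy.stats import kendalltau

def matrix_comparison_z(data):
    """Pearson's r vs. Kendall's Tau_b matrices for the same Likert items.

    Sketch only: returns a Winer-style z statistic (via Fisher's z
    transform) for each off-diagonal pair of coefficients; the actual
    decision rule is described in Mylonas et al. (2012).
    """
    n, k = data.shape
    r = np.corrcoef(data, rowvar=False)
    tau = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            t, _ = kendalltau(data[:, i], data[:, j])  # tau_b handles ties
            tau[i, j] = tau[j, i] = t
    iu = np.triu_indices(k, 1)                   # off-diagonal entries only
    z_r, z_tau = np.arctanh(r[iu]), np.arctanh(tau[iu])
    se = np.sqrt(2.0 / (n - 3))                  # independent-groups form (Winer, 1971)
    return (z_r - z_tau) / se

# Hypothetical 5-point Likert data: 300 respondents, 6 items on one latent trait
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 1))
items = np.clip(np.round(latent + rng.normal(size=(300, 6))), -2, 2) + 3
z_diff = matrix_comparison_z(items)
print(np.abs(z_diff).max())  # large differences flag that Pearson's r is unsafe
```

Note that the standard error above is the one from the original two-independent-groups setting; applying it within a single sample is precisely the extension described in the text, so the statistic should be read as diagnostic rather than as an exact test.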
In the literature produced (e.g., Mylonas et al., 2017, 2023), the factor structures that emerged under the Kendall's Tau_b correlation solutions i) were more stable and quite clearer, ii) were free of cross-loadings, or at least suffered less from this problem, and iii) required fewer error covariances (or none at all) to be estimated under Confirmatory Factor Modeling, in our quest to arrive at a useful and sound factor structure in the data (both in psychometric terms and in research-analysis terms). We strongly suggest that this method be applied whenever Likert-type scales have been employed for data collection, so that the proper correlation matrix is analyzed (under PCA, EFA, or CFA).
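Since most packages compute Pearson's r by default, one practical route to analyzing the proper matrix is to build the Tau_b matrix yourself and run the PCA on it directly, via eigendecomposition. The sketch below assumes this workaround (the function names and the simulated data are mine, not part of the published method):

```python
import numpy as np
from scipy.stats import kendalltau

def taub_matrix(data):
    """Kendall's Tau_b correlation matrix for the columns of `data`."""
    k = data.shape[1]
    tau = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            t, _ = kendalltau(data[:, i], data[:, j])
            tau[i, j] = tau[j, i] = t
    return tau

def pca_loadings(corr, n_components=2):
    """Unrotated PCA loadings from any correlation matrix."""
    vals, vecs = np.linalg.eigh(corr)
    order = np.argsort(vals)[::-1]               # eigenvalues, largest first
    vals, vecs = vals[order], vecs[:, order]
    # Loadings = eigenvector * sqrt(eigenvalue); clip guards against tiny
    # negative eigenvalues, since a Tau_b matrix need not be positive definite
    return vecs[:, :n_components] * np.sqrt(np.clip(vals[:n_components], 0, None))

# Hypothetical 5-point Likert data: 200 respondents, 4 items
rng = np.random.default_rng(2)
items = np.clip(np.round(rng.normal(size=(200, 1)) + rng.normal(size=(200, 4))), -2, 2) + 3
loadings = pca_loadings(taub_matrix(items))
print(loadings.shape)  # (4, 2): four items, two components
```

The same Tau_b matrix can, in most software, also be supplied as the input matrix for EFA or CFA instead of raw data, which is how the proper matrix reaches the confirmatory models as well.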

A brief extract from the Mylonas et al. (2012) paper is given below:


[1] Mylonas, K., Veligekas, P., Gari, A., & Kontaxopoulou, D. (2012). Development and Psychometric Properties of the Scale for Self-Consciousness Assessment. Psychological Reports: Measures and Statistics, 111(1), 233-252.

      Zajenkowska, A., Mylonas, K., Lawrence, C., Konopka, K., & Rajchert, J. (2014). Cross-cultural sex differences in situational triggers of aggressive responses. International Journal of Psychology, 49(5), 355-363. DOI: 10.1002/ijop.12052

[2] Mylonas, K., Lawrence, C., Zajenkowska, A., & Bower Russa, M. (2017). The Situational Triggers of Aggressive Responses scale in five countries: Factor structure and country clustering solutions. Personality and Individual Differences, 104(1), 172-179. Online first, August 2016. DOI: dx.doi.org/10.1016/j.paid.2016.07.030

      Mylonas, K., Lawrence, C., Frangistas, I., Bower Russa, M., Papazoglou, S., Papachristou, I., & Zajenkowska, A. (2023). Greek Standardization of the Situational Triggers of Aggressive Responses (STARGR). International Perspectives in Psychology, 12(3), 147-163. https://doi.org/10.1027/2157-3891/a000065

[3] Winer, B. J. (1971). Statistical Principles in Experimental Design, 2nd edition. NY: McGraw-Hill.