Multivariate Outliers: an "extreme group" approach

Multivariate outliers and the detection of extreme groups

Statistical methods for detecting univariate outliers (boxplots and the like) and their multivariate counterparts (Mahalanobis's D², leverage, Cook's distance, etc.) are well known to statisticians. What is less apparent is that these methods can also be used to detect subgroups within a research sample that may be of special interest (e.g., a learning disabilities group, a highly aggressive group, a higher intelligence group, a high creativity group, a gifted group). Through applied empirical research we have demonstrated (Mylonas & Gari, 2000, 2010) that gifted students can be detected using multivariate criteria such as creativity and intelligence combined, and that the method can be used with any set of multivariate criteria considered to be the elements of a specific trait, instability, interest, motive, deficiency, ability, etc.; in short, of any psychological construct.

More specifically: assuming multivariate normality, "Mah D²" can be computed through multiple linear regression (full model, dv = dummy coding, iv = all k criterion variables), and it follows the theoretical χ² distribution with k–1 degrees of freedom. The usual criterion for accepting a case as a multivariate outlier is that the type I error probability for χ² = Mah D² (df = k–1) is less than .001, but this α level can vary depending on the assumed breadth (the more lenient we are in accepting multivariate outliers, the wider the breadth and the less homogeneous the outlier group). Apart from this usual p(χ²) approach, there are other ways of interpreting the Mah D² values, such as plotting eccentric ellipses; for an example, see an R approach offered by Charles Holbert (2022) at https://www.cfholbert.com/blog/outlier_mahalanobis_distance/ .

There is a caveat in all this, however: the basic methodological aim in detecting these outliers is usually to remove them from the sample so as to reduce extremity bias in the data. This is an important methodological-statistical aim, but it is not the only one; in contrast, we may wish to focus on these multivariate outliers, as they are expected to represent a population different from that of the overall sample, namely the extreme one. But which of the two extremes, low or high? Both? The procedure described so far detects multivariate outliers at both ends of the multivariate continuum. This may serve the statistical-methodological aim, but it certainly does not serve the aim of detecting an extreme group, which usually lies at just one end of the multivariate continuum (e.g., high ability or low resilience). In other words, no extreme group can contain both ends of the construct because, later, during the analyses, they would cancel each other out.
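
The Mah D² criterion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' own implementation: the function names (chi2_sf, mahalanobis_d2, d2_from_leverage, flag_outliers) and the simulated scores are hypothetical, and the χ² tail probability is hand-rolled for integer degrees of freedom so that only NumPy is needed. The regression route relies on the standard identity D² = (n – 1)(h – 1/n), where h is the leverage of a case; leverage depends only on the predictors, which is why the choice of dependent variable does not affect the result. The degrees of freedom are left as an explicit argument: the text uses k–1, whereas some references use k, so this value is an assumption to check against the intended source.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square > x) for integer df >= 1."""
    y = max(float(x), 0.0) / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def mahalanobis_d2(X):
    """Squared Mahalanobis distance of every row from the sample centroid."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))   # sample covariance (n - 1)
    inv_cov = np.linalg.inv(cov)
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)


def d2_from_leverage(X):
    """Same distances via regression leverage: D2 = (n - 1) * (h - 1/n)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])   # intercept + k criteria
    q, _ = np.linalg.qr(design)                 # hat matrix H = Q Q'
    leverage = np.sum(q ** 2, axis=1)           # diagonal of H
    return (n - 1) * (leverage - 1.0 / n)


def flag_outliers(X, df, alpha=0.001):
    """Return D2, its chi-square tail probability, and the p < alpha flags."""
    d2 = mahalanobis_d2(X)
    p = np.array([chi2_sf(v, df) for v in d2])
    return d2, p, p < alpha


if __name__ == "__main__":
    # Simulated (hypothetical) data: 200 cases, k = 3 criteria, one planted
    # case that is extreme on all three criteria.
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(200, 3))
    scores[0] = [8.0, 8.0, 8.0]
    k = scores.shape[1]
    d2, p, is_outlier = flag_outliers(scores, df=k - 1)   # df as in the text
    print("flagged cases:", np.flatnonzero(is_outlier))
```

Raising alpha (for example to .01) widens the breadth of the outlier group, as noted above. Where SciPy is available, scipy.stats.chi2.sf would normally replace the hand-written chi2_sf.
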
Our addition to the above method is that, before detecting the multivariate outliers, we should isolate the subsample of interest (that is, the high-end or the low-end group). Thus another multivariate technique, namely cluster analysis, is employed first in order to form two clusters of cases under centroid clustering. Then only the cluster that may contain the target group is examined further for multivariate outliers. This step also carries a caveat, and the researcher should be cautious, as multivariate normality may be violated within each of the two clusters. Having detected the cases that form the cluster of interest, though, we can proceed with computing Mahalanobis's D² for each case in this subgroup, as described earlier. The detected multivariate outliers then serve as the extreme group of interest (to be compared with other groups, to be the focus of descriptive statistics, to be the target group in action research for intervention purposes, etc.). Past research has supported the method empirically: using the method described, i) a group of gifted students was detected within an overall sample of 1,765 students in a successful and scientifically useful way (Gari, Kalantzi-Azizi, & Mylonas, 2000; Gari, Mylonas, & Portešová, 2015)[1], and ii) a number of studies on other constructs, such as Voluntary Aloneness (Galanaki, Mylonas, & Vogiatzoglou, 2015; Frangistas, 2018)[2], resulted in statistically sound and hermeneutically fruitful outcomes.
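
The two-step procedure (cluster first, then look for multivariate outliers only within the cluster of interest) might be sketched as below. Several assumptions are made here and are not taken from the source: a plain two-cluster k-means on standardized scores stands in for the centroid clustering step, all criteria are assumed to be scored so that higher values mean more of the construct, and the names two_means and extreme_group, as well as the simulated data, are hypothetical. The degrees of freedom are again passed explicitly (k–1 in the text).

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square > x) for integer df >= 1."""
    y = max(float(x), 0.0) / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:   # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def mahalanobis_d2(X):
    """Squared Mahalanobis distance of every row from the sample centroid."""
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)


def two_means(Z, n_iter=100):
    """Two-cluster k-means on standardized scores (stand-in clustering step)."""
    total = Z.sum(axis=1)
    # Deterministic start: the cases with the lowest and highest total score.
    centroids = np.vstack([Z[total.argmin()], Z[total.argmax()]])
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.vstack([Z[labels == c].mean(axis=0) for c in (0, 1)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


def extreme_group(X, df, alpha=0.001, high_end=True):
    """Indices of outliers within the high-end (or low-end) cluster only."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    labels, centroids = two_means(Z)
    level = centroids.mean(axis=1)            # overall level of each cluster
    target = level.argmax() if high_end else level.argmin()
    members = np.flatnonzero(labels == target)
    d2 = mahalanobis_d2(X[members])           # distances within the cluster
    p = np.array([chi2_sf(v, df) for v in d2])
    return members[p < alpha], members


if __name__ == "__main__":
    # Simulated (hypothetical) scores on k = 2 criteria, with one planted
    # case at each end of the continuum.
    rng = np.random.default_rng(1)
    scores = rng.normal(50.0, 10.0, size=(400, 2))
    scores[0] = [110.0, 110.0]
    scores[1] = [-10.0, -10.0]
    k = scores.shape[1]
    extreme, members = extreme_group(scores, df=k - 1, high_end=True)
    print("high-end cluster size:", len(members))
    print("extreme group (case indices):", extreme)
```

Because the distances are computed within the selected cluster only, a case at the opposite end of the continuum cannot enter the extreme group. Multivariate normality within the cluster is not checked in this sketch, so the caveat stated above still applies. Where SciPy is available, scipy.cluster.hierarchy.linkage with method="centroid", followed by fcluster with criterion="maxclust", would be one way to obtain a hierarchical centroid-linkage solution instead of the k-means stand-in.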

 

[1] Gari, A., Kalantzi-Azizi, A. & Mylonas, K. (2000). Adaptation and motivation of Greek gifted pupils: exploring some influences of primary schooling. High Ability Studies, 11(1), 55-68.
     Gari, A., Mylonas, K., & Portešová, S. (2015). An analysis of attitudes towards the gifted students with learning difficulties using two samples of Greek and Czech primary school teachers. Gifted Education International, 31(3), 271-286. DOI: 10.1177/0261429413511887
     Mylonas, K. & Gari, A. (2010). Detecting gifted students through psychometric and statistical criteria: Method, usefulness, and extensions [in Greek]. Proceedings of the 1st Panhellenic Conference of Education Sciences, May 2009. Athens, Smyrniotakis. pp. 120-127.

[2] Galanaki, E., Mylonas, K., & Vogiatzoglou, P. (2015). Evaluating Voluntary Aloneness in Childhood: Initial Validation of the Children’s Solitude Scale. European Journal of Developmental Psychology - Developmetrics, 12(6), 688-700. DOI: 10.1080/17405629.2015.1071253
      Frangistas, I. (2018). Detection of multivariate outliers with an application to the Beneficial Aloneness Scale: alternative detection methods and a case study through multidimensional scaling of similarities with trigonometric transformation [in Greek]. Unpublished undergraduate thesis (requirement for the Degree in Psychology), Department of Psychology, National and Kapodistrian University of Athens.