Researchers in psychological science often rely on single numerical indicators to summarize complex differences between groups or conditions. While effect sizes are a cornerstone of this practice, a recent simulation study highlights a critical limitation: the widespread focus on mean differences can obscure crucial information about the entire distribution of the data. The study, published in Frontiers in Psychology, rigorously compares four commonly used effect size indices – Cohen’s d, the Common Language Effect Size (CLES), the parametric overlap coefficient (ηp), and the non-parametric Overlapping Index (η) – revealing significant differences in their performance, particularly when data deviate from ideal assumptions of normality and equal variances.

The research, conducted by Ambra Perugini, Giovanni Calignano, and Marco Pastore, underscores a fundamental challenge in psychological research: real-world data, especially in fields like cognitive and clinical psychology, rarely conform to the symmetrical, evenly spread distributions that traditional statistical methods often assume. Reaction times in cognitive tasks, for instance, are notoriously right-skewed, and symptom severity in clinical populations can also exhibit substantial asymmetry. In such scenarios, effect sizes that focus solely on mean differences may provide an incomplete or even misleading picture of how groups truly diverge.

"Researchers often act as if a single number could capture the entire contrast between two empirical worlds," the study abstract notes. "In psychological research, standardized effect size metrics are ubiquitous." The investigation reveals, however, that while some indices appear highly correlated, their underlying performance and reliability can differ substantially, especially under non-ideal data conditions.
Key Findings: A Tale of Two Families of Effect Sizes

The study’s core contribution lies in its systematic evaluation of four effect size indices under controlled simulations designed to mimic common real-world data challenges. By generating data from skew-normal distributions – a flexible model allowing for controlled manipulation of mean differences, variance ratios, skewness, and sample size – the researchers could rigorously assess how each index performed. The results clearly demarcated the indices into two conceptual families:

- Mean-Based Location Estimators: Cohen’s d, CLES, and the parametric overlap coefficient (ηp) are all derived from or closely related to the standardized difference between group means. These indices share a common theoretical foundation, assuming that the primary difference lies in the central tendency of the distributions.
- Distribution-Sensitive Overlap Estimator: The non-parametric Overlapping Index (η) takes a different approach by quantifying the actual shared area between two empirical density functions. This metric is sensitive to the entire shape of the distributions, including skewness, kurtosis, and tail behavior, rather than just the mean.

Performance Under Pressure: Cohen’s d Shines, Others Stumble

The simulation employed three key performance metrics: Relative Mean Bias (RMB), Normalized Root Mean Squared Error (NRMSE), and 95% coverage. These metrics were chosen to assess not only the accuracy of an index but also its precision and inferential reliability.

Cohen’s d emerged as the most robust estimator. It consistently exhibited low bias, high precision (low NRMSE), and accurate coverage across nearly all simulated scenarios, even those with substantial deviations from normality and variance homogeneity. This suggests that Cohen’s d remains a reliable measure of location differences, capturing the shift in means effectively and providing stable confidence intervals.
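The three performance metrics can be made concrete with a small simulation. This is a minimal sketch, not the paper's code, assuming common definitions: RMB as the mean estimation error divided by the true value, NRMSE as the root mean squared error divided by the true value, and coverage as the fraction of 95% intervals that contain the true value, applied here to Cohen's d under normal data with a textbook large-sample standard error:

```python
import numpy as np

rng = np.random.default_rng(0)
true_d, n, reps = 0.5, 100, 2000  # true effect, per-group n, simulation replicates

def cohens_d(x, y):
    # Standardized mean difference with a pooled standard deviation (equal n).
    pooled_sd = np.sqrt((np.var(x, ddof=1) + np.var(y, ddof=1)) / 2)
    return (np.mean(y) - np.mean(x)) / pooled_sd

estimates, covered = [], 0
for _ in range(reps):
    x = rng.normal(0.0, 1.0, n)
    y = rng.normal(true_d, 1.0, n)
    d = cohens_d(x, y)
    estimates.append(d)
    se = np.sqrt(2 / n + d**2 / (4 * n))  # large-sample SE of d for equal groups
    covered += (d - 1.96 * se) <= true_d <= (d + 1.96 * se)

estimates = np.array(estimates)
rmb = np.mean(estimates - true_d) / true_d                    # Relative Mean Bias
nrmse = np.sqrt(np.mean((estimates - true_d) ** 2)) / true_d  # Normalized RMSE
coverage = covered / reps                                     # 95% CI coverage
print(f"RMB={rmb:.3f}  NRMSE={nrmse:.3f}  coverage={coverage:.3f}")
```

Under these well-behaved conditions all three metrics look good for d; the study's point is that the same diagnostics deteriorate for some indices once skewness and variance heterogeneity are introduced.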
The Common Language Effect Size (CLES) and the parametric overlap coefficient (ηp), while often showing high descriptive correlations with Cohen’s d, demonstrated significant performance limitations. CLES showed substantial bias and low coverage, particularly when mean differences were present. ηp also suffered from bias and poor coverage under conditions of variance heterogeneity. This indicates that despite their mathematical links to d, these indices are not always interchangeable and can become unreliable when underlying data assumptions are violated.

The non-parametric Overlapping Index (η) proved to be a valuable addition. It remained unbiased even under significant variance heterogeneity and skewness, capturing aspects of distributional difference that mean-based measures miss. However, its performance was less reliable when the populations truly overlapped significantly, with coverage issues arising in scenarios of perfect overlap. This highlights that while η offers a more complete view of the entire distribution, its interpretation needs to consider the context of maximum similarity.

The study authors noted the "near-perfect descriptive correlations" between Cohen’s d, CLES, and ηp, underscoring that these indices often tell a similar story in terms of their raw values. However, the performance metrics revealed that "high empirical correlations do not guarantee same precision."

The Importance of Distributional Shape

The investigation’s emphasis on non-normal data is crucial. Psychological phenomena are rarely perfectly symmetrical. Reaction time data, for example, are frequently characterized by long right tails, meaning a few trials take significantly longer than most. Similarly, in clinical psychology, symptom scores might be clustered at the lower end for healthy individuals (a floor effect) but spread out widely with heavy tails for those with a disorder. In these common situations, measures like CLES and ηp can become distorted.
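The mathematical links within the mean-based family can be sketched directly. A minimal illustration, assuming the standard normal-theory conversions (CLES = Phi(d / sqrt(2)); parametric overlap of two equal-variance normals = 2 * Phi(-|d| / 2)), applied here to right-skewed samples drawn from a skew-normal distribution with scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two right-skewed groups (skew-normal, shape a=5), differing mainly in location.
x = stats.skewnorm.rvs(a=5, loc=0.0, scale=1.0, size=2000, random_state=rng)
y = stats.skewnorm.rvs(a=5, loc=0.8, scale=1.0, size=2000, random_state=rng)

def cohens_d(x, y):
    # Standardized mean difference with a pooled standard deviation.
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(y) - np.mean(x)) / np.sqrt(pooled_var)

def cles_from_d(d):
    # Common Language Effect Size via the normal-theory link: P(Y > X) = Phi(d / sqrt(2)).
    return stats.norm.cdf(d / np.sqrt(2))

def parametric_overlap_from_d(d):
    # Overlap of two equal-variance normals separated by d: 2 * Phi(-|d| / 2).
    return 2 * stats.norm.cdf(-abs(d) / 2)

d = cohens_d(x, y)
print(f"d={d:.2f}  CLES={cles_from_d(d):.2f}  eta_p={parametric_overlap_from_d(d):.2f}")
```

Because both conversions are deterministic functions of d, near-perfect descriptive correlations among the three indices are built in; the simulations show that this does not translate into equal bias or coverage once the normality assumptions behind the conversions are violated.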
The study points out that asymmetric dispersion can inflate pooled variance, thereby attenuating standardized mean differences derived from d. This can lead to underestimating the true effect size. The non-parametric Overlapping Index (η), on the other hand, is designed to be sensitive to these distributional nuances. It quantifies the extent to which two probability distributions intersect, offering a measure of similarity that is not solely dependent on the distance between their means. This makes it a powerful tool for detecting differences that might be obscured by traditional mean-based metrics.

Implications for Psychological Research

The findings have significant implications for how psychological researchers report and interpret effect sizes. The study advocates for a shift away from relying solely on mean-based conventions towards a more nuanced understanding that considers the entire distribution. "Researchers should select effect sizes based on their statistical properties and propose a shift toward interpreting effects in light of the full distribution, rather than through mean-based conventions," the authors argue.

For cognitive psychologists working with reaction times or clinical psychologists analyzing symptom scores, this means that while Cohen’s d may still be a useful and robust measure of central tendency shifts, it might not tell the whole story. The Overlapping Index (η) could provide complementary information about the degree of separation or similarity between groups, offering a more comprehensive picture of the effect.

The researchers propose a practical approach:

- Visualize Data: Always start by plotting the data (e.g., density or violin plots) and examining descriptive statistics (mean, median, variance, skewness, kurtosis).
- Consider the Research Question: If the focus is strictly on a mean difference and distributions are well-behaved, Cohen’s d might suffice.
- Embrace Distributional Sensitivity: When distributions deviate significantly from normality, exhibit asymmetry, or have different variances, the Overlapping Index (η) becomes a valuable tool for capturing distributional separation.
- Complementary Reporting: In many cases, reporting both a mean-based index (like Cohen’s d) and the Overlapping Index (η) can provide a balanced and informative summary, anchoring interpretations to both location shifts and distributional overlap.
- Report Uncertainty: Always report confidence intervals (e.g., standard analytic intervals for d, bootstrap intervals for η) to convey the uncertainty around the effect size estimates.

The study concludes that these effect size indices are not interchangeable. While Cohen’s d remains a robust estimator of location differences, η offers a more complete view of the entire distribution. The relationship between these indices is not one of rivalry but of complementarity, with each offering a distinct yet valuable perspective on the nature of group differences.

By employing a controlled simulation environment, the study provides a principled way to examine how estimators behave when data-generating conditions depart from simple mean differences and variances are not homogeneous across groups. This rigorous methodology allows for a clear understanding of how these effect size metrics perform under realistic, often messy, data conditions, ultimately guiding researchers toward more accurate and insightful interpretations of their findings.
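The reporting advice above can be sketched in code. Below is a minimal, hypothetical implementation of the non-parametric overlap (integrating the pointwise minimum of two kernel density estimates over a shared grid, one common way to estimate η; the study's exact estimator may differ), paired with a percentile-bootstrap interval as recommended:

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

def overlap_index(x, y, grid_size=512):
    # Estimate eta as the integral of min(f, g), with f and g Gaussian KDEs
    # evaluated on a grid spanning both samples.
    f, g = stats.gaussian_kde(x), stats.gaussian_kde(y)
    grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), grid_size)
    return trapezoid(np.minimum(f(grid), g(grid)), grid)

def bootstrap_ci(x, y, n_boot=300, alpha=0.05, seed=0):
    # Percentile bootstrap: resample within each group, re-estimate the overlap.
    rng = np.random.default_rng(seed)
    boots = [overlap_index(rng.choice(x, len(x), replace=True),
                           rng.choice(y, len(y), replace=True))
             for _ in range(n_boot)]
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, 300)        # e.g., a roughly symmetric control group
y = 0.8 + rng.gamma(2.0, 1.0, 300)   # e.g., a right-skewed clinical group
eta = overlap_index(x, y)
lo, hi = bootstrap_ci(x, y)
print(f"eta={eta:.2f}  95% CI=[{lo:.2f}, {hi:.2f}]")
```

Reporting η with its bootstrap interval alongside d and its analytic interval, as the authors suggest, anchors the interpretation to both the location shift and the shared distributional mass.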