U.S. GRADE Network blog: Heterogeneity Basics: When It Matters, and What to Do About It

The biggest benefit of a meta-analysis is that it allows multiple studies' findings to be pooled into a single effect estimate, increasing statistical power and potentially raising our certainty in that estimate. However, a single estimate may be misleading if there is substantial heterogeneity (called inconsistency in GRADE terminology) among the individual studies. One study, for instance, may point to a potential harm of an intervention while the others in the same meta-analysis suggest a benefit; that study may differ from the others in important ways, such as its population, how the intervention was delivered, or even its design. A brief primer on heterogeneity newly published by Cordero and Dans details how heterogeneity can be identified and managed to improve how the implications of a meta-analysis are presented and applied.

Of eyeballs and I²: detecting heterogeneity

Identifying heterogeneity among a group of pooled studies may be as simple as visually inspecting a forest plot for confidence intervals that overlap poorly or that disagree on the direction of effect (i.e., some showing a likely benefit while others show a likely harm).
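The visual checks described above can be roughly mimicked in code. The following is a minimal sketch, not a formal test: it assumes each study is given as a hypothetical `(name, estimate, ci_low, ci_high)` tuple on a scale where 0 means no effect (e.g., a mean difference or log risk ratio), and the function name `eyeball_flags` and the example studies are illustrative only.

```python
from itertools import combinations


def eyeball_flags(studies, null=0.0):
    """Heuristic, code-based stand-in for inspecting a forest plot.

    `studies` is a list of (name, estimate, ci_low, ci_high) tuples on a scale
    where `null` means no effect (e.g. 0 for a mean difference or log risk ratio).
    """
    # Pairs of studies whose confidence intervals do not overlap at all.
    non_overlapping = [
        (a[0], b[0])
        for a, b in combinations(studies, 2)
        if a[3] < b[2] or b[3] < a[2]
    ]
    # Studies whose entire CI lies on one side of the null. Which side means
    # "benefit" depends on the outcome (e.g. lower is better for mortality).
    below = [name for name, _est, _low, high in studies if high < null]
    above = [name for name, _est, low, _high in studies if low > null]
    return {
        "non_overlapping": non_overlapping,
        "below_null": below,
        "above_null": above,
        # True when some studies point clearly one way and others the opposite way.
        "discordant": bool(below) and bool(above),
    }


# Hypothetical example: Study C points in the opposite direction.
studies = [
    ("Study A", -0.5, -0.8, -0.2),
    ("Study B", -0.4, -0.7, -0.1),
    ("Study C", 0.6, 0.3, 0.9),
]
flags = eyeball_flags(studies)
# flags["non_overlapping"] == [("Study A", "Study C"), ("Study B", "Study C")]
# flags["discordant"] is True
```

A flag here only says "look more closely"; like visual inspection, it is best paired with the statistical measures below.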

Statistical analyses can add more nuanced and objective measures of potentially worrisome heterogeneity:

The Q statistic (Cochran's Q) tests the null hypothesis that no heterogeneity is present and provides a p-value for that test. A large p-value should not be interpreted as proof that heterogeneity is absent, because the test has low power when only a few studies are pooled.
I² is derived from the Q statistic and can be interpreted as the percentage of the total variability in effect estimates that is due to differences between studies rather than to chance. The larger the I², the more likely it is that the heterogeneity is "real." When I² is used to assess heterogeneity, its 95% confidence interval should be reported alongside the point estimate.
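Both measures can be computed from each study's effect estimate and standard error. The following is a minimal sketch, assuming estimates on an additive scale (e.g., log risk ratios or mean differences) and standard inverse-variance weights w = 1/SE²: Q = Σ w(y − pooled)², df = k − 1, and I² = max(0, (Q − df)/Q) × 100%. The function names are hypothetical, and the chi-square tail probability uses a closed form valid for integer degrees of freedom so that only the standard library is needed. The sketch does not compute the confidence interval for I², which requires additional methods.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite correction sum.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def heterogeneity(estimates, std_errors):
    """Cochran's Q, its p-value, and I² (in percent) for inverse-variance pooling."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    weights = [1 / se ** 2 for se in std_errors]
    # Fixed-effect (inverse-variance) pooled estimate.
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # Share of variability beyond what chance (df) would produce; floored at 0.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled_fixed": pooled, "q": q, "df": df,
            "p_value": chi2_sf(q, df), "i2": i2}


# Three hypothetical studies on a log scale.
result = heterogeneity([0.0, 0.5, 1.0], [0.25, 0.25, 0.25])
# Q = 8.0 on 2 df, p ≈ 0.018, I² = 75.0
```

With these hypothetical studies, about three-quarters of the observed variability would be attributed to between-study differences rather than chance.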

You've found some heterogeneity. What now?

Once heterogeneity has been detected (preferably through a combination of visual inspection and statistical analysis), explanations for these between-study differences should be sought. Comparing the details of each study's PICO (Population, Intervention, Comparator, Outcome) elements is a great place to start. For instance, did the one outlying study enroll an older population? Did it narrow its inclusion criteria to, say, only pregnant women? Perhaps it defined and operationalized its outcome differently from the other studies.
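A PICO comparison can be kept systematic by recording each study's elements in the same fields and listing where a suspected outlier departs from the rest. This is purely an illustrative sketch: the study names, field values, and the helper `differing_pico` are hypothetical, and treating "the most common value among the other studies" as the reference is an assumed rule of thumb.

```python
from collections import Counter

# Hypothetical PICO summaries; all names and values are made up for illustration.
STUDIES = {
    "Study A": {"population": "adults 40-65", "intervention": "drug X 10 mg",
                "comparator": "placebo", "outcome": "symptom score at 12 wk"},
    "Study B": {"population": "adults 40-65", "intervention": "drug X 10 mg",
                "comparator": "placebo", "outcome": "symptom score at 12 wk"},
    "Study C": {"population": "adults 40-65", "intervention": "drug X 10 mg",
                "comparator": "placebo", "outcome": "symptom score at 12 wk"},
    "Study D": {"population": "pregnant women", "intervention": "drug X 10 mg",
                "comparator": "placebo", "outcome": "self-reported relief"},
}


def differing_pico(studies, target):
    """Return {field: (target_value, most_common_other_value)} where target differs."""
    others = [pico for name, pico in studies.items() if name != target]
    diffs = {}
    for field, value in studies[target].items():
        # Most common value of this PICO field among the remaining studies.
        modal_value, _count = Counter(pico[field] for pico in others).most_common(1)[0]
        if value != modal_value:
            diffs[field] = (value, modal_value)
    return diffs


print(differing_pico(STUDIES, "Study D"))
# → {'population': ('pregnant women', 'adults 40-65'), 'outcome': ('self-reported relief', 'symptom score at 12 wk')}
```

Differences surfaced this way are candidate explanations, not proof; they still need clinical judgment about whether they plausibly change the effect.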

If heterogeneity cannot be explained this way, it is best to use a random-effects model for the meta-analysis. Unlike the fixed-effect model, it does not assume that there is a single "true" effect of the intervention that all of the included studies are estimating. Instead, the random-effects model allows for variability between studies: each study is treated as estimating its own true effect in its own setting, with those true effects assumed to vary around an overall average.
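Fitting a random-effects model requires an estimate of the between-study variance, tau². The post does not name an estimator; one common choice, assumed here, is the DerSimonian-Laird method (others, such as REML, exist and can give different results). The sketch estimates tau² from Q, adds it to each study's own variance, and re-pools. The function name and returned fields are hypothetical.

```python
import math


def random_effects_dl(estimates, std_errors):
    """Random-effects pooling with the DerSimonian-Laird estimate of tau²."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    # Fixed-effect inverse-variance weights and pooled estimate.
    w = [1 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q and its degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    # DerSimonian-Laird moment estimator of the between-study variance.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights include tau², so studies are weighted more evenly.
    w_star = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return {
        "tau2": tau2,
        "pooled": pooled,
        "se": math.sqrt(1 / sum(w_star)),
        "fixed": fixed,
        "se_fixed": math.sqrt(1 / sum_w),
    }


# Same three hypothetical studies as in the Q / I² sketch.
re = random_effects_dl([0.0, 0.5, 1.0], [0.25, 0.25, 0.25])
# tau2 = 0.1875, pooled = 0.5, se is twice se_fixed
```

Here the pooled estimate stays at 0.5 because the hypothetical data are symmetric, but its standard error doubles relative to the fixed-effect analysis, so the confidence interval is twice as wide. That widening is how the random-effects model carries the extra uncertainty from between-study variability into the pooled result.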

Cordero CP and Dans AL. (2021). Key concepts in clinical epidemiology: Detecting and dealing with heterogeneity in meta-analyses. J Clin Epidemiol 130:149-151.

Tuesday, January 26, 2021
