Friday, May 22, 2020

Research Shorts: Assessing the Certainty of Diagnostic Evidence, Pt. II: Inconsistency, Imprecision, and Publication Bias

Earlier this week, we discussed a recent publication in the GRADE series in the Journal of Clinical Epidemiology that provides guidance for assessing risk of bias and indirectness across a body of evidence of diagnostic test accuracy. In this post, we’ll follow up with the continued guidance (published in Part II) for the rest of the GRADE domains.


Inconsistency

Unexplained inconsistency should be evaluated separately for the findings on test specificity and test sensitivity. When a meta-analysis is available, both the visual and quantitative markers of inconsistency can be used in a similar fashion to a meta-analysis of intervention studies. If differences between studies in any of the PICO elements are suspected as an explanation for observed heterogeneity, exploration via subgroup analyses may be appropriate.
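As a rough illustration of one quantitative marker, the sketch below computes Cochran's Q and the I² statistic on logit-transformed sensitivities, and the same function can be applied separately to specificities. The study counts, the function names, and the 0.5 continuity correction are assumptions made for this example and do not come from the GRADE guidance. A univariate I² is also a crude marker for test accuracy data, because it ignores threshold effects and the correlation between sensitivity and specificity.

```python
import math


def logit_with_variance(events, non_events):
    """Return the logit of a proportion and its approximate variance.

    A 0.5 continuity correction is added to both cells (an assumption of
    this sketch) so that studies with a zero cell remain usable.
    """
    a = events + 0.5
    b = non_events + 0.5
    return math.log(a / b), 1 / a + 1 / b


def i_squared(pairs):
    """I-squared (in percent) for a list of (events, non_events) pairs.

    For sensitivity, pass (true positives, false negatives) per study;
    for specificity, pass (true negatives, false positives).
    """
    estimates = [logit_with_variance(e, n) for e, n in pairs]
    weights = [1 / variance for _, variance in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the fixed-effect pooled logit
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, estimates))
    df = len(pairs) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical per-study counts of (true positives, false negatives)
sensitivity_counts = [(95, 5), (50, 50), (70, 30)]
print(f"I-squared for sensitivity: {i_squared(sensitivity_counts):.1f}%")
```

A high value here would only flag that the studies disagree. Whether that disagreement is explained by differences in the PICO elements still has to be judged from the studies themselves.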


Imprecision

Again, imprecision of a test’s sensitivity and specificity should be evaluated separately. As with assessments of interventional evidence, evaluating imprecision across a body of test accuracy studies entails considering the width of the confidence interval as well as the number of events (specifically, the number of patients with the disease and the number of positive tests for sensitivity, and the number of patients without the disease and the number of negative tests for specificity).

In contextualized settings, imprecision is likely present when one end of the confidence interval would lead to the use of the testing strategy while the other end would not. It may be helpful to set a priori a threshold that the confidence interval should not cross in order for the test to have sufficient value.
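The threshold check described above can be sketched in a few lines. The example below uses a Wilson score interval on pooled counts, which is a simplification chosen for illustration; a real review would usually take the interval from a bivariate or hierarchical meta-analysis model. The counts, the thresholds (0.85 for sensitivity, 0.70 for specificity), and the function names are all hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (about 95% when z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def crosses_threshold(interval, threshold):
    """True when the threshold lies strictly inside the interval."""
    lower, upper = interval
    return lower < threshold < upper


# Hypothetical pooled 2x2 counts
tp, fn = 90, 10    # patients with the disease
tn, fp = 160, 40   # patients without the disease

sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)

# Hypothetical a priori thresholds, judged separately for each measure
print("sensitivity CI:", sens_ci, "crosses 0.85:", crosses_threshold(sens_ci, 0.85))
print("specificity CI:", spec_ci, "crosses 0.70:", crosses_threshold(spec_ci, 0.70))
```

With these made-up numbers, the sensitivity interval straddles its threshold, which would suggest imprecision, while the specificity interval sits entirely above its threshold.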

Publication Bias

The use of traditional funnel plot assessments (e.g., Egger’s or Begg’s test) on a body of test accuracy studies is more likely to result in undue suspicion of publication bias than when applied to a body of therapeutic studies. While other sophisticated statistical assessments are available (e.g., Deeks’ test, trim and fill), systematic review and health technology assessment (HTA) authors may choose to base a judgment of publication bias on the knowledge of the existence of unpublished studies. Publication bias may also be suspected when the body of evidence includes studies published by for-profit entities, or small studies that report precise estimates of high test accuracy.
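For readers curious about what Deeks’ test involves, here is a minimal sketch of its core regression as it is commonly described: the log diagnostic odds ratio is regressed on the inverse square root of the effective sample size, weighted by the effective sample size. The study counts, the function name, and the 0.5 continuity correction are assumptions made for this example. The sketch returns only the slope and its standard error; a p-value would normally come from statistical software.

```python
import math


def deeks_regression(studies):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), weighted by ESS.

    studies: list of (tp, fp, fn, tn) tuples, one per study.
    Returns (slope, standard_error). A slope far from zero relative to its
    standard error suggests funnel plot asymmetry.
    """
    if len(studies) < 3:
        raise ValueError("need at least three studies")
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        # 0.5 continuity correction (an assumption of this sketch)
        a, b, c, d = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        diseased, healthy = tp + fn, fp + tn
        ess = 4 * diseased * healthy / (diseased + healthy)  # effective sample size
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((a * d) / (b * c)))  # log diagnostic odds ratio
        ws.append(ess)
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("all studies have the same effective sample size")
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    standard_error = math.sqrt(rss / (len(studies) - 2) / sxx)
    return slope, standard_error


# Hypothetical studies in which smaller studies report higher accuracy
example_studies = [
    (18, 1, 2, 19),
    (45, 8, 5, 42),
    (160, 50, 40, 150),
    (380, 150, 120, 350),
]
slope, se = deeks_regression(example_studies)
print(f"slope = {slope:.2f}, standard error = {se:.2f}")
```

In this made-up data set the slope is positive, meaning smaller studies tend to report larger diagnostic odds ratios. That pattern is consistent with publication bias but does not prove it.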

Upgrading the Certainty of Evidence ("Rating Up")

As with an assessment of interventional evidence, there may be reasons to upgrade the certainty of evidence in the face of highly convincing links between the use of a test and the likelihood and/or magnitude of an observed outcome. The diagnostic test accuracy equivalent of a dose-response gradient, the receiver operating characteristic (ROC) curve, may be used to assess this potential upgrader.

Schünemann H, Mustafa RA, Brozek J, Steingart KR, Leeflang M, Murad MH, Bossuyt P, et al. GRADE guidelines 21 pt. 2. Test accuracy: inconsistency, imprecision, publication bias, and other domains for rating the certainty of evidence and presenting it in evidence profiles and summary of findings tables. J Clin Epidemiol 2020 Feb 10. pii: S0895-4356(19)30674-2. doi: 10.1016/j.jclinepi.2019.12.021. [Epub ahead of print].

Manuscript available here on the publisher's site.