Friday, May 14, 2021

Reliability of Risk of Bias Assessments of Non-randomized Studies Improves After Customized Training

We previously reported on a paper published in 2020 assessing the inter-rater reliability (IRR) and inter-consensus reliability (ICR) of the Risk of Bias in Non-Randomized Studies of Interventions (ROBINS-I) tool, developed in 2016, and the Risk of Bias instrument for Non-Randomized Studies of Exposures (ROB-NRSE) tool, developed in 2018. That paper found that reliability was generally poor for both tools, and that risk of bias assessments took evaluators, on average, 48 minutes for the ROBINS-I tool and almost 37 minutes for the ROB-NRSE.

Now, a new publication from the same group has examined the effect of training on the reliability of these tools. An international team of reviewers with a median of 5 years of experience in risk of bias assessment first applied the ROBINS-I and ROB-NRSE tools to 44 non-randomized studies of interventions and exposures, respectively, using only the 53 pages of publicly available guidance. The reviewers then received an abridged, customized training document that was tailored specifically to the topic area of the reviews, included simplified guidance for assessing risk of bias, and provided additional guidance on more advanced concepts. After a wash-out period of several weeks, the reviewers re-assessed the studies' risk of bias.



Changes in inter-rater reliability (IRR) for the ROBINS-I (top) and ROB-NRSE (bottom) tools before and after a customized training intervention.


The training intervention improved the IRR of the ROBINS-I tool: within-domain reliability generally improved, and the reliability of the overall bias rating rose from "poor" to "fair." The ICR improved substantially, with the reliability of the overall rating rising from "poor" to "near perfect." Improvements after training were also observed for the ROB-NRSE tool, whose IRR for the overall bias rating improved significantly from "slight" to "near perfect" and whose ICR improved from "poor" to "near perfect." For both tools, the correlations between reviewers' pre- and post-intervention scores were poor, suggesting that the training itself, rather than a simple learning effect, drove these improvements. While customized training was associated with a decrease in evaluator burden for the ROBINS-I tool, this did not hold true for the ROB-NRSE.
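
The qualitative labels above ("poor," "slight," "fair," "near perfect") are the familiar bands used to interpret chance-corrected agreement statistics. As a rough illustration of how such a statistic is computed and banded, the sketch below calculates a weighted Cohen's kappa for two hypothetical raters and maps it onto Landis-Koch style cut-points; the ratings, the choice of statistic, and the thresholds are assumptions for demonstration rather than the exact methods used by the study authors.

```python
# Illustrative only: compute a chance-corrected inter-rater agreement statistic
# for two raters' risk-of-bias judgments and map it to qualitative bands.
# The ratings, the choice of weighted kappa, and the band cut-points are
# assumptions for demonstration, not the study authors' exact methods.
from sklearn.metrics import cohen_kappa_score

# Hypothetical judgments by two raters on 10 studies, coded as ordered integers
# (0 = low, 1 = moderate, 2 = serious risk of bias).
rater_1 = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
rater_2 = [0, 1, 1, 1, 0, 2, 2, 1, 1, 2]

# Linear weights penalize near-misses less than complete disagreements,
# which suits ordered risk-of-bias categories.
kappa = cohen_kappa_score(rater_1, rater_2, weights="linear")

def band(k: float) -> str:
    """Landis-Koch style interpretation bands (one common convention)."""
    if k < 0.00:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"

print(f"weighted kappa = {kappa:.2f} ({band(kappa)})")
```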

The findings of this analysis suggest that the use of a customized, shortened guidance tool specifically tailored to the topical content of a review, including simplified guidance for decision-making within each domain, can improve the reliability of resulting risk of bias assessments. The authors suggest that future reviewers create such guidance based on the specific needs and considerations of their topic area, and publish these tools along with the review.

Jeyaraman MM, Robson RC, Copstein L et al. (2021). Customized guidance/training improved the psychometric properties of methodologically rigorous risk of bias instruments for non-randomized studies. J Clin Epidemiol, in press.

Manuscript available here. 

Tuesday, May 4, 2021

Restricting Systematic Search to English-only is a Viable Shortcut in Most, but Perhaps Not All, Topics in Medicine

In the limitations sections of systematic reviews on any topic, it is not uncommon for the authors to discuss how language limitations within their search may have restricted the breadth of evidence presented. For instance, if the reviewers speak only English, the review is likely limited to publications and journals in that language. But how much of a difference does such a limitation make in terms of the overall conclusions of a systematic review? According to a new paper in the Journal of Clinical Epidemiology, probably not much - but it may depend on the specific topic of medicine under investigation.

While other methods reviews have previously examined this question, Dobrescu and colleagues extended the range of topics by including systematic reviews within the realm of complementary and alternative medicine, yielding four methods reviews not examined in prior work. Specifically, the authors looked for methods reviews that compared English-only literature searches against unrestricted searches and whose primary outcomes were differences in treatment effect estimates, certainty of evidence ratings, or conclusions attributable to the language restriction.

The search yielded eight studies investigating the impact of language restrictions in anywhere from 9 to 147 systematic reviews in medicine. Overall, the exclusion of non-English articles had a greater impact on estimates of treatment effects and the statistical significance of findings in reviews of complementary and alternative medicine versus conventional medicine topics. Most commonly, the exclusion of non-English studies led to a loss of statistical significance in these topic areas.

Overall, the methods studies examined found that the exclusion of non-English studies on conventional medicine topics led to small to moderate changes in the estimate of effect; however, excluding non-English studies shrank the observed effect size in complementary and alternative medicine topics by 63 percent. Two studies examined whether language restriction influenced authors' overall conclusions, generally finding no effect.
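
To make the mechanics of such shifts concrete, the sketch below re-pools a toy inverse-variance, fixed-effect meta-analysis with and without a hypothetical non-English study; every number is invented for illustration and none comes from the reviews examined by Dobrescu and colleagues.

```python
# Illustrative only: how dropping one (hypothetical non-English) study from a
# fixed-effect, inverse-variance meta-analysis can shift the pooled estimate.
# Effect sizes and standard errors below are invented for demonstration.
import numpy as np

def pool_fixed_effect(effects, std_errs):
    """Inverse-variance weighted pooled effect and its standard error."""
    effects = np.asarray(effects, dtype=float)
    weights = 1.0 / np.asarray(std_errs, dtype=float) ** 2
    pooled = np.sum(weights * effects) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    return pooled, pooled_se

# Hypothetical log odds ratios: three English-language studies plus one
# non-English study reporting a larger effect.
effects = [0.10, 0.25, 0.15, 0.60]
std_errs = [0.10, 0.12, 0.15, 0.20]

all_studies = pool_fixed_effect(effects, std_errs)
english_only = pool_fixed_effect(effects[:3], std_errs[:3])

print(f"all studies:  {all_studies[0]:.3f} (SE {all_studies[1]:.3f})")
print(f"English only: {english_only[0]:.3f} (SE {english_only[1]:.3f})")
```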

The figure above shows the frequency of the languages of the excluded publications examined.

The authors conclude that when it comes to systematic reviews of conventional medicine topics, their findings are in line with those of previous methods studies, which demonstrate little to no effect of language restrictions, and suggest that restricting a search to English-only should not greatly impact the findings or conclusions of a review. However, the effect appears greater in the realm of complementary and alternative medicine, perhaps due to the greater proportion of non-English studies published in this field. Thus, systematic reviewers attempting to synthesize the evidence on an alternative medicine topic should be cognizant of their choices regarding language restriction and the potential implications those choices may have for their ultimate findings.

Dobrescu A, Nussbaumer-Streit B, Klerings I et al. (2021). Restricting evidence syntheses of interventions to English-language publications is a viable methodological shortcut for most medical topics: a systematic review. J Clin Epidemiol, epub ahead of print.

Manuscript available from publisher's website here. 

Wednesday, April 21, 2021

In Studies of Patients at High Risk of Death, More Explicit Reporting of Functional Outcomes is Needed

Randomized controlled trials examining the effects of an intervention in patients with a high risk of death will often also include functional outcomes - such as quality of life, cognition, or physical disability. However, the death of patients before these outcomes can be assessed (also known as "truncation due to death") can confound the results of a "survivors-only" analysis, especially if mortality rates are higher in certain groups than others. 

A new methodology review of studies published within 5 high-impact general medical journals from 2014 to 2019 provides insight into this phenomenon and suggestions for improving how functional outcomes are handled. To be eligible for the review, a study needed to be a randomized controlled trial (RCT) with a mortality rate of at least 10% in one arm and to report at least one functional outcome in addition to mortality. The authors recorded the outcomes analyzed, the type of statistical analyses used, and the sample population of each of the 434 included studies. For most (351, or 79%) of these, function was a secondary outcome, while it was a primary outcome for 91 (21%) of them.

Only one-quarter (25%) of the functional outcomes in studies that examined them as secondary outcomes were analyzed with an approach that included all randomized patients (intention-to-treat); among studies in which functional outcomes were the primary outcomes, this proportion was 60%.


The authors provide suggestions for best ways to handle and report data in these studies:
  • Explicitly state, in the methods section rather than only in tables or supplementary material, the sample population from which the functional outcomes were drawn, whether a survivors-only analysis or another approach.
  • If a survivors-only analysis is used, report the baseline characteristics of the groups as analyzed and transparently acknowledge this as a limitation in the discussion section.
  • If all randomized participants are analyzed regardless of mortality, report the assumptions upon which these analyses are based; for instance, if death is one outcome ranked among others in a worst-rank analysis (see the illustrative sketch following this list), the justification for the ranking of outcomes should be given in the methods and the implications of these decisions addressed in the discussion section.
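
One way to understand the worst-rank approach mentioned above is as follows: patients who died before functional assessment are assigned a value worse than any surviving patient's score, so that all randomized participants contribute to a rank-based comparison. The sketch below, with invented data, illustrates this idea using a Mann-Whitney U test; the scores, the ranking rule, and the choice of test are assumptions for demonstration only, not a prescribed method.

```python
# Illustrative only: a worst-rank style analysis in which patients who died
# before functional assessment are assigned a value worse than any survivor,
# so that all randomized patients (not only survivors) enter the comparison.
from scipy.stats import mannwhitneyu

# Hypothetical functional scores (higher = better); None marks death before assessment.
treatment = [70, 85, None, 60, 90, None, 75]
control = [65, None, 55, None, 80, None, 50]

# Assign deaths a common value below the worst observed survivor score so that,
# when the data are ranked, every death ranks worse than every survivor.
observed = [s for s in treatment + control if s is not None]
floor = min(observed) - 1
trt = [floor if s is None else s for s in treatment]
ctl = [floor if s is None else s for s in control]

# The Mann-Whitney U test compares the two arms on ranks, so ties among deaths
# (which all share the floor value) are handled by the usual mid-rank correction.
stat, p_value = mannwhitneyu(trt, ctl, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p_value:.3f}")
```
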
Colantuoni E, Li X, Hashem MD et al. (2021). A structured methodology review showed analyses of functional outcomes are frequently limited to "survivors only" in trials enrolling patients at high risk of death. J Clin Epidemiol (e-pub ahead of print).

Manuscript available here.

Thursday, April 8, 2021

Digging Deeper: 5 Ways to Help Guide Decision-Making When Research Evidence is "Insufficient"

A key tenet underlying the GRADE framework is that the certainty of available research evidence is a central factor to be considered in the course of clinical decision-making. But what if little to no published research exists on which to base a recommendation? At the end of the day, clinicians, patients, policymakers, and others will still need to make a decision, and will look to a guideline for direction. Thankfully, there are other options to pursue within the context of a systematic review or guideline that ensure that as much of the available evidence is presented as possible, although it may come from less traditional or direct sources.

A new project conducted by the Evidence-based Practice Center (EPC) Program of the Agency for Healthcare Research and Quality (AHRQ) developed guidance for supplementing a review of evidence when the available research evidence is sparse or insufficient. This guidance was based on a three-pronged approach, including:

  • a literature review of articles that have defined and dealt with insufficient evidence, 
  • a convenience sample of recent systematic reviews conducted by EPCs that included at least one outcome for which the evidence was rated as insufficient, and
  • an audit of technical briefs from the EPCs, which tend to be developed when a given topic is expected to yield little to no published evidence and which often contain supplementary sources of information such as grey literature and expert interviews.
Through this approach, the workgroup identified five key strategies for dealing with the challenge of insufficient evidence:
  1. Reconsider eligible study designs: broaden your search to capture a wider variety of published evidence, such as cohort or case studies.
  2. Summarize evidence outside the prespecified review parameters: use indirect evidence that does not perfectly match the PICO of your topic in order to better contextualize the decision being presented.
  3. Summarize evidence on contextual factors (factors other than benefits/harms): these include key aspects of the GRADE Evidence-to-Decision framework, such as patient values and preferences and the acceptability, feasibility, and cost-effectiveness of a given intervention.
  4. Consider modeling if appropriate, and if expertise is available: if possible, certain types of modeling can help fill in the gaps and make useful predictions for outcomes in lieu of real-life research (a toy sketch follows this list).
  5. Incorporate health system data: "real-world" evidence such as electronic health records and registries can supplement more mechanistic or explanatory RCTs.
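
As a toy illustration of the modeling strategy (item 4 above), the sketch below evaluates a minimal two-strategy decision tree using assumed probabilities and utilities, then varies one uncertain input in a one-way sensitivity analysis. Every number and label is an assumption supplied for demonstration and is not drawn from any review or guideline.

```python
# Illustrative only: a toy decision-tree model comparing two strategies when
# direct comparative evidence is sparse. All probabilities and utilities are
# assumptions supplied for demonstration.

def expected_utility(p_success: float, u_success: float, u_failure: float) -> float:
    """Expected utility of a strategy with a single success/failure branch."""
    return p_success * u_success + (1.0 - p_success) * u_failure

# Hypothetical inputs (e.g., elicited from experts or indirect evidence).
intervention = expected_utility(p_success=0.60, u_success=0.90, u_failure=0.40)
usual_care = expected_utility(p_success=0.45, u_success=0.90, u_failure=0.40)

print(f"intervention expected utility: {intervention:.3f}")
print(f"usual care expected utility:   {usual_care:.3f}")
print(f"incremental benefit:           {intervention - usual_care:.3f}")

# A simple one-way sensitivity analysis over the intervention's success
# probability shows how conclusions might change as this uncertain input varies.
for p in (0.45, 0.50, 0.55, 0.60, 0.65):
    print(f"p_success={p:.2f} -> incremental benefit "
          f"{expected_utility(p, 0.90, 0.40) - usual_care:+.3f}")
```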



Some of these challenges can be more efficiently addressed up-front, before the scoping of a new review even begins. For instance, identifying topic experts and stakeholders who are familiar with the quantity and quality of available evidence can help a group foresee potential gaps and plan for the need to broaden the scope. Care should be taken to identify the outcomes that are of critical importance to patients, and through this lens, develop strategies and criteria within the protocol that will best meet the needs of the review while tapping into as much evidence as possible. Finally, researchers should avoid using the term "insufficient" when describing the evidence, and instead explicitly state that no eligible studies or types of evidence were available.

Murad MH, Chang SM, Fiordalisi CV, et al. (2021). Improving the utility of evidence synthesis for decisionmakers in the face of insufficient evidence. J Clin Epidemiol, ahead-of-print. 

Manuscript available from publisher's website here.

Friday, April 2, 2021

New Review of Pragmatic Trials Reveals Insights, Identifies Gaps

As opposed to an "explanatory" or "mechanistic" randomized controlled trial (RCT), which seeks to examine the effect of an intervention under tightly controlled circumstances, "pragmatic" or "naturalistic" trials study interventions and their outcomes when used in more real-world, generalizable settings. One example of such a study might include the use of registry data to examine interventions and outcomes as they occur in the "real world" of patient care. However, there are currently few standards for identifying, reporting, and discussing the results of such "pragmatic RCTs." A new paper by Nicholls and colleagues aims to provide an overview of the current landscape of this methodological genre.

The authors searched for and synthesized 4,337 trials using keywords such as "pragmatic," "real world," "registry based," and "comparative effectiveness" to map how pragmatic trials are presented in the RCT literature. Overall, only about 22% (964) of these trials were identified as "pragmatic" RCTs in the title, abstract, or full text; about half of these (55%) used the term in the title or abstract, while the remaining 45% described the work as a pragmatic trial only in the full text.

About 78.1% (3,368) of the trials indicated that they were registered. However, only about 6% were indexed in PubMed as a pragmatic trial, and only 0.5% were labeled with the MeSH topic of Pragmatic Clinical Trial. The target enrollment of pragmatic trials was a median of 440 participants with an interquartile range (IQR) of 244 to 1,200; the actual achieved accrual was 414 (IQR: 216 to 1,147). The largest trial included 933,789 participants; the smallest enrolled 60.

Overall, pragmatic trials were more likely to be centered in North America and Europe and to be funded by non-industry sources. Behavioral interventions, rather than drug- or device-based ones, were most common in these trials. Not infrequently, the trials were mislabeled or contained erroneous data in their registration information. The fact that only about half of the sample was clearly labeled as "pragmatic" means that such trials may go undetected by search strategies less sensitive than the one the authors used.

Authors of pragmatic trials can improve the quality of the field by clearly labeling their work as such, by registering their trials, and by ensuring that registered data are accurate and up-to-date. The authors also suggest that taking a broader view of what constitutes a "pragmatic RCT" generates questions regarding proper ethical standards when research is conducted on a large scale with multiple lines of responsibility. Finally, the mechanisms used to obtain consent in these trials should be further examined in light of the finding that many pragmatic trials fail to achieve their participant enrollment goals.

Nicholls SG, Carroll K, Hey SP, et al. (2021). A review of pragmatic trials found a high degree of diversity in design and scope, deficiencies in reporting and trial registry data, and poor indexing. J Clin Epidemiol (ahead of print).

Manuscript available from publisher's website here.

Monday, March 15, 2021

A Blinding Success?: The Debate over Reporting the Success of Blinding

While the use of blinding is a hallmark of placebo-controlled trials, whether the blinding was successful - i.e., whether or not participants were able to figure out the treatment condition to which they had been assigned - isn't always tested, nor are the results of these tests always reported. The measurement of the success of blinding in trials is controversial and not uniformly used, and the item has been dropped from subsequent versions of the CONSORT reporting items for trials. According to a recent discussion of the pros and cons of measuring the success of blinding, only 2% to 24% of trials perform or report these types of tests.
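
For context, one widely cited measure of blinding success is the Bang blinding index, which, as commonly presented, is calculated per trial arm as the proportion of participants who correctly guess their assignment minus the proportion who guess incorrectly, with "don't know" responses counted in the denominator. The sketch below computes it from invented counts; the numbers and the choice of this particular index are illustrative assumptions and are not taken from Webster and colleagues' paper.

```python
# Illustrative only: the Bang blinding index computed per trial arm from a
# hypothetical cross-tabulation of participants' guesses about their assignment.
# Values near 0 are consistent with successful blinding; values near 1 suggest
# unblinding, and negative values suggest "opposite" guessing.

def bang_blinding_index(n_correct: int, n_incorrect: int, n_dont_know: int) -> float:
    """(correct - incorrect) / all participants in the arm, 'don't know' included."""
    n_total = n_correct + n_incorrect + n_dont_know
    return (n_correct - n_incorrect) / n_total

# Hypothetical end-of-trial guesses in a placebo-controlled trial.
arms = {
    "active":  {"n_correct": 55, "n_incorrect": 20, "n_dont_know": 25},
    "placebo": {"n_correct": 35, "n_incorrect": 30, "n_dont_know": 35},
}

for arm, counts in arms.items():
    bi = bang_blinding_index(**counts)
    print(f"{arm}: Bang blinding index = {bi:+.2f}")
```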

As Webster and colleagues explain, the benefits of measuring the success of blinding are as follows:

  • A failure of blinding in a placebo-controlled trial can introduce a source of bias that affects the results, and measuring blinding reveals whether this has occurred.
  • While the effects attributable to broken blinding may be small, these small effects could still result in changes to policy or practice.
  • There are documented instances in which the failure to properly blind (for instance, providing participants with a sour-tasting vitamin C condition versus a sweet lactose "placebo") led to an observed effect (for instance, on preventing or treating the common cold), whereas there was no effect in the subgroup of participants who were successfully blinded.
Reasons commonly given against the testing of successful blinding include the following:
  • At times, a break in blinding can lead to conclusions in the opposite direction. For instance, physicians who are unblinded may assume that the patients with better outcomes received a drug widely supposed to be "superior," when in fact, the opposite occurred.
  • In some cases, a treatment with dramatically superior results can lead to unblinding even when the treatment conditions appeared identical - but that doesn't necessarily mean the blinding procedure failed or that the unblinding could have been prevented, given the dramatic differences in outcomes.
  • If the measurement of blinding is performed at the wrong time - such as before the completion of the trial - participants may become suspicious and this in itself could potentially confound treatment effects.


Webster RK, Bishop F, Collins GS, et al. (2021). Measuring the success of blinding in placebo-controlled trials: Should we be so quick to dismiss it? J Clin Epidemiol, pre-print.

Manuscript available from publisher's website here.

Tuesday, March 9, 2021

Expert Evidence: A Framework for Using GRADE When "No" Evidence Exists

To guide the formulation of clinical recommendations, GRADE relies on the use of direct or, if necessary, indirect evidence from peer-reviewed publications as well as the gray literature. However, in some cases, no such evidence may be found even after an extensive search has been conducted. A new paper - part of the informal GRADE Notes series in the Journal of Clinical Epidemiology - relays the results of piloting an "expert evidence" approach and provides key suggestions for using it.

As opposed to simply asking the panel members of a guideline to base their recommendations on informal opinion, the expert evidence approach systematizes this process by eliciting the extent of their experience with certain clinical scenarios through quantitative survey methods. In this example, at least 50% of the panel members were free of conflicts of interest, and various countries and specialties were represented. While members were not required to base their answers on patient charts, the authors suggest that chart review can be used to further increase the rigor of the survey.
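
As a purely illustrative sketch of what such a quantitative elicitation might yield, the code below tallies hypothetical survey responses in which each panel member reports the number of relevant cases managed and their usual management choice; all names, counts, and options are invented, and this is not the survey instrument used by the guideline panel.

```python
# Illustrative only: aggregating a hypothetical "expert evidence" survey in which
# panel members report how many relevant cases they have managed and which
# management option they usually chose. Names and numbers are invented.
from collections import Counter

responses = [
    {"member": "A", "cases_seen": 1200, "usual_choice": "option 1"},
    {"member": "B", "cases_seen": 300, "usual_choice": "option 2"},
    {"member": "C", "cases_seen": 4500, "usual_choice": "option 1"},
    {"member": "D", "cases_seen": 800, "usual_choice": "option 1"},
]

# Cumulative clinical experience behind the panel's answers.
total_cases = sum(r["cases_seen"] for r in responses)

# Head counts per option, and totals weighted by each member's reported experience.
choice_counts = Counter(r["usual_choice"] for r in responses)
weighted = Counter()
for r in responses:
    weighted[r["usual_choice"]] += r["cases_seen"]

print(f"cumulative reported experience: {total_cases} cases")
print(f"members preferring each option: {dict(choice_counts)}")
print(f"experience-weighted totals:     {dict(weighted)}")
```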



As a result of the survey, the recommendations put forward reflected a cumulative 12,000 cases of experience. Because the members felt that at least some recommendation was necessary to help guide care - where the alternative would be to provide no recommendation at all - the guideline helped to fill a gap while signaling the current lack of high-quality published evidence for several clinical questions, which may help guide the production of higher-quality evidence and recommendations in the future. Importantly, the authors note that using a survey approach to formulate recommendations avoided a pitfall of "consensus-based" guideline development, which can often end up simply reflecting the opinions of those with the loudest voices.

Mustafa RA, Cuello Garcia CA, Bhatt M, et al. (2021). How to use GRADE when there is "no" evidence? A case study of the expert evidence approach. J Clin Epidemiol, in press.

Manuscript available from the publisher's website here.