Monday, August 30, 2021

Misuse of ROBINS-I Tool May Underestimate Risk of Bias in Non-Randomized Studies

Although it is currently the only tool recommended by the Cochrane Handbook for assessing risk of bias in non-randomized studies of interventions, the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool can be complex and difficult to use effectively for reviewers lacking specific training or expertise in its application. Previous posts have summarized research examining the reliability of ROBINS-I, suggesting that reliability can improve with reviewer training. Now, a study from Igelström and colleagues finds that the tool is commonly modified or used incorrectly, potentially affecting the certainty of evidence or strength of recommendations resulting from synthesis of these studies.

The authors reviewed 124 systematic reviews published across two months in 2020, using A MeaSurement Tool to Assess systematic Reviews (AMSTAR) to operationalize the overall quality of the reviews. The authors extracted data related to the use of ROBINS-I to assess risk of bias across studies and/or outcomes as well as the number of studies included, whether meta-analysis was performed, and whether any funding sources were declared. They then assessed whether the application of ROBINS-I was predicted by the review's overall methodological quality (as measured by AMSTAR), the performance of risk of bias assessment in duplicate, the presence of industry funding, or the inclusion of randomized controlled trials in the review.

Overall methodological quality across the reviews was generally low to very low, with only 17% scoring as moderate quality and 6% scoring as high quality. Only six (5%) of the reviews reported explicit justifications for risk of bias judgments both across and within domains. Modification of ROBINS-I was common: 20% of reviews modified the rating scale, and six either did not report on all seven domains or added an eighth. In 19% of reviews, studies rated as having a "critical" risk of bias were included in the narrative or quantitative synthesis, against guidance for the use of the tool.

Reviews that were of higher quality as assessed by AMSTAR tended to contain fewer "low" or "moderate" risk of bias ratings and more judgments of "critical" risk of bias. Thus, the authors argue, incorrect or modified use of ROBINS-I may risk underestimating the potential risk of bias among included studies, potentially affecting the resulting conclusions or recommendations. Associations between the use of ROBINS-I and the other potential predictors, however, were less conclusive. 

Igelström, E., Campbell, M., Craig, P., and Katikireddi, S.V. (2021). Cochrane's risk-of-bias tool for non-randomized studies (ROBINS-I) is frequently misapplied: A methodological systematic review. J Clin Epidemiol, in press.

Manuscript available from publisher's website here. 

Tuesday, August 24, 2021

UpPriority: A new tool to guide the prioritization of guideline update efforts

The establishment of a process for assessing the need to update a clinical guideline based on new information and evidence is a key aspect of guideline quality. However, given limited time and resources, it is likely necessary to prioritize the clinical questions most in need of an update from year to year. A new paper demonstrates proof of concept for the UpPriority Tool, which aims to help guideline developers prioritize questions for updating.

The tool comprises six items for assessing the need to update a given recommendation or guideline topic:
  • the potential impact of an outdated guideline on patient safety;
  • the availability of new, relevant evidence;
  • the context relevance of the clinical question at hand (is the question still relevant given considerations such as the burden of disease, variation in practice, or emerging care options?);
  • methodological applicability of the clinical question (does the question still address PICO components of interest?);
  • user interest in an update; and
  • the potential impact of an update on access to health care.
To apply this tool in a real-world setting, the authors took a sample of four guidelines published by the Spanish National Health System (NHS) within the previous 2-3 years that utilized the GRADE framework. A survey was then developed to assess the six items above, calculate a priority ranking, and, from there, decide which questions were in highest need of updating. The survey was disseminated among members of a working group comprising members of the original guideline panels and additional content experts. Additional factors for consideration included the volume of new evidence, the availability of resources, and the need to include new clinical questions.

Through this process, 16 (15%) of the 107 questions were defined as high priority for updating. Of these, 12 scored higher than five on one of the individual items (specifically the item assessing impact on patient safety), while the remaining four received an overall score higher than 30 across all six items.
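As a rough illustration only (not the authors' implementation), the two-part flagging rule described above can be sketched as a small function. The function name, the position of the safety item, and the item scale are all assumptions for the sake of the sketch; only the thresholds (an individual item score above 5, or an overall score above 30) come from the paper's reported results.

```python
def is_high_priority(item_scores, safety_index=0,
                     item_threshold=5, overall_threshold=30):
    """Hypothetical sketch of the UpPriority flagging rule described in
    the post: a question is treated as high priority for updating if its
    patient safety item scores above 5, or if its total score across all
    six items exceeds 30. Function name and scale are assumptions."""
    # Flag on the individual patient-safety item alone...
    if item_scores[safety_index] > item_threshold:
        return True
    # ...or on the summed score across all six items.
    return sum(item_scores) > overall_threshold
```

For example, a question scoring 6 on the safety item would be flagged regardless of its other scores, while one scoring moderately across the board would be flagged only if its six item scores sum to more than 30.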

In addition to the priority ranking derived from the six assessment items, the survey also assessed the usability and inter-observer reliability of the tool itself. The reliability (intraclass correlation) ranged from good in one guideline (0.87) to moderate in two (0.62 and 0.63) and poor in one (0.15). The authors conclude that identifying and properly training content experts to serve as appraisers remains the key challenge for the effective application of this tool.

Sanabria, A.J., Alonso-Coello, P., McFarlane, E., et al. (2021). The UpPriority tool supported prioritization processes for updating clinical guideline questions. J Clin Epidemiol, in press.

The manuscript can be accessed here.

Wednesday, August 4, 2021

Correction to guidance for assessing imprecision with continuous outcomes

Systematic review and guideline developers take note: the authors of the 2011 guidance on assessing imprecision within the GRADE framework have recently issued a correction related to the assessment of information size when evaluating a continuous outcome.

Whereas the article originally stated that a sample size of approximately 400 (200 per group) would be required to detect an effect size of 0.2 standard deviations, assuming an alpha of 0.05 and a power of 0.8, the correct number is actually 800 (400 per group).
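The corrected figure can be checked with the standard two-sample normal-approximation formula for comparing means, n per group ≈ 2(z₁₋α/₂ + z₁₋β)² / d². A minimal sketch (the function name is illustrative; this is the textbook approximation, not GRADE's own calculator):

```python
import math
from statistics import NormalDist

def n_per_group(effect_size=0.2, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sample comparison of
    means, via the normal approximation:
    n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / d^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.8
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

n = n_per_group()  # ~393 per group, i.e. roughly 800 participants in total
```

With the default inputs this yields about 393 per group (roughly 800 total), consistent with the corrected guidance rather than the originally published 400.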

The full corrigendum can be read here.