Thursday, November 11, 2021

Fall scholars dazzle at the 15th GRADE Guideline Development Workshop

During the 15th GRADE Guideline Development Workshop held virtually last month, the Evidence Foundation had the pleasure of welcoming three new scholars, each of whom received the opportunity to attend the workshop free of charge. As part of the scholarship, each recipient presented to the workshop attendees about their current or proposed project related to evidence-based medicine and reducing bias in healthcare.


Razan Mansour, MD, a postdoctoral research fellow at the University of Kansas Medical Center, spoke about the challenges of navigating variability among published systematic reviews when developing clinical recommendations. Variability may emerge from differences between reviews regarding study inclusion criteria, risk of bias assessment, and the way data are presented. Most sources of variability, Dr. Mansour said, are difficult to explain, but a modified version of A MeaSurement Tool to Assess systematic Reviews 2 (AMSTAR 2) can help identify common "red flags" and prioritize the best systematic reviews on which to base the resulting clinical recommendations. Information from multiple higher-quality systematic reviews may be used, and data from individual studies may need to be extracted to paint a complete picture of the evidence.

Says Dr. Mansour, "Learning about how to rate the certainty of evidence about diagnosis was particularly helpful to me, as this is usually challenging. The subgroup discussions were perfect for one-to-one learning and applying your skills as you go through GDT."

Next, Reena Ragala, MOTR/L, spoke about her project as a Senior Evidence-Based Practice Analyst at the Medical University of South Carolina Value Institute. The Institute was established in 2012 to support MUSC's efforts to build the infrastructure to advance science and discovery through evidence-based practice. Starting in the spring/summer of 2022, Ragala will help lead guideline development "boot camps" through the Institute to train providers within rural health networks in the application of evidence-based guideline methodology. Through this process, Ragala hopes to empower rather than intimidate front-line staff and clinicians new to clinical guideline development.

The fall 2021 Evidence Foundation scholars pose for a virtual group photo with U.S. GRADE Network faculty between workshop sessions. From top left: Philipp Dahm, Reem Mustafa, Osama Altayar, Yngve Falck-Ytter, Rebecca Morgan, Carolina Soledad, Reena Ragala, Perica Davitkov, Madelin Siedler, Razan Mansour, and Shahnaz Sultan.

"When using GRADE for diagnostic tests or guideline development, judgements or recommendations are based on patient outcomes, said Ragala. "GRADE doesn’t just look at the strength/quality of the evidence, but incorporates the feasibility, accuracy, bias, and benefit/harm to the patient to ultimately make recommendations."

Carolina Soledad, MD, MStat, PhD, spoke about the unique opportunities, challenges, and solutions within an aspect of guideline development that has gained great relevance in recent years: reaching consensus within the context of virtual meetings. Dr. Soledad, a junior methodologist within the European Society of Anaesthesiology and Intensive Care (ESAIC) Guidelines Committee, noted that while in-person meetings have several strengths - such as access to nonverbal cues like body language and facial expressions, and a shared context - virtual meetings also have unique benefits, such as cost savings and a reduced need to focus on logistics. In addition, they have become a necessity in the age of a global pandemic. To help address the unique questions surrounding virtual meetings - such as the optimal length and number of participants - Dr. Soledad developed a 33-item survey of anesthesiologists and intensivists involved in guideline development. The findings will help formulate best practices for improving the communication, engagement, and effectiveness of future virtual meetings.

According to Dr. Soledad, "This workshop gave me a peek into GRADEpro GDT software, and it turned out to be easier to use than I've thought!"

If you are interested in learning more about GRADE and attending the workshop as a scholarship recipient, applications for our upcoming workshop in Chicago, Illinois, are now open. The deadline to apply is March 31, 2022. Details can be found here. 






Friday, September 24, 2021

6 Simple Rules for Creating a Plain Language Summary of a GRADE Guideline Recommendation

While seasoned clinicians and methodheads may consider the nuances of clinical guideline development everyday fare, there is generally a lack of awareness among the public about the implications and use of guidelines. In addition, there is some evidence of public concern that guidelines may be used to ration care as well as public confusion about how they should be applied to an individual's unique needs. The translation of guideline recommendations into plain language, however, may help improve public knowledge and awareness of the applications and implications of guidelines.

In a new paper published in the Journal of Clinical Epidemiology, Santesso and colleagues set out to develop a template for communicating guideline recommendations as well as to explore public attitudes toward guidelines. First, the authors conducted semi-structured focus groups to gather information about general perceptions and opinions regarding guidelines. These insights were then used to develop a plain language template, which was user-tested and revised into a final version. During the process, a few key themes emerged, including:

  • an upfront and clear description of the population/individuals to whom the guideline applies
  • a section detailing topics and questions to bring up with one's health care provider
  • definitions surrounding the strength of the recommendation, and further considerations for decision-making around conditional recommendations
  • formatting that makes use of bullets and tables rather than blocks of text

These themes informed the development of the final template, which includes six major items:
  1. the recommendation, its strength (with a symbol), and an explanation 
  2. the population/individuals to whom the recommendation applies
  3. rationale for the strength of the recommendation
  4. additional considerations when using the recommendation
  5. benefits and harms 
  6. implications, what a patient can do, and questions or topics to discuss with one's health care provider

Santesso, N., Wiercioch, W., Barbara, A.M., Dietl, H., and Schünemann, H.J. (2021). Focus groups and interviews with the public led to the development of a template for a GRADE plain language recommendation. J Clin Epidemiol, in press.

Manuscript available at the publisher's website here.




















Monday, September 13, 2021

Re-analysis of a systematic review on injury prevention demonstrates that methods really do matter

How much of a difference can methodological decisions make? Quite a bit, argues a new paper published in the Journal of Clinical Epidemiology. A re-analysis of a 2018 meta-analysis on the role of the Nordic hamstring exercise (NHE) in injury prevention, the study outlined and then executed several methodological changes within the context of an updated search and found that the resulting magnitude of effect - and the strength of recommendations using GRADE - were not quite as dazzling as in the original analysis.

Impellizzeri and colleagues noted several suggested changes to the 2018 paper, including:

  • limiting the meta-analysis to higher-level evidence (randomized controlled trials) when available,
  • clarifying the interventions used in the included studies and being cognizant of the effect of co-interventions (for instance, when NHE was used alone versus in combination with other exercises as part of an injury reduction program),
  • being careful not to "double-dip" on events (i.e., injuries) that recur in the same individual when presenting the data as a risk ratio,
  • considering the impact of between-study heterogeneity when discussing the certainty of the resulting estimates,
  • presenting the lower- and upper-bounds of 95% confidence intervals for estimates of effect in addition to the point estimates, and
  • taking the limitations of the literature and other important considerations into account when formulating final summaries or recommendations (for instance, using the GRADE framework)
The authors ran an updated systematic search but excluded non-randomized studies as well as studies that combined other exercises with the NHE in the intervention group. Risk of bias was assessed using the Cochrane tool for randomized studies. The overall certainty of evidence as assessed using GRADE was rated "low," although given the noted concerns regarding risk of bias, inconsistency, and imprecision, the certainty may range to "very low" under the standard GRADE framework. The forest plot of the updated analysis can be seen below.


The results of the updated analysis show that, rather than reducing the risk of hamstring injury by 50%, the intervention had a range of possible effects too large to support a conclusion about its effectiveness, and only a conditional recommendation is warranted.

Impellizzeri, F.M., McCall, A., and van Smeden, M. (2021). Why methods matter in a meta-analysis: A reappraisal showed inconclusive injury preventive effect of Nordic hamstring exercise. J Clin Epidemiol, in press.

The manuscript is available at the publisher's site here.


















Monday, August 30, 2021

Misuse of ROBINS-I Tool May Underestimate Risk of Bias in Non-Randomized Studies

Although it is currently the only tool recommended by the Cochrane Handbook for assessing risk of bias in non-randomized studies of interventions, the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool can be complex and difficult to use effectively for reviewers lacking specific training or expertise in its application. Previous posts have summarized research examining the reliability of ROBINS-I, suggesting that reliability can improve with reviewer training. Now, a study from Igelström and colleagues finds that the tool is commonly modified or used incorrectly, potentially affecting the certainty of evidence or strength of recommendations resulting from synthesis of these studies.

The authors reviewed 124 systematic reviews published across two months in 2020, using A MeaSurement Tool to Assess systematic Reviews (AMSTAR) to operationalize the overall quality of the reviews. The authors extracted data related to the use of ROBINS-I to assess risk of bias across studies and/or outcomes, as well as the number of studies included, whether meta-analysis was performed, and whether any funding sources were declared. They then assessed whether the application of ROBINS-I was predicted by the review's overall methodological quality (as measured by AMSTAR), the performance of risk of bias assessment in duplicate, the presence of industry funding, or the inclusion of randomized controlled trials in the review.


Overall methodological quality across the reviews was generally low to very low, with only 17% scoring as moderate quality and 6% scoring as high quality. Only six (5%) of the reviews reported explicit justifications for risk of bias judgments both across and within domains. Modification of ROBINS-I was common, with 20% of reviews modifying the rating scale and six reviews either not reporting across all seven domains or adding an eighth domain. In 19% of reviews, studies rated as having a "critical" risk of bias were included in the narrative or quantitative synthesis, against guidance for the use of the tool.

Reviews that were of higher quality as assessed by AMSTAR tended to contain fewer "low" or "moderate" risk of bias ratings and more judgments of "critical" risk of bias. Thus, the authors argue, incorrect or modified use of ROBINS-I may risk underestimating the potential risk of bias among included studies, potentially affecting the resulting conclusions or recommendations. Associations between the use of ROBINS-I and the other potential predictors, however, were less conclusive. 

Igelström, E., Campbell, M., Craig, P., and Katikireddi, S.V. (2021). Cochrane's risk-of-bias tool for non-randomized studies (ROBINS-I) is frequently misapplied: A methodological systematic review. J Clin Epidemiol, in press.

Manuscript available from publisher's website here. 










Tuesday, August 24, 2021

UpPriority: A new tool to guide the prioritization of guideline update efforts

The establishment of a process for assessing the need to update a clinical guideline based on new information and evidence is a key aspect of guideline quality. However, given limited time and resources, it is likely necessary to prioritize clinical questions that are most in need of an update from year to year. A new paper demonstrates proof of concept for the UpPriority Tool, which aims to allow guideline developers to prioritize questions for guideline update. 

The tool comprises six items for assessing the need to update a given recommendation or guideline topic:
  • the potential impact of an outdated guideline on patient safety;
  • the availability of new, relevant evidence;
  • the context relevance of the clinical question at hand (is the question still relevant given considerations such as the burden of disease, variation in practice, or emerging care options?);
  • methodological applicability of the clinical question (does the question still address PICO components of interest?);
  • user interest in an update; and
  • the potential impact of an update on access to health care.
To apply this tool in a real-world setting, the authors took a sample of four guidelines that had been published by the Spanish National Health System (NHS) within the past two to three years and that used the GRADE framework. A survey was then developed to assess the above six items, calculate a priority ranking, and, from there, decide which questions were in highest need of updating. The survey was disseminated among members of a working group comprising members of the original guideline panels and additional content experts. Additional factors for consideration included the volume of new evidence, the availability of resources, and the need to include new clinical questions.




Through this process, a total of 16 (15%) of the 107 questions were defined as high priority for updating.  Of these, 12 were given a score higher than five for one of the individual items (specifically the item assessing an impact on patient safety), while the remaining four received an overall score higher than 30 across all six items.

In addition to the priority ranking derived from the six assessment items, the survey also assessed the usability and inter-observer reliability of the tool itself. The reliability (intraclass correlation) ranged from good in one guideline (0.87) to moderate in two guidelines (0.62 and 0.63) and poor in one (0.15). The authors conclude that the identification and proper training of content experts to serve as appraisers remains the key challenge for the effective application of this tool.

Sanabria, A.J., Alonso-Coello, P., McFarlane, E., et al. (2021). The UpPriority tool supported prioritization processes for updating clinical guideline questions. J Clin Epidemiol, in press.

The manuscript can be accessed here.

















Wednesday, August 4, 2021

Correction to guidance for assessing imprecision with continuous outcomes

Systematic review and guideline developers take note: the authors of the 2011 guidance on assessing imprecision within the GRADE framework have recently issued a correction related to the assessment of information size when evaluating a continuous outcome.


Whereas the article stated originally that a sample size of approximately 400 (200 per group) would be required to detect an effect size of 0.2 standard deviations assuming an alpha of 0.05 and a power of 0.8, the correct number is actually 800 (400 per group). 
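For readers who want to sanity-check the corrected figure, below is a minimal sketch of the standard two-sample power calculation behind it. This is our illustration rather than anything taken from the corrigendum itself, and the variable names are ours.

    # Approximate sample size per group needed to detect a standardized mean
    # difference of 0.2 with two-sided alpha = 0.05 and power = 0.80
    from scipy.stats import norm

    alpha, power, effect_size = 0.05, 0.80, 0.2
    z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96
    z_beta = norm.ppf(power)            # ~0.84
    n_per_group = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    print(round(n_per_group))           # ~392, i.e., roughly 400 per group (~800 total)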

The full corrigendum can be read here. 

Thursday, July 29, 2021

New GRADE guidance on assessing imprecision in a network meta-analysis

Imprecision is one of the major domains of the GRADE framework and is used to assess whether to rate down the certainty of evidence related to an outcome of interest. In a traditional ("pairwise") meta-analysis, which compares two intervention groups, exposures, or tests against one another, two considerations are made: the confidence interval around the absolute estimate of effect and the optimal information size (OIS). If the bounds of the confidence interval cross a threshold for a meaningful effect, and/or if the OIS is not met given the sample size in the meta-analysis, then one should consider rating down for imprecision.

In the context of small sample sizes, confidence intervals around an effect may be fragile - meaning they could change substantially with additional information. Therefore, considering the OIS along with the bounds of the confidence interval helps address this concern when rating the certainty of evidence to develop a clinical recommendation. This is typically done by assessing whether the sample size of the meta-analysis meets that determined by a traditional power analysis for a given effect size.
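As a rough illustration of how that OIS check can be operationalized - a hypothetical sketch, not anything prescribed by GRADE; the assumed effect size and pooled sample size below are made-up numbers - the required information size can be obtained from a conventional power analysis and compared against the total number of participants in the meta-analysis:

    # Hypothetical OIS check for a pairwise meta-analysis of a continuous outcome
    from statsmodels.stats.power import TTestIndPower

    # Sample size per group a well-powered trial would need, assuming an
    # effect size of 0.5 SD (illustrative value), alpha = 0.05, power = 0.80
    n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
    optimal_information_size = 2 * n_per_group           # ~128 participants in total

    pooled_n = 90                                         # illustrative pooled sample size
    if pooled_n < optimal_information_size:
        print("OIS not met: consider rating down for imprecision")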

However, in a network meta-analysis, both direct and indirect comparisons are made across various interventions or tests. Thus, especially if the inclusion of indirect comparisons changes the overall estimate of effect, considering only the sample size involved in the direct comparisons would be misleading. 


A new GRADE guidance paper lays out how to assess imprecision in the context of a network meta-analysis:

  • If the 95% confidence interval crosses a decision-making threshold, rate down for imprecision. Thresholds should ideally be set a priori. One may consider rating down by two or even three levels depending on the degree of imprecision and the resulting communication of the certainty of evidence. For example, if imprecision is the only concern for an outcome, rating down by two levels instead of one is the difference between saying that a certain intervention or test "likely" or "probably" increases or decreases a given outcome versus saying that it simply "may" have this effect.
  • If the 95% confidence interval does not cross a decision-making threshold, consider whether the effect size may be inflated. If a point estimate is far enough away from a threshold, even a relatively wide CI may not cross it. Further, relatively large effect sizes drawn from smaller pools of evidence may shrink as future research accumulates.
    • In the case of a large effect size, consider whether the OIS is met. If the number of patients contributing to an NMA does not meet this number, consider rating down by one, two, or three levels depending on the width of the CI.
    • If the upper limit of a confidence interval for a relative risk is 3 or more times the lower limit, the OIS has likely not been met. Similarly, odds ratios with an upper-to-lower-limit ratio exceeding 2.5 have likely not met the OIS. (A minimal sketch of this check appears after this list.)
  • When the effect size is modest and plausible and the confidence interval does not cross a threshold, one likely does not need to rate down for imprecision.
  • Avoid "double dinging" for imprecision if this limitation has already been addressed by rating down elsewhere.

Brignardello-Petersen, R., Guyatt, G.H., Mustafa, R.A., et al. (2021). GRADE guidelines 33. Addressing imprecision in a network meta-analysis. J Clin Epidemiol, in press.

Manuscript available at the publisher's website here.