Monday, September 13, 2021

Re-analysis of a systematic review on injury prevention demonstrates that methods do really matter

How much of a difference can methodological decisions make? Quite a bit, argues a new paper published in the Journal of Clinical Epidemiology. A re-analysis of a 2018 meta-analysis on the role of the Nordic hamstring exercise (NHE) in injury prevention, the study outlined and then executed several methodological changes within the context of an updated search and found that the resulting magnitude of effect - and strength of recommendations using GRADE - were not quite as dazzling as in the original analysis.

Impellizzeri and colleagues noted several suggested changes to the 2018 paper, including:

  • limiting the meta-analysis to higher-level evidence (randomized controlled trials) when available,
  • clarifying the interventions used in the included studies and being cognizant of the effect of co-interventions (for instance, when NHE was used alone versus in combination with other exercises as part of an injury reduction program),
  • being careful not to "double-dip" on events (i.e., injuries) that recur in the same individual when presenting the data as a risk ratio,
  • discussing the impact of between-study heterogeneity when discussing the certainty of resulting estimates,
  • presenting the lower- and upper-bounds of 95% confidence intervals for estimates of effect in addition to the point estimates, and
  • taking the limitations of the literature and other important considerations into account when formulating final summaries or recommendations (for instance, using the GRADE framework).
The authors ran an updated systematic search but excluded non-randomized controlled trials or studies that incorporated other exercises with the NHE in the intervention group. Risk of bias was assessed using the Cochrane tool for randomized studies. The overall certainty of evidence as assessed using GRADE was rated "low," although given that concerns regarding risk of bias, inconsistency, and imprecision were noted, the certainty may range to "very low" following the standard GRADE framework. The forest plot of the updated analysis can be seen below.


The results of the updated analysis show that, rather than supporting a 50% reduction in the risk of hamstring injury, the range of possible effects was too wide to draw a conclusion on the effectiveness of this intervention, and only a conditional recommendation is warranted.
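
To make concrete why the bounds of a confidence interval matter as much as the point estimate, here is a minimal sketch of a risk ratio with a 95% confidence interval for a single two-arm trial. It is not the authors' analysis: the function name `risk_ratio_ci` and the event counts are hypothetical, and the interval uses the standard large-sample (log-scale Wald) approximation rather than a pooled meta-analytic model.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def risk_ratio_ci(events_tx, n_tx, events_ctrl, n_ctrl, level=0.95):
    """Return (risk ratio, lower bound, upper bound) for one two-arm trial.

    Uses the log-scale Wald interval; assumes at least one event per arm.
    Each participant should be counted at most once (no recurrent events).
    """
    if min(events_tx, events_ctrl) <= 0:
        raise ValueError("this sketch needs at least one event in each arm")
    rr = (events_tx / n_tx) / (events_ctrl / n_ctrl)
    # Standard error of log(RR)
    se = sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctrl - 1 / n_ctrl)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)


# Hypothetical counts: 10 of 100 injured with the intervention, 20 of 100 without.
rr, lower, upper = risk_ratio_ci(10, 100, 20, 100)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these made-up counts the point estimate is a halving of risk, yet the interval still reaches past 1 (no effect), which is the kind of uncertainty a point estimate alone would hide.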

Impellizzeri, F.M., McCall, A., and van Smeden, M. (2021). Why methods matter in a meta-analysis: A reappraisal showed inconclusive injury preventive effect of Nordic hamstring exercise. J Clin Epidemiol, in-press.

The manuscript is available at the publisher's site here.


















Monday, August 30, 2021

Misuse of ROBINS-I Tool May Underestimate Risk of Bias in Non-Randomized Studies

Although it is currently the only tool recommended by the Cochrane Handbook for assessing risk of bias in non-randomized studies of interventions, the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool can be complex and difficult to use effectively for reviewers lacking specific training or expertise in its application. Previous posts have summarized research examining the reliability of ROBINS-I, suggesting that it can improve with training of reviewers. Now, a study from Igelström and colleagues finds that the tool is commonly modified or used incorrectly, potentially affecting the certainty of evidence or strength of recommendations resulting from synthesis of these studies.

The authors reviewed 124 systematic reviews published across two months in 2020, using A MeaSurement Tool to Assess systematic Reviews (AMSTAR) to operationalize the overall quality of the reviews. The authors extracted data related to the use of ROBINS-I to assess risk of bias across studies and/or outcomes as well as the number of studies included, whether meta-analysis was performed, and whether any funding sources were declared. They then assessed whether the application of ROBINS-I was predicted by the review's overall methodological quality (as measured by AMSTAR), the performance of risk of bias assessment in duplicate, the presence of industry funding, or the inclusion of randomized controlled trials in the review.


Overall methodological quality across the reviews was generally low to very low, with only 17% scoring as moderate quality and 6% scoring as high quality. Only six (5%) of the reviews reported explicit justifications for risk of bias judgments both across and within domains. Modification of ROBINS-I was common, with 20% of reviews modifying the rating scale, and six either not reporting across all seven domains or adding an eighth domain. In 19% of reviews, studies rated as having a "critical" risk of bias were included in the narrative or quantitative synthesis, against guidance for the use of the tool.

Reviews that were of higher quality as assessed by AMSTAR tended to contain fewer "low" or "moderate" risk of bias ratings and more judgments of "critical" risk of bias. Thus, the authors argue, incorrect or modified use of ROBINS-I may risk underestimating the potential risk of bias among included studies, potentially affecting the resulting conclusions or recommendations. Associations between the use of ROBINS-I and the other potential predictors, however, were less conclusive. 

Igelström, E., Campbell, M., Craig, P., and Katikireddi, S.V. (2021). Cochrane's risk-of-bias tool for non-randomized studies (ROBINS-I) is frequently misapplied: A methodological systematic review. J Clin Epidemiol, in-press.

Manuscript available from publisher's website here. 










Tuesday, August 24, 2021

UpPriority: A new tool to guide the prioritization of guideline update efforts

The establishment of a process for assessing the need to update a clinical guideline based on new information and evidence is a key aspect of guideline quality. However, given limited time and resources, it is likely necessary to prioritize clinical questions that are most in need of an update from year to year. A new paper demonstrates proof of concept for the UpPriority Tool, which aims to allow guideline developers to prioritize questions for guideline update. 

The tool comprises six items for assessing the need to update a given guideline recommendation or topic:
  • the potential impact of an outdated guideline on patient safety;
  • the availability of new, relevant evidence;
  • the context relevance of the clinical question at hand (is the question still relevant given considerations such as the burden of disease, variation in practice, or emerging care options?);
  • methodological applicability of the clinical question (does the question still address PICO components of interest?);
  • user interest in an update; and
  • the potential impact of an update on access to health care.
To apply this tool in a real-world setting, the authors took a sample of four guidelines that were published by the Spanish National Health System (NHS) within the past 2-3 years and that utilized the GRADE framework. A survey was then developed to assess the above six items, calculate a priority ranking, and from there, decide which questions were in highest need of updating. The survey was disseminated among a working group comprising members of the original guideline groups and additional content experts. Additional factors for consideration included the volume of new evidence, the availability of resources, and the need to include new clinical questions.




Through this process, a total of 16 (15%) of the 107 questions were defined as high priority for updating.  Of these, 12 were given a score higher than five for one of the individual items (specifically the item assessing an impact on patient safety), while the remaining four received an overall score higher than 30 across all six items.
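
The two cut-offs described above can be sketched as a simple decision rule. This is only an illustration under stated assumptions, not the UpPriority tool itself: the item keys, the function name `is_high_priority`, and the example scores are hypothetical, and the sketch does not model how scores from several appraisers are combined before the cut-offs are applied.

```python
# Hypothetical keys for the six UpPriority items listed in this post.
ITEMS = (
    "patient_safety",
    "new_evidence",
    "context_relevance",
    "methodological_applicability",
    "user_interest",
    "access_to_care",
)


def is_high_priority(scores, item_cutoff=5, total_cutoff=30):
    """Flag a clinical question as high priority for updating.

    Mirrors the two criteria reported in the post: a patient-safety
    score higher than five, or an overall score higher than 30.
    """
    missing = [item for item in ITEMS if item not in scores]
    if missing:
        raise ValueError(f"missing item scores: {missing}")
    total = sum(scores[item] for item in ITEMS)
    return scores["patient_safety"] > item_cutoff or total > total_cutoff


# Hypothetical question: high safety concern, otherwise middling scores.
example = dict.fromkeys(ITEMS, 3)
example["patient_safety"] = 6
print(is_high_priority(example))
```

Keeping the cut-offs as parameters makes it easy to see how the list of prioritized questions would change under stricter or looser thresholds.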

In addition to the priority ranking derived from the six assessment items, the survey also assessed the usability and inter-observer reliability of the tool itself. The reliability (intra-class correlation) ranged from good in one guideline (0.87) to moderate (0.62 and 0.63) in two guidelines and poor (0.15) in one. The authors conclude that the identification and proper training of content experts to serve as appraisers remains the key challenge for the efficacious application of this tool.

Sanabria, A.J., Alonso-Coello, P., McFarlane, E., et al. (2021). The UpPriority tool supported prioritization processes for updating clinical guideline questions. J Clin Epidemiol (in-press).

The manuscript can be accessed here.

















Wednesday, August 4, 2021

Correction to guidance for assessing imprecision with continuous outcomes

Systematic review and guideline developers take note: the authors of the 2011 guidance on assessing imprecision within the GRADE framework have recently issued a correction related to the assessment of information size when evaluating a continuous outcome.


Whereas the article stated originally that a sample size of approximately 400 (200 per group) would be required to detect an effect size of 0.2 standard deviations assuming an alpha of 0.05 and a power of 0.8, the correct number is actually 800 (400 per group). 
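
The corrected figure can be checked with the usual normal-approximation formula for comparing two means, n per group = 2(z_{1-alpha/2} + z_{power})^2 / d^2, where d is the effect size in standard deviations. The sketch below is a back-of-the-envelope check rather than the corrigendum's own calculation; the function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_sd, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means, using the normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # about 0.84 for power = 0.8
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size_sd ** 2)


per_group = n_per_group(0.2)
print(per_group, 2 * per_group)  # 393 per group, 786 in total
```

Rounded up, that is roughly 400 per group and 800 overall, in line with the corrected guidance. Because the required sample size scales with 1/d^2, halving the effect size to be detected quadruples the sample needed.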

The full corrigendum can be read here. 

Thursday, July 29, 2021

New GRADE guidance on assessing imprecision in a network meta-analysis

Imprecision is one of the major domains of the GRADE framework and is used to assess whether to rate down the certainty of evidence related to an outcome of interest. In a traditional ("pairwise") meta-analysis which compares two intervention groups, exposures, or tests against one another, two considerations are made: the confidence interval around the absolute estimate of effect, and the optimal information size (OIS). If the bounds of the confidence interval cross a threshold for a meaningful effect, and/or if the sample size in the meta-analysis does not meet the optimal information size, then one should consider rating down for imprecision.

In the context of small sample sizes, confidence intervals around an effect may be fragile - meaning they could be changed substantially with additional information. Therefore, the consideration of OIS along with the bounds of the confidence interval helps address this concern when rating the certainty of evidence to develop a clinical recommendation. This is typically done by assessing whether the sample size of the meta-analysis meets that determined by a traditional power analysis for a given effect size.

However, in a network meta-analysis, both direct and indirect comparisons are made across various interventions or tests. Thus, especially if the inclusion of indirect comparisons changes the overall estimate of effect, considering only the sample size involved in the direct comparisons would be misleading. 


A new GRADE guidance paper lays out how to assess imprecision in the context of a network meta-analysis:

  • If the 95% confidence interval crosses a decision-making threshold, rate down for imprecision. Thresholds should ideally be set a priori. Reviewers may consider rating down by two or even three levels depending on the degree of imprecision and the resulting communication of the certainty of evidence. For example, if imprecision is the only concern for an outcome, rating down by two levels instead of one would be the difference between saying that a certain intervention or test "likely" or "probably" increases or decreases a given outcome and saying that it simply "may" have this effect.
  • If the 95% confidence interval does not cross a decision-making threshold, consider whether the effect size may be inflated. If a point estimate is far enough away from a threshold, even a relatively wide CI may not cross it. Further, relatively large effect sizes from smaller pools of evidence may shrink as future research accumulates.
    • In the case of a large effect size, consider whether OIS is met. If the number of patients contributing to an NMA does not meet this number, consider rating down by one, two, or three levels depending on how wide the CI is.
    • If the upper limit of a confidence interval for a relative risk is 3 or more times the lower limit, OIS has likely not been met. Similarly, odds ratios whose upper-to-lower-limit ratio exceeds 2.5 have likely not met OIS.
  • Alternatively, when the effect size is modest and plausible and the confidence interval does not cross a threshold, one likely does not need to rate down for imprecision.
  • Avoid "double dinging" for imprecision if this limitation has already been addressed by rating down elsewhere.
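
The rule of thumb about the ratio of confidence limits lends itself to a quick check. The sketch below is only an illustration of that heuristic as summarized in this post, not a substitute for calculating OIS; the function name `ois_likely_unmet` and the example intervals are hypothetical.

```python
def ois_likely_unmet(lower, upper, measure):
    """Apply the upper-to-lower confidence limit heuristic.

    measure: "RR" (flag when the ratio is 3 or more) or
             "OR" (flag when the ratio exceeds 2.5).
    """
    if lower <= 0 or upper < lower:
        raise ValueError("expected 0 < lower <= upper for a ratio measure")
    ratio = upper / lower
    if measure == "RR":
        return ratio >= 3.0
    if measure == "OR":
        return ratio > 2.5
    raise ValueError("measure must be 'RR' or 'OR'")


# Hypothetical intervals, for illustration only.
print(ois_likely_unmet(0.40, 1.50, "RR"))  # ratio 3.75
print(ois_likely_unmet(0.80, 1.60, "OR"))  # ratio 2.0
```

A flag from this check is a prompt to look more closely at the amount of information behind the estimate, not an automatic decision to rate down.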

Brignardello-Petersen R, Guyatt GH, Mustafa RA, et al. (2021). GRADE guidelines 33. Addressing imprecision in a network meta-analysis. J Clin Epidemiol (in-press).

Manuscript available at the publisher's website here.





Friday, July 16, 2021

New GRADE concept paper identifies challenges and solutions to use of GRADE in public health contexts

The GRADE framework can be applied across a variety of different fields, not the least of which is public health. Public health, as the authors of a new GRADE concept paper define it, is concerned with "preventing disease, prolonging life, and promoting health through the organized efforts of society" and comprises three key domains: health protection, health services, and health improvement. However, the field of public health also has unique challenges in the application of GRADE that require addressing. 

To dig deeper into these challenges and design a plan of action for solutions and guidance, the GRADE Public Health group conducted a scoping review to better understand published accounts of the barriers, challenges, and facilitators to the adoption and application of GRADE in public health contexts, presenting the results of nine identified articles. From these, five major challenges were identified:

  • Incorporating diverse perspectives 
  • Selecting and prioritizing outcomes
  • Interpreting outcomes and identifying a threshold for decision-making
  • Assessing certainty of evidence from diverse sources (e.g., nonrandomized studies)
  • Addressing implications for decision-makers, including concerns about conditional recommendations
The article then discusses proposed solutions and a work plan to address these key challenges.


Forthcoming GRADE public health guidance articles, collaborations with the GRADE Evidence-to-Decision working group, and the adaptation of GRADE training materials to nonhealth and policy audiences will help guide those in public health contexts in meeting the unique needs presented for rigorous guideline development. Additional promotion of existing GRADE guidance, such as the consideration of equity in the evidence-to-decision process, may help guideline developers with specific challenges related to selecting and prioritizing outcomes or identifying thresholds for decision-making. Ongoing guidance from the GRADE group for Non-Randomized Studies and the use of ROBINS-I may further improve the application of GRADE in settings where observational evidence is dominant.

Hilton Boon, M., Thomson, H., Shaw, B., et al. (2021). Challenges in applying the GRADE approach in public health guidelines and systematic reviews: A concept article from the GRADE public health group. J Clin Epidemiol 135:42-53.

Article available at the publisher's website here.










 

Friday, June 25, 2021

Scholars at 14th GRADE Workshop Discuss the Unique Challenges of Sparse Evidence, Guideline Collaborations, and Financial Incentives in Healthcare

During the 14th GRADE Guideline Development Workshop held virtually last month, the Evidence Foundation had the pleasure of welcoming three new scholars with the opportunity to attend the workshop free of charge. As part of the scholarship, each recipient presented to the workshop attendees about their current or proposed project related to evidence-based medicine and reducing bias in healthcare.

This spring's lot of three scholars was nothing short of incredibly impressive. Ifeoluwa Babatunde, a PhD student in clinical research at Case Western Reserve University, discussed the unique challenges of developing a guideline on the management of patients undergoing patent foramen ovale (PFO) closure for the Society for Cardiovascular Angiography and Interventions (SCAI). The synthesis of evidence for this question is hampered by controversies and limited evidence as well as complications due to comorbidities and age differences in the populations of interest. Babatunde discussed her interest in attending the workshop to learn more about the appropriate use of observational and indirect evidence to better answer questions related to PFO closure.

"The GRADE workshop helped me to see systematic review methodology from a deeper and more critical perspective," said Babatunde. "GRADE offers a very comprehensive yet succinct and transparent framework for developing and ascertaining the certainty of evidence in guidelines. Hence I feel better equipped to tackle challenges that arise from creating reviews and guidelines regarding conditions and populations with sparse RCTs."


Next, Dr. Pichamol Jirapinyo, the Director of Bariatric Endoscopy Fellowship at Brigham and Women's Hospital and instructor at Harvard Medical School, discussed her work on an international joint guideline development effort between the American Society for Gastrointestinal Endoscopy (ASGE) and the European Society of Gastrointestinal Endoscopy (ESGE) to produce recommendations for endoscopic bariatric and metabolic therapy (EBMT) in patients with obesity. EBMT is one of several possible management routes for obesity, alongside pharmacological and surgical options. The project will aim to answer several questions, including how patients should be managed before and after EBMT, and regarding the safety and efficacy of both gastric and small bowel EBMT.

"The GRADE workshop provided me a great framework on how to apply GRADE methodology to systematic review and meta-analysis to rigorously develop a guideline," said Dr. Jirapinyo. "In addition to learning about the GRADE methodology itself, I found the workshop to be tremendously helpful with providing practical tips on how to run a guideline task force successfully and efficiently."

Finally, Dr. Lillian Lai, a research fellow in the Department of Urology at the University of Michigan, presented an intriguing discussion of financial incentives in clinical decision-making in urology. The surveillance and management of localized prostate cancer, for instance, has several different options ranging from active surveillance (which is less costly) to prostatectomy (which is more costly). Regardless of the reported health outcomes of these approaches, there is little financial incentive to conduct surveillance as opposed to surgery. The project's goal is to use health services research methods to understand how urologists respond to large financial incentives, and then create financial incentives and remove financial disincentives for the promotion of guideline-concordant practices.

"I gained invaluable knowledge on how to use the GRADE approach to rate the certainty of evidence and strength of recommendations," said Dr. Lai. "Going through the guideline development tool with experts in small groups was particularly useful for me to understand what a guideline recommendation means and entails. This workshop came at a critical time in the backdrop of COVID, and the ever-changing landscape of medicine where patients and providers need to make timely and informed decisions together."

If you are interested in learning more about GRADE and attending the workshop as a scholarship recipient, applications for our upcoming virtual workshop in October are now open. The deadline to apply is July 31, 2021. Details can be found here.