Monday, September 13, 2021

Re-analysis of a systematic review on injury prevention demonstrates that methods do really matter

How much of a difference can methodological decisions make? Quite a bit, argues a new paper published in the Journal of Clinical Epidemiology. Re-analyzing a 2018 meta-analysis on the role of the Nordic hamstring exercise (NHE) in injury prevention, the study outlined and then executed several methodological changes within the context of an updated search. It found that the resulting magnitude of effect, and the strength of recommendations using GRADE, were not quite as dazzling as in the original analysis.

Impellizzeri and colleagues noted several suggested changes to the 2018 paper, including:

  • limiting the meta-analysis to higher-level evidence (randomized controlled trials) when available,
  • clarifying the interventions used in the included studies and being cognizant of the effect of co-interventions (for instance, when NHE was used alone versus in combination with other exercises as part of an injury reduction program),
  • being careful not to "double-dip" on events (i.e., injuries) that recur in the same individual when presenting the data as a risk ratio
  • discussing the impact of between-study heterogeneity when discussing the certainty of resulting estimates,
  • presenting the lower- and upper-bounds of 95% confidence intervals for estimates of effect in addition to the point estimates, and
  • taking the limitations of the literature and other important considerations into account when formulating final summaries or recommendations (for instance, using the GRADE framework)
The authors ran an updated systematic search but excluded non-randomized studies and studies that combined the NHE with other exercises in the intervention group. Risk of bias was assessed using the Cochrane tool for randomized studies. The overall certainty of evidence as assessed using GRADE was rated "low," although given the concerns noted regarding risk of bias, inconsistency, and imprecision, the certainty may be as low as "very low" following the standard GRADE framework. The forest plot of the updated analysis can be seen below.


The results of the updated analysis show that rather than reducing the risk of hamstring injury by 50%, the range of possible effects was too large to draw a conclusion on the effectiveness of this intervention, and only a conditional recommendation is warranted.
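Two of the points above, counting each injured individual only once and reporting both bounds of the 95% confidence interval, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name and the counts are hypothetical, not data from the paper, and the interval is the usual Wald interval on the log scale rather than the method the authors necessarily used.

```python
import math

def risk_ratio_ci(events_tx, n_tx, events_ctrl, n_ctrl, z=1.96):
    """Risk ratio with a Wald-type 95% CI computed on the log scale.

    events_* must count injured individuals, not injuries, so that a
    player with a recurrent injury contributes only once.
    """
    rr = (events_tx / n_tx) / (events_ctrl / n_ctrl)
    se_log_rr = math.sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctrl - 1 / n_ctrl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts for illustration only (not data from the paper).
rr, lower, upper = risk_ratio_ci(events_tx=10, n_tx=100, events_ctrl=20, n_ctrl=100)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these made-up counts the point estimate is a 50% reduction, yet the interval still reaches past 1, which is the kind of result that a point estimate alone would hide.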

Impellizzeri, F.M., McCall, A., and van Smeden, M. (2021). Why methods matter in a meta-analysis: A reappraisal showed inconclusive injury preventive effect of Nordic hamstring exercise. J Clin Epidemiol, in-press.

The manuscript is available at the publisher's site here.


















Monday, August 30, 2021

Misuse of ROBINS-I Tool May Underestimate Risk of Bias in Non-Randomized Studies

Although it is currently the only tool recommended by the Cochrane Handbook for assessing risk of bias in non-randomized studies of interventions, the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool can be complex and difficult to use effectively for reviewers lacking specific training or expertise in its application. Previous posts have summarized research examining the reliability of ROBINS-I, suggesting that it can improve with training of reviewers. Now, a study from Igelström and colleagues finds that the tool is commonly modified or used incorrectly, potentially affecting the certainty of evidence or strength of recommendations resulting from synthesis of these studies.

The authors reviewed 124 systematic reviews published across two months in 2020, using A MeaSurement Tool to Assess systematic Reviews (AMSTAR) to operationalize the overall quality of the reviews. The authors extracted data related to the use of ROBINS-I to assess risk of bias across studies and/or outcomes as well as the number of studies included, whether meta-analysis was performed, and whether any funding sources were declared. They then assessed whether the application of ROBINS-I was predicted by the review's overall methodological quality (as measured by AMSTAR), the performance of risk of bias assessment in duplicate, the presence of industry funding, or the inclusion of randomized controlled trials in the review.


Overall methodological quality across the reviews was generally low to very low, with only 17% scoring as moderate quality and 6% scoring as high quality. Only six (5%) of the reviews reported explicit justifications for risk of bias judgments both across and within domains. Modification of ROBINS-I was common, with 20% of reviews modifying the rating scale, and six either not reporting across all seven domains or adding an eighth domain. In 19% of reviews, studies rated as having a "critical" risk of bias were included in the narrative or quantitative synthesis, against guidance for the use of the tool.

Reviews that were of higher quality as assessed by AMSTAR tended to contain fewer "low" or "moderate" risk of bias ratings and more judgments of "critical" risk of bias. Thus, the authors argue, incorrect or modified use of ROBINS-I may risk underestimating the potential risk of bias among included studies, potentially affecting the resulting conclusions or recommendations. Associations between the use of ROBINS-I and the other potential predictors, however, were less conclusive. 

Igelström, E., Campbell, M., Craig, P., and Katikireddi, S.V. (2021). Cochrane's risk-of-bias tool for non-randomized studies (ROBINS-I) is frequently misapplied: A methodological systematic review. J Clin Epidemiol, in-press.

Manuscript available from publisher's website here. 










Tuesday, August 24, 2021

UpPriority: A new tool to guide the prioritization of guideline update efforts

The establishment of a process for assessing the need to update a clinical guideline based on new information and evidence is a key aspect of guideline quality. However, given limited time and resources, it is likely necessary to prioritize clinical questions that are most in need of an update from year to year. A new paper demonstrates proof of concept for the UpPriority Tool, which aims to allow guideline developers to prioritize questions for guideline update. 

The tool comprises six items for assessing the need to update a given recommendation or guideline topic:
  • the potential impact of an outdated guideline on patient safety;
  • the availability of new, relevant evidence;
  • the context relevance of the clinical question at hand (is the question still relevant given considerations such as the burden of disease, variation in practice, or emerging care options?);
  • methodological applicability of the clinical question (does the question still address PICO components of interest?);
  • user interest in an update; and
  • the potential impact of an update on access to health care.
To apply this tool in a real-world setting, the authors took a sample of four guidelines published by the Spanish National Health System (NHS) within the past 2-3 years and which utilized the GRADE framework. A survey was then developed in order to assess the above six items, calculate a priority ranking, and from there, decide which questions were in highest need of updating. The survey was disseminated among members of a working group comprising members of the original guideline and additional content experts. Additional factors for consideration included the volume of new evidence, the availability of resources, and the need to include new clinical questions. 




Through this process, a total of 16 (15%) of the 107 questions were defined as high priority for updating.  Of these, 12 were given a score higher than five for one of the individual items (specifically the item assessing an impact on patient safety), while the remaining four received an overall score higher than 30 across all six items.
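The two routes to a "high priority" label described above can be written as a simple rule. The sketch below is an illustration only: the item keys, the function name, and the example scores are hypothetical, and the cutoffs (above five on the patient-safety item, or above 30 in total) are taken from the summary above rather than from the tool's official scoring manual.

```python
# Hypothetical short keys for the six UpPriority items.
ITEMS = (
    "safety",
    "new_evidence",
    "context_relevance",
    "methodological_applicability",
    "user_interest",
    "access",
)

def is_high_priority(scores, item_cutoff=5, total_cutoff=30):
    """Flag a question when the safety item or the overall total is high."""
    safety_flag = scores["safety"] > item_cutoff
    total_flag = sum(scores[item] for item in ITEMS) > total_cutoff
    return safety_flag or total_flag

# Made-up scores for two clinical questions.
question_a = dict(zip(ITEMS, (6, 3, 4, 4, 3, 2)))  # high safety score
question_b = dict(zip(ITEMS, (4, 4, 4, 4, 4, 4)))  # total of 24
print(is_high_priority(question_a), is_high_priority(question_b))
```

Keeping the cutoffs as parameters makes it easy for a guideline group to test how sensitive its shortlist is to the thresholds it chooses.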

In addition to the priority ranking derived from the six assessment items, the survey also assessed the usability and inter-observer reliability of the tool itself. The reliability (intra-class correlation) ranged from good in one guideline (0.87) to moderate (0.62 and 0.63) in two guidelines and poor (0.15) in one. The authors conclude that the identification and proper training of content experts to serve as appraisers remains the key challenge for the efficacious application of this tool.

Sanabria, A.J., Alonso-Coello, P., McFarlane, E., et al. (2021). The UpPriority tool supported prioritization processes for updating clinical guideline questions. J Clin Epidemiol (in-press).

The manuscript can be accessed here.

















Wednesday, August 4, 2021

Correction to guidance for assessing imprecision with continuous outcomes

Systematic review and guideline developers take note: the authors of the 2011 guidance on assessing imprecision within the GRADE framework have recently issued a correction related to the assessment of information size when evaluating a continuous outcome.


Whereas the article stated originally that a sample size of approximately 400 (200 per group) would be required to detect an effect size of 0.2 standard deviations assuming an alpha of 0.05 and a power of 0.8, the correct number is actually 800 (400 per group). 
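The corrected figure can be checked with the standard normal-approximation formula for comparing two means, n per group = 2(z_alpha + z_beta)^2 / d^2, where d is the effect size in standard deviation units. The short Python sketch below uses only the standard library; the function name is ours, and the formula is the textbook approximation rather than anything specific to the corrigendum.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_sd ** 2)

per_group = n_per_group(0.2)
print(per_group, 2 * per_group)  # 393 per group, 786 in total
```

The result of roughly 400 per group, or about 800 overall, agrees with the corrected guidance; 200 per group would only be enough to detect a larger effect.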

The full corrigendum can be read here. 

Thursday, July 29, 2021

New GRADE guidance on assessing imprecision in a network meta-analysis

Imprecision is one of the major domains of the GRADE framework and is used to assess whether to rate down the certainty of evidence related to an outcome of interest. In a traditional ("pairwise") meta-analysis, which compares two intervention groups, exposures, or tests against one another, two considerations are made: the confidence interval around the absolute estimate of effect, and the optimal information size (OIS). If the bounds of the confidence interval cross a threshold for a meaningful effect, and/or if the sample size in the meta-analysis does not meet the optimal information size, then one should consider rating down for imprecision.

In the context of small sample sizes, confidence intervals around an effect may be fragile - meaning they could be changed substantially with additional information. Therefore, the consideration of OIS along with the bounds of the confidence interval helps address this concern when rating the certainty of evidence to develop a clinical recommendation. This is typically done by assessing whether the sample size of the meta-analysis meets that determined by a traditional power analysis for a given effect size.

However, in a network meta-analysis, both direct and indirect comparisons are made across various interventions or tests. Thus, especially if the inclusion of indirect comparisons changes the overall estimate of effect, considering only the sample size involved in the direct comparisons would be misleading. 


A new GRADE guidance paper lays out how to assess imprecision in the context of a network meta-analysis:

  • If the 95% confidence interval crosses a decision-making threshold, rate down for imprecision. Thresholds should ideally be set a priori. Consider rating down by two or even three levels depending on the degree of imprecision and how the resulting certainty of evidence will be communicated. For example, if imprecision is the only concern for an outcome, rating down by one level supports saying that an intervention or test "likely" or "probably" increases or decreases a given outcome, whereas rating down by two levels supports saying only that it "may" have this effect.
  • If the 95% confidence interval does not cross a decision-making threshold, consider whether the effect size may be inflated. If a point estimate is far enough away from a threshold, even a relatively wide CI may not cross it. Further, relatively large effect sizes from smaller pools of evidence may shrink with future research.
    • In the case of a large effect size, consider whether OIS is met. If the number of patients contributing to an NMA does not meet the OIS, consider rating down by one, two, or three levels depending on how wide the CI is.
    • If the upper limit of a confidence interval for a relative risk is 3 or more times the lower limit, OIS has likely not been met. Similarly, for odds ratios, an upper-to-lower-limit ratio exceeding 2.5 suggests OIS has likely not been met.
  • Alternatively, when the effect size is modest and plausible and the CI does not cross a threshold, one likely does not need to rate down for imprecision.
  • Avoid "double dinging" for imprecision if this limitation has already been addressed by rating down elsewhere.
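The decision steps in this list can be sketched as a small screening function. This is a rough illustration under stated assumptions, not the guidance itself: the function names are hypothetical, the threshold is assumed to be on the same scale as the confidence interval, whether an effect counts as "large" is left to the user as an input, and the number of levels to rate down remains a judgment call that the code does not attempt to make.

```python
def crosses_threshold(lower, upper, threshold):
    """True when the decision threshold lies strictly inside the CI."""
    return lower < threshold < upper

def ois_likely_unmet(lower, upper, measure):
    """Apply the upper-to-lower CI ratio heuristic for RR and OR."""
    ratio = upper / lower
    if measure == "RR":
        return ratio >= 3.0
    if measure == "OR":
        return ratio > 2.5
    raise ValueError("measure must be 'RR' or 'OR'")

def imprecision_screen(lower, upper, threshold, measure, large_effect):
    """Return a suggestion string following the order of the bullets above."""
    if crosses_threshold(lower, upper, threshold):
        return "rate down: CI crosses threshold"
    if large_effect and ois_likely_unmet(lower, upper, measure):
        return "consider rating down: OIS likely not met"
    return "no rating down suggested"

# Hypothetical network estimates, for illustration only.
print(imprecision_screen(0.25, 1.01, 1.0, "RR", large_effect=False))
print(imprecision_screen(0.20, 0.70, 1.0, "RR", large_effect=True))
print(imprecision_screen(0.60, 0.90, 1.0, "RR", large_effect=False))
```

A screen like this only flags candidates for discussion; the final rating should still account for whether imprecision has already been penalized under another domain.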

Brignardello-Petersen R, Guyatt GH, Mustafa RA, et al. (2021). GRADE guidelines 33. Addressing imprecision in a network meta-analysis. J Clin Epidemiol (in-press).

Manuscript available at the publisher's website here.





Friday, July 16, 2021

New GRADE concept paper identifies challenges and solutions to use of GRADE in public health contexts

The GRADE framework can be applied across a variety of different fields, not the least of which is public health. Public health, as the authors of a new GRADE concept paper define it, is concerned with "preventing disease, prolonging life, and promoting health through the organized efforts of society" and comprises three key domains: health protection, health services, and health improvement. However, the field of public health also has unique challenges in the application of GRADE that require addressing. 

To dig deeper into these challenges and design a plan of action for solutions and guidance, the GRADE Public Health group conducted a scoping review to better understand published accounts of the barriers, challenges, and facilitators to the adoption and application of GRADE in public health contexts, presenting the results of nine identified articles. Of these, five major challenges were identified:

  • Incorporating diverse perspectives 
  • Selecting and prioritizing outcomes
  • Interpreting outcomes and identifying a threshold for decision-making
  • Assessing certainty of evidence from diverse sources (e.g., nonrandomized studies)
  • Addressing implications for decision-makers, including concerns about conditional recommendations
The article then discusses proposed solutions and a work plan to address these key challenges.


Forthcoming GRADE public health guidance articles, collaborations with the GRADE Evidence-to-Decision working group, and the adaptation of GRADE training materials to nonhealth and policy audiences will help guide those in public health contexts in meeting the unique needs presented for rigorous guideline development. Additional promotion of existing GRADE guidance, such as the consideration of equity in the evidence-to-decision process, may help guideline developers with specific challenges related to selecting and prioritizing outcomes or identifying thresholds for decision-making. Ongoing guidance from the GRADE group for Non-Randomized Studies and the use of ROBINS-I may further improve the application of GRADE in settings where observational evidence is dominant.

Hilton Boon, M., Thomson, H., Shaw, B., et al. (2021). Challenges in applying the GRADE approach in public health guidelines and systematic reviews: A concept article from the GRADE public health group. J Clin Epidemiol 135:42-53.

Article available at the publisher's website here










 

Friday, June 25, 2021

Scholars at 14th GRADE Workshop Discuss the Unique Challenges of Sparse Evidence, Guideline Collaborations, and Financial Incentives in Healthcare

During the 14th GRADE Guideline Development Workshop held virtually last month, the Evidence Foundation had the pleasure of welcoming three new scholars, who received the opportunity to attend the workshop free of charge. As part of the scholarship, each recipient presented to the workshop attendees about their current or proposed project related to evidence-based medicine and reducing bias in healthcare.

This spring's group of three scholars was nothing short of impressive. Ifeoluwa Babatunde, a PhD student in clinical research at Case Western Reserve University, discussed the unique challenges of developing a guideline on the management of patients undergoing patent foramen ovale (PFO) closure for the Society for Cardiovascular Angiography and Interventions (SCAI). The synthesis of evidence for this question is hampered by controversies and limited evidence as well as complications due to comorbidities and age differences in the populations of interest. Babatunde discussed her interest in attending the workshop to learn more about the appropriate use of observational and indirect evidence to better answer questions related to PFO closure.

"The GRADE workshop helped me to see systematic review methodology from a deeper and more critical perspective," said Babatunde. "GRADE offers a very comprehensive yet succinct and transparent framework for developing and ascertaining the certainty of evidence in guidelines. Hence I feel better equipped to tackle challenges that arise from creating reviews and guidelines regarding conditions and populations with sparse RCTs."


Next, Dr. Pichamol Jirapinyo, the Director of Bariatric Endoscopy Fellowship at Brigham and Women's Hospital and instructor at Harvard Medical School, discussed her work on an international joint guideline development effort between the American Society for Gastrointestinal Endoscopy (ASGE) and the European Society of Gastrointestinal Endoscopy (ESGE) to produce recommendations for endoscopic bariatric and metabolic therapy (EBMT) in patients with obesity. EBMT is one of several possible management routes for obesity, alongside pharmacological and surgical options. The project will aim to answer several questions, including how patients should be managed before and after EBMT, and questions regarding the safety and efficacy of both gastric and small bowel EBMT.

"The GRADE workshop provided me a great framework on how to apply GRADE methodology to systematic review and meta-analysis to rigorously develop a guideline," said Dr. Jirapinyo. "In addition to learning about the GRADE methodology itself, I found the workshop to be tremendously helpful with providing practical tips on how to run a guideline task force successfully and efficiently."

Finally, Dr. Lillian Lai, a research fellow in the Department of Urology at the University of Michigan, presented an intriguing discussion of financial incentives in clinical decision-making in urology. The surveillance and management of localized prostate cancer, for instance, has several different options ranging from active surveillance (which is less costly) to prostatectomy (which is more costly). Regardless of the reported health outcomes of these approaches, there is little financial incentive to conduct surveillance as opposed to surgery. The project's goal is to use health services research methods to understand how urologists respond to large financial incentives, and then create financial incentives and remove financial disincentives for the promotion of guideline-concordant practices.

"I gained invaluable knowledge on how to use the GRADE approach to rate the certainty of evidence and strength of recommendations," said Dr. Lai. "Going through the guideline development tool with experts in small groups was particularly useful for me to understand what a guideline recommendation means and entails. This workshop came at a critical time in the backdrop of COVID, and the ever-changing landscape of medicine where patients and providers need to make timely and informed decisions together."

If you are interested in learning more about GRADE and attending the workshop as a scholarship recipient, applications for our upcoming virtual workshop in October are now open. The deadline to apply is July 31, 2021. Details can be found here. 

Wednesday, June 9, 2021

Evidence Foundation scholar spotlight: Georgios Schoretsanitis

Last fall, Dr. Georgios Schoretsanitis attended the 13th (and first-ever virtual) GRADE guideline development workshop as a scholar of the Evidence Foundation. As such, he presented to the rest of the workshop attendees on his work developing guidelines for therapeutic drug monitoring to optimize and tailor treatment for psychotherapeutic medications. Beginning in 2017, a series of recommendations for reference ranges for two commonly prescribed antipsychotic medications was developed, followed this year by an international joint consensus statement on blood levels to optimize antipsychotic treatment in clinical practice.

Dr. Schoretsanitis now has an exciting update on his project.

"My main research interest is therapeutic drug monitoring, also known as TDM, which refers to the quantification and interpretation of medication levels in the blood (plasma or serum) of the patient treated with psychotropic agents," says Dr. Schoretsanitis. "The aim of TDM in clinical practice is to improve treatment response and safety outcomes. Apart from analyzing TDM clinical routine data, I have also been working as a member of the TDM taskforce of the German Association of Neuropsychopharmacology and Pharmacopsychiatry (Arbeitsgemeinschaft für Neuropsychopharmakologie und Pharmacopsychiatrie; AGNP) involved in systematic reviews of TDM literature, which provide so-called therapeutic reference ranges for medication levels. These ranges may orient clinicians during dose selection. Attending the virtual GRADE workshop in October 2020 provided me much of inspiration, but also knowledge of well-established methodological tools for the assessment of quality of evidence.

Hereafter, in the TDM task force of AGNP, we adopted a GRADE-oriented approach in assessing TDM literature as we are reviewing new TDM evidence on commonly prescribed antipsychotics under the supervision of Prof. Gerhard Gründer, Department of Molecular Neuroimaging, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany. This type of approach is more standardized and follows GRADE guidelines. Ultimately, this work will enhance methodological rigidity for the next Consensus guidelines for therapeutic drug monitoring in neuropsychopharmacology [last update 2018; Hiemke et al, Pharmacopsychiatry]. I strongly encourage researchers involved in systematic reviews or assessment of evidence quality to attend the GRADE workshop which enables a major upgrade of related skills and knowledge."



Stay tuned for future updates from other past Evidence Foundation scholars like Dr. Schoretsanitis and the exciting work they are doing to improve the application of GRADE methodology and evidence-based medicine.

If you are interested in learning more about GRADE and attending the workshop as a scholarship recipient, applications for our upcoming virtual workshop in October are now open. The deadline to apply is July 31, 2021. Details can be found here. 


Thursday, June 3, 2021

The Systematic Survey Behind a Collection of Minimal Important Differences (MIDs) Across the Patient-Reported Outcome Literature

Patient-reported outcome measures, or PROMs, allow clinicians and researchers to directly elicit information about treatments that are important to patients, such as side effects or improvements in pain, function, or quality of life. In order to interpret changes in PROMs relative to a clinical recommendation, however, a minimal important difference (MID) - or the smallest possible change in the outcome that would mandate a change in the patient's management - must be determined.

Luckily, a wealth of published studies exist to provide a library of MIDs for a wide range of outcomes, and recently, a review published by Carrasco-Labra and colleagues synthesized these works together. The study included any empirical reports of MIDs in adolescents or adults that used an anchor-based approach, in which MIDs are based on an observed change related to an external criterion rather than the distribution of a particular patient sample. As such, anchor-based MIDs tend to be more directly applicable across patient populations. Ultimately, a collection of 585 studies reporting on 5,324 MID estimates across 526 distinct PROMs was presented.

About two-thirds (66%) collected MIDs related to patients' improvement, whereas about one-third (31%) addressed MIDs related to worsening or assumed the MIDs for improvement or worsening would be the same. Most (88%) were based on a longitudinal design in which patients' reported outcomes and satisfaction were measured at multiple timepoints. The most common types of anchors used were global ratings of change (59%), change in disease-related outcome (23%), and comparison with another group (11%), whereas the most common sources of anchor information were self-report (83%), proxy report (9%), and laboratory data (3%).


MIDs are essential in interpreting the magnitude of an effect from a study or systematic review of evidence, especially when assessing imprecision as part of GRADE. They can also allow researchers to conduct "responder analyses" based on subsets of patients who experience a change in an outcome beyond a given MID. Finally, reporting mean differences in units of MIDs as part of a systematic review can standardize interpretation of an effect size in a way that may be less problematic to interpret than a traditional standardized mean difference (SMD).
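Both uses of an MID mentioned above, rescaling a mean difference into MID units and running a simple responder analysis, reduce to short calculations. The Python sketch below is illustrative only: the function names and all numbers are hypothetical, and it assumes that a change equal to the MID already counts as a response, which individual studies may define differently.

```python
def in_mid_units(mean_difference, mid):
    """Express a mean difference as a multiple of the MID."""
    return mean_difference / mid

def responder_proportion(changes, mid):
    """Share of patients whose change score reaches at least the MID."""
    responders = [change for change in changes if change >= mid]
    return len(responders) / len(changes)

# Hypothetical instrument with an anchor-based MID of 4 points.
mid = 4
print(in_mid_units(mean_difference=6, mid=mid))                 # 1.5 MIDs
print(responder_proportion([1, 5, 7, 3, 4, 0, 9, 2], mid=mid))  # 0.5
```

Reporting the pooled effect as "1.5 MIDs" tells a reader directly how the change compares with what patients consider important, which a standardized mean difference does not.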

The work corresponds to PROMID, a project to develop an inventory of MIDs across the literature, which can be accessed at https://promid.mcmaster.ca/. 



Carrasco-Labra A, Devji T, Qasim A, et al. (2021). Minimal important difference estimates for patient-reported outcomes: A systematic survey. J Clin Epidemiol 133:61-71.

Manuscript available at the publisher's website here. 










Friday, May 14, 2021

Reliability of Risk of Bias Assessments of Non-randomized Studies Improves After Customized Training

We previously reported on a paper published in 2020 assessing the inter-rater reliability (IRR) and inter-consensus reliability (ICR) of the Risk of Bias in Non-Randomized Studies of Interventions (ROBINS-I) tool, developed in 2016, and the Risk of Bias instrument for NRS of Exposures (ROB-NRSE) tool, developed in 2018. This paper found that reliability generally tended to be poor for these tools, while risk of bias assessments took evaluators, on average, 48 minutes for the ROBINS-I tool and almost 37 minutes for the ROB-NRSE.

Now, a new publication from the same group has examined the effect of training on the reliability of these tools. An international team of reviewers with a median of 5 years of experience with risk of bias assessment first applied the ROBINS-I and ROB-NRSE tools to a list of 44 non-randomized studies of interventions and exposures, respectively, using only the 53 pages of publicly available guidance. Then, the reviewers received an abridged and customized training document which was tailored specifically to the topic area of the reviews, included simplified guidance for assessing risk of bias, and also provided additional guidance related to more advanced concepts. The reviewers then re-assessed the studies' risk of bias after a several-weeks-long wash-out period.



Changes in the inter-rater reliability (IRR) for the ROBINS-I (top) and ROB-NRSE tools (bottom) from before and after a customized training intervention.


The training intervention improved the IRR of the ROBINS-I tool, generally improving the range of within-domain reliability while the reliability of the overall bias rating improved from "poor" to "fair." Meanwhile, the ICR improved substantially, with the overall rating's reliability improving from "poor" to "near perfect." Improvements were also observed after training in the application of the ROB-NRSE tool, with IRR of the overall bias improving significantly from "slight" to "near perfect" while its ICR improved from "poor" to "near perfect." For both tools, the pre-to-post-intervention correlations between reviewers' scores were poor, suggesting that the training did have an impact on these measures independent of a simple learning effect. While customized training was associated with a decrease in evaluator burden for the ROBINS-I tool, this did not hold true for the ROB-NRSE.
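For readers unfamiliar with how agreement between two reviewers is quantified, the sketch below computes Cohen's kappa, a chance-corrected agreement statistic, for a pair of risk of bias ratings. Cohen's kappa is used here only as a familiar stand-in; the study may have relied on a different agreement coefficient, and the ratings shown are invented for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical ratings."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = freq_a.keys() | freq_b.keys()
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)

# Invented overall risk of bias judgments for six studies.
reviewer_1 = ["low", "moderate", "serious", "serious", "critical", "moderate"]
reviewer_2 = ["low", "serious", "serious", "serious", "critical", "moderate"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.77
```

The function assumes the raters do not both assign a single identical category to every study, since expected agreement of 1 would make the denominator zero.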

The findings of this analysis suggest that the use of a customized, shortened guidance tool specifically tailored to the topical content of a review, including simplified guidance for decision-making within each domain, can improve the reliability of resulting risk of bias assessments. The authors suggest that future reviewers create such guidance based on the specific needs and considerations of their topic area, and publish these tools along with the review.

Jeyaraman MM, Robson RC, Copstein L et al. (2021). Customized guidance/training improved the psychometric properties of methodologically rigorous risk of bias instruments for non-randomized studies. J Clin Epidemiol, in-press.

Manuscript available here. 































Tuesday, May 4, 2021

Restricting Systematic Search to English-only is a Viable Shortcut in Most, but Perhaps Not All Topics in Medicine

In the limitations sections of systematic reviews on any topic, it is not uncommon for the authors to discuss how language limitations within their search may have restricted the breadth of evidence presented. For instance, if the reviewers speak only English, the review is likely limited to publications and journals in that language. But how much of a difference does such a limitation make in terms of the overall conclusions of a systematic review? According to a new paper in the Journal of Clinical Epidemiology, probably not much - but it may depend on the specific topic of medicine under investigation.

While other methods reviews have previously examined this question, Dobrescu and colleagues extended the range of topics to methods reviews that included systematic reviews within the realm of complementary and alternative medicine, yielding four reviews not examined in prior work. Specifically, the authors looked for methods reviews comparing the restriction of literature searches to English-only versus unrestricted searches and whose primary outcomes compared differences in treatment effect estimates, certainty of evidence ratings, or conclusions based on the language restrictions enforced.

The search yielded eight studies investigating the impact of language restrictions in anywhere from 9 to 147 systematic reviews in medicine. Overall, the exclusion of non-English articles had a greater impact on estimates of treatment effects and the statistical significance of findings in reviews of complementary and alternative medicine versus conventional medicine topics. Most commonly, the exclusion of non-English studies led to a loss of statistical significance in these topic areas.

Overall, the methods studies examined found that the exclusion of non-English studies of conventional medicine topics led to small to moderate changes in the estimate of effect; however, exclusion of non-English studies shrank the observed effect size in complementary and alternative medicine topics by 63 percent. Two studies examined whether language restriction influenced authors' overall conclusions, generally finding no effect.

The figure above shows the frequency of languages of the excluded reviews examined.

The authors conclude that when it comes to systematic reviews of conventional medicine topics, their findings are in line with those of previous methods studies which demonstrate little to no effect of language restrictions and suggest that restricting a search to English-only should not greatly impact the findings or conclusions of a review. However, the effect appears greater in the realm of complementary and alternative medicine, perhaps due to the greater proportion of non-English studies published in this field. Thus, systematic reviewers attempting to synthesize the evidence on an alternative medicine topic should be cognizant of their choices regarding language restriction and the potential implications they may have on their ultimate findings.

Dobrescu A, Nussbaumer SB, Klerings I et al. (2021). Restricting evidence syntheses of interventions to English-language publications is a viable methodological shortcut for most medical topics: A systematic review: Excluding English-language publications a valid shortcut. J Clin Epidemiol, epub ahead of print.

Manuscript available from publisher's website here. 

Wednesday, April 21, 2021

In Studies of Patients at High Risk of Death, More Explicit Reporting of Functional Outcomes is Needed

Randomized controlled trials examining the effects of an intervention in patients with a high risk of death will often also include functional outcomes - such as quality of life, cognition, or physical disability. However, the death of patients before these outcomes can be assessed (also known as "truncation due to death") can confound the results of a "survivors-only" analysis, especially if mortality rates are higher in certain groups than others. 

A new methodology review of studies published within 5 high-impact general medical journals from 2014 to 2019 provides insight into this phenomenon and suggestions for improving how functional outcomes are handled. To be eligible for the review, a study needed to be a randomized controlled trial (RCT) with a mortality rate of at least 10% in one arm and to report at least one functional outcome in addition to mortality. The authors recorded the outcomes analyzed, the type of statistical analyses used, and the sample population of each of the 434 included studies. For most (351, or 79%) of these, function was a secondary outcome, while it was a primary outcome for 91 (21%) of them.

Among the studies that examined functional outcomes as secondary outcomes, only one-quarter (25%) of those outcomes were analyzed using an approach that included all randomized patients (intention-to-treat); among the studies in which functional outcomes were primary, this proportion was 60%.


The authors provide suggestions for best ways to handle and report data in these studies:
  • In the methods rather than only in tables or supplementary material, explicitly state the sample population from which the functional outcomes were drawn, whether it's survivors-only or another type of analysis.
  • If a survivors-only analysis is used, the authors should report the baseline characteristics between the groups analyzed and transparently discuss this as a limitation within the discussion section.
  • If all randomized participants are analyzed regardless of mortality, authors should report the assumptions upon which these analyses are based; for instance, if death is one outcome ranked among others in a worst-rank analysis, the justification for the ranking of outcomes should be discussed in the methods, and the implications of these decisions included in the discussion section. 
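
The difference between a survivors-only analysis and a worst-rank analysis can be sketched in a few lines. This is a minimal illustration, not the method used in any of the reviewed trials: the scores are invented, the helper names (`survivors_only`, `worst_rank`, `probability_of_superiority`) are hypothetical, and the comparison statistic is simply the Mann-Whitney U statistic divided by the number of treatment-control pairs.

```python
def probability_of_superiority(treatment, control):
    """Share of treatment-control pairs in which the treatment score is higher.

    Ties count as half a win. This equals the Mann-Whitney U statistic
    divided by the number of pairs.
    """
    wins = 0.0
    for t in treatment:
        for c in control:
            if t > c:
                wins += 1.0
            elif t == c:
                wins += 0.5
    return wins / (len(treatment) * len(control))


# Invented functional scores (higher is better); None = died before assessment.
treatment = [60, 65, 70, 75, None]      # 1 death out of 5
control = [68, 72, None, None, None]    # 3 deaths out of 5

# Assumption: any value below the lowest possible score, so death ranks worst.
WORST = -1


def survivors_only(scores):
    """Drop participants who died before the outcome could be measured."""
    return [s for s in scores if s is not None]


def worst_rank(scores):
    """Keep every randomized participant, ranking death below all observed scores."""
    return [WORST if s is None else s for s in scores]


p_survivors = probability_of_superiority(survivors_only(treatment), survivors_only(control))
p_worst_rank = probability_of_superiority(worst_rank(treatment), worst_rank(control))

print(f"survivors only: {p_survivors:.3f}")
print(f"worst-rank:     {p_worst_rank:.3f}")
```

With these made-up numbers, the survivors-only comparison favors the control arm (0.375), while the worst-rank comparison favors the treatment arm (0.660), because the control arm's few survivors happen to be its healthiest participants. Ranking death as the worst outcome is itself an assumption, which is why the suggestions above ask authors to justify it.
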
Colantuoni E, Li X, Hashem MD et al. (2021). A structured methodology review showed analyses of functional outcomes are frequently limited to "survivors only" in trials enrolling patients at high risk of death. J Clin Epidemiol (e-pub ahead of print).

Manuscript available here.

Thursday, April 8, 2021

Digging Deeper: 5 Ways to Help Guide Decision-Making When Research Evidence is "Insufficient"

A key tenet underlying the GRADE framework is that the certainty of available research evidence is a key factor to be considered in the course of clinical decision-making. But what if little to no published research exists on which to base a recommendation? At the end of the day, clinicians, patients, policymakers, and others will still need to make a decision, and will look to a guideline for direction. Thankfully, there are other options to pursue within the context of a systematic review or guideline that ensure that as much of the available evidence as possible is presented, although it may come from less traditional or direct sources.

A new project conducted by the Evidence-based Practice Center (EPC) Program of the Agency for Healthcare Research and Quality (AHRQ) developed guidance for supplementing a review of evidence when the available research evidence is sparse or insufficient. This guidance was based on a three-pronged approach, including:

  • a literature review of articles that have defined and dealt with insufficient evidence, 
  • a convenience sample of recent systematic reviews conducted by EPCs that included at least one outcome for which the evidence was rated as insufficient, and
  • an audit of technical briefs from the EPCs, which tend to be developed when a given topic is expected to yield little to no published evidence and which often contain supplementary sources of information such as grey literature and expert interviews.
Through this approach, the workgroup identified five key strategies for dealing with the challenge of insufficient evidence:
  1. Reconsider eligible study designs: broaden your search to capture a wider variety of published evidence, such as cohort or case studies.
  2. Summarize evidence outside the prespecified review parameters: use indirect evidence that does not perfectly match the PICO of your topic in order to better contextualize the decision being presented.
  3. Summarize evidence on contextual factors (factors other than benefits/harms): these include key aspects of the GRADE Evidence-to-Decision framework, such as patient values and preferences and the acceptability, feasibility, and cost-effectiveness of a given intervention.
  4. Consider modeling if appropriate, and if expertise is available: if possible, certain types of modeling can help fill in the gaps and make useful predictions for outcomes in lieu of real-life research.
  5. Incorporate health system data: "real-world" evidence such as electronic health records and registries can supplement more mechanistic or explanatory RCTs.



Some of these challenges can be more efficiently addressed up-front, before the scoping of a new review even begins. For instance, identifying topic experts and stakeholders who are familiar with the quantity and quality of available evidence can help a group foresee potential gaps and plan for the need to broaden the scope. Care should be taken to identify the outcomes that are of critical importance to patients, and through this lens, develop strategies and criteria within the protocol that will best meet the needs of the review while tapping into as much evidence as possible. Finally, researchers should avoid using the term "insufficient" when describing the evidence, and instead explicitly state that no eligible studies or types of evidence were available.

Murad MH, Chang SM, Fiordalisi CV, et al. (2021). Improving the utility of evidence synthesis for decisionmakers in the face of insufficient evidence. J Clin Epidemiol, ahead-of-print. 

Manuscript available from publisher's website here.

Friday, April 2, 2021

New Review of Pragmatic Trials Reveals Insights, Identifies Gaps

As opposed to an "explanatory" or "mechanistic" randomized controlled trial (RCT), which seeks to examine the effect of an intervention under tightly controlled circumstances, "pragmatic" or "naturalistic" trials study interventions and their outcomes when used in more real-world, generalizable settings. One example of such a study might include the use of registry data to examine interventions and outcomes as they occur in the "real world" of patient care. However, there are currently few standards for identifying, reporting, and discussing the results of such "pragmatic RCTs." A new paper by Nicholls and colleagues aims to provide an overview of the current landscape of this methodological genre.

The authors searched for and synthesized 4,337 trials using keywords such as "pragmatic," "real world," "registry based," and "comparative effectiveness" to better map an understanding of how pragmatic trials are presented in the RCT literature. Overall, only about 22% (964) of these trials were identified as "pragmatic" RCTs in the title, abstract, or full text; about half of these (55%) used this term in the title or abstract, while the remaining 45% described the work as a pragmatic trial only in the full text. 

About 78.1% (3,368) of the trials indicated that they were registered. However, only about 6% were indexed in PubMed as a pragmatic trial, and only 0.5% were labeled with the MeSH topic of Pragmatic Clinical Trial. The target enrollment of pragmatic trials was a median of 440 participants within an interquartile range (IQR) of 244 to 1,200; the actual achieved accrual was 414 (IQR: 216 - 1,147). The largest trial included 933,789 participants; the smallest enrolled 60.

Overall, pragmatic trials were more likely to be centered in North America and Europe and to be funded by non-industry sources. Behavioral, rather than drug or device-based, interventions were most common in these trials. Not infrequently, the trials were mislabeled or contained erroneous data in their registration information. The fact that only about half of the sample were clearly labeled as "pragmatic" means that these trials may go undetected by search mechanisms less sensitive than those the authors used.

Authors of pragmatic trials can improve the quality of the field by clearly labelling their work as such, by registering their trials, and by ensuring that registered data are accurate and up-to-date. Nicholls and colleagues also suggest that taking a broader view of what constitutes a "pragmatic RCT" raises questions regarding proper ethical standards when research is conducted on a large scale with multiple lines of responsibility. Finally, the mechanisms used to obtain consent in these trials should be further examined in light of the finding that many pragmatic trials fail to achieve goals set for participant enrollment.

Manuscript available from publisher's web site here. 

Nicholls SG, Carroll K, Hey SP, et al. (2021). A review of pragmatic trials found a high degree of diversity in design and scope, deficiencies in reporting and trial registry data, and poor indexing. J Clin Epidemiol (ahead of print). 

Monday, March 15, 2021

A Blinding Success?: The Debate over Reporting the Success of Blinding

While the use of blinding is a hallmark of placebo-controlled trials, whether the blinding was successful - i.e., whether or not participants were able to figure out the treatment condition to which they had been assigned - isn't always tested, nor are the results of these tests always reported. The measurement of the success of blinding in trials is controversial and not uniformly used, and the item has been dropped from subsequent versions of the CONSORT reporting items for trials. According to a recent discussion of the pros and cons of measuring the success of blinding, only 2% to 24% of trials perform or report these types of tests.

As Webster and colleagues explain, the benefits to measuring the success of blinding are as follows:

  • The success (or failure) of blinding in a placebo-controlled trial can introduce a source of bias that affects the results.
  • While the effect of blinding itself may be small, these small effects could still result in changes to policy or practice.
  • There are documented instances in which the failure to properly blind (for instance, providing participants with a sour-tasting Vitamin C condition versus a sweet lactose "placebo") led to an observed effect (for instance, on preventing or treating the common cold), whereas there was no effect in the subgroup of participants who were successfully blinded.
Reasons commonly given against the testing of successful blinding include the following:
  • At times, a break in blinding can lead to conclusions in the opposite direction. For instance, physicians who are unblinded may assume that the patients with better outcomes received a drug widely supposed to be "superior," when in fact, the opposite occurred.
  • In some cases, a treatment with dramatically superior results can result in unblinding, even when the treatment conditions were identical - but that doesn't necessarily mean the blinding was a failure or could have been prevented, given the dramatic differences in outcomes.
  • If the measurement of blinding is performed at the wrong time - such as before the completion of the trial - participants may become suspicious and this in itself could potentially confound treatment effects.


Webster RK, Bishop F, Collins GS, et al. (2021). Measuring the success of blinding in placebo-controlled trials: Should we be so quick to dismiss it? J Clin Epidemiol, pre-print.

Manuscript available from publisher's website here.

Tuesday, March 9, 2021

Expert Evidence: A Framework for Using GRADE When "No" Evidence Exists

To guide the formulation of clinical recommendations, GRADE relies on the use of direct or, if necessary, indirect evidence from peer-reviewed publications as well as the gray literature. However, in some cases, no such evidence may be found even after an extensive search has been conducted. A new paper - part of the informal GRADE Notes series in the Journal of Clinical Epidemiology - relays the results of piloting an "expert evidence" approach and provides key suggestions when using it.

As opposed to simply asking the panel members of a guideline to base their recommendations on informal opinion, the expert evidence approach systematizes this process by eliciting the extent of their experience with certain clinical scenarios through quantitative survey methods. In this example, at least 50% of the panel members were free of conflicts of interest, with various countries and specialties represented. While members were not required to base their answers on patient charts, the authors suggest that doing so could further increase the rigor of the survey.



As a result of the survey, the recommendations put forward reflected a cumulative 12,000 cases of experience. Because the members felt that at least some recommendation was necessary to help guide care - where the alternative would be to provide no recommendation at all - the guideline helped to fill a gap while indicating the current lack of high-quality published evidence for several clinical questions, which may help guide the production of higher-quality evidence and recommendations in the future. Importantly, the authors note that using a survey to facilitate the formulation of recommendations avoided a pitfall of "consensus-based" approaches to guideline development, which can end up simply reflecting the opinions of those with the loudest voices.

Mustafa RA, Cuello Garcia CA, Bhatt M, Riva JJ, Vesely S, Wiercioch W, ... & HJ Schünemann. (2021). How to use GRADE when there is "no" evidence? A case study of the expert evidence approach. J Clin Epidemiol, in-press. 

Manuscript available from the publisher's website here


Wednesday, March 3, 2021

Dealing with Zero-Events Studies in Meta-analysis: There's a Better Way than Throwing it Away!

When meta-analyzing data from studies examining the incidence of rare events - or from studies with a small sample size or short follow-up period - it is not uncommon to come across a study with 0 events of the outcome of interest. In fact, approximately one-third of a random sample of 500 Cochrane reviews contained at least one zero-events study.

Zero-events studies are typically categorized as single-arm (there are 0 events reported in just one group) or double-arm (there are 0 events reported in both groups). While some software packages automatically discard double-arm zero-events studies from a meta-analysis, this is not ideal because these data still add useful information with regard to the overall effect of an intervention. Ideally, meta-analyses could include a pooled event count that may be zero in one arm, both arms, or neither, with various single-arm and double-arm zero-events studies potentially contributing to this final effect. Thus, in a recently published article, Xu and colleagues propose a more detailed framework for approaching zero-events studies in the context of a meta-analysis.
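
One way to see what is lost when double-arm zero-events studies are discarded is to pool on the risk-difference scale, where such studies remain well defined. The sketch below uses a Mantel-Haenszel pooled risk difference; that choice is an assumption made here for illustration and is not necessarily among the methods Xu and colleagues suggest. The function name `mh_risk_difference` and the three studies are invented.

```python
def mh_risk_difference(studies):
    """Mantel-Haenszel pooled risk difference (treatment minus control).

    Each study is a tuple:
    (events_treatment, n_treatment, events_control, n_control).
    """
    numerator = 0.0
    denominator = 0.0
    for e_t, n_t, e_c, n_c in studies:
        total = n_t + n_c
        numerator += (e_t * n_c - e_c * n_t) / total
        denominator += (n_t * n_c) / total
    return numerator / denominator


# Invented data for illustration only.
studies = [
    (2, 100, 5, 100),
    (0, 200, 0, 200),   # double-arm zero-events study
    (1, 50, 3, 50),
]

rd_all = mh_risk_difference(studies)
# Mimic software that silently drops studies with no events in either arm.
rd_dropped = mh_risk_difference([s for s in studies if s[0] + s[2] > 0])

print(f"all studies included:        {rd_all:.4f}")
print(f"double-zero study discarded: {rd_dropped:.4f}")
```

In this toy example the pooled risk difference is about -0.014 with all three studies and about -0.033 once the double-arm zero-events study is dropped. The large study with no events in either arm carries weight toward "no difference," and discarding it makes the intervention look more effective than the full data suggest.
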

The authors describe six classifications as follows, with the degree of difficulty when meta-analyzing generally increasing from 1 to 6:

1) MA-SZ: meta-analysis contains zero-events only occurring in single arms, no double-arm-zero-events studies are included, and the total events count in neither arm is zero;

2) MA-MZ: meta-analysis contains zero-events occurring in both single and double arms, and the total events count in neither arm is zero;

3) MA-DZ: meta-analysis contains zero-events only occurring in double arms, and the total events count in neither arm is zero;

4) MA-CSZ: meta-analysis contains zero-events occurring in single arms, and no double-arm-zero-events studies are included, while the total events count in one of the arms is zero;

5) MA-CMZ: meta-analysis contains zero-events occurring in both single arm and double arms, while the total events count in one of the arms is zero;

6) MA-CDZ: meta-analysis only includes double-arm-zero-events studies, while the total events count in both arms is zero.
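
The six categories above can be expressed as a small decision rule. The following is a sketch based only on the definitions as summarized in this post; the function name and input format are hypothetical, and the original paper should be consulted for edge cases.

```python
def classify_zero_events_meta_analysis(studies):
    """Return the framework label for a meta-analysis, or None if no study has zero events.

    `studies` is a list of (events_arm_1, events_arm_2) pairs, one per study.
    """
    has_single_zero = any((a == 0) != (b == 0) for a, b in studies)
    has_double_zero = any(a == 0 and b == 0 for a, b in studies)
    if not (has_single_zero or has_double_zero):
        return None

    total_arm_1 = sum(a for a, _ in studies)
    total_arm_2 = sum(b for _, b in studies)
    arms_with_zero_total = (total_arm_1 == 0) + (total_arm_2 == 0)

    if arms_with_zero_total == 2:
        # Every study must be a double-arm zero-events study.
        return "MA-CDZ"
    if arms_with_zero_total == 1:
        # At least one single-arm zero-events study is guaranteed here.
        return "MA-CMZ" if has_double_zero else "MA-CSZ"
    # The total event count is non-zero in both arms.
    if has_single_zero and has_double_zero:
        return "MA-MZ"
    return "MA-SZ" if has_single_zero else "MA-DZ"


print(classify_zero_events_meta_analysis([(0, 3), (2, 4)]))
print(classify_zero_events_meta_analysis([(0, 3), (0, 0)]))
```

The first call describes a meta-analysis with one single-arm zero-events study and non-zero totals in both arms, so it is labeled MA-SZ. In the second, one arm has no events across all studies and a double-arm zero-events study is present, so it is labeled MA-CMZ.
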


The authors examined data from the Cochrane Database of Systematic Reviews (CDSR), including any review published between January 2003 and May 2018 and meta-analyzing at least two studies. Of the 61,090 reviews identified with binary outcomes, 21,288 (34.85%) contained at least one zero-events study. In the great majority (90.7%) of these, the total event count was greater than zero for both arms and the meta-analysis only included single-arm rather than double-arm zero-events studies (MA-SZ). Second most common (6.21%) was MA-CSZ, in which the total event count is zero in one arm and the zero-events studies included are only single-arm. Each of the four remaining categories made up less than 1.5% of the whole.
The authors propose that those looking to meta-analyze studies that include zero events first identify which subtype applies, and then work through one of the suggested methods in the figure below. Finally, a sensitivity analysis using an alternative method should be performed to determine the robustness of the results.


Xu C, Furuya-Kanamori L, Zorzela L, Lin L, and Vohra S. (2021). A proposed framework to guide evidence synthesis practice for meta-analysis with zero-events studies. J Clin Epidemiol, in-press.
Manuscript available from the publisher's website here

Thursday, February 25, 2021

The Use of GRADE in Systematic Reviews of Nutrition Interventions is Still Rare, but Growing

While the GRADE framework is used by over 100 health organizations to assess the certainty of evidence and guide the formulation of clinical recommendations, its use in the field of nutrition for these purposes is still sparse. A recent examination of all systematic reviews using GRADE in the ten highest-impact nutrition journals over the past five years provides insight and suggestions for moving the field forward in the use of GRADE for evidence assessment in systematic reviews of nutritional interventions.

Werner and colleagues identified 800 eligible systematic reviews, 55 (6.9%) of which used GRADE, and 47 (5.9%) of which rated the certainty of evidence specific to different outcomes. The number of these reviews using GRADE increased year-to-year, from two in 2015 to 23 in 2019. Reviews claiming to use a modification of GRADE were excluded from analysis.

The authors identified 811 cases of downgrading the certainty of evidence and 31 cases of upgrading. Reviews of randomized controlled trials had a mean of 1.6 domains downgraded per outcome, while reviews of non-randomized studies had a mean of 2.1. In about 6.5% of upgrading cases, the upgrade was made for unclear reasons not in line with GRADE guidance, such as low risk of bias, narrow confidence intervals, or very low p-values. Reviews of non-randomized studies were more likely to have outcomes downgraded for imprecision and inconsistency, and less likely to have downgrades for publication bias, than those of randomized studies.

The authors conclude that while the use of GRADE in systematic reviews of nutritional interventions has grown over recent years based on this sample, continued education and training of nutrition researchers and experts can help improve the spread and quality of the application of GRADE to assess the certainty of evidence in this discipline.

Werner SS, Binder N, Toews I, et al. (2021). The use of GRADE in evidence syntheses published in high-impact-factor nutrition journal: A methodological survey. J Clin Epidemiol, in-press.

Manuscript available here. 

Friday, February 19, 2021

Registration of Trials Included in Systematic Reviews Has Improved Over Time, but Remains Under 50% for Most Years

The prospective registration of a randomized controlled trial (RCT) can reduce bias by clearly laying out the methods to be used before the research is conducted and data analyzed. Registration can also help limit unintentional duplication of efforts, which can be seen as an ethical charge, as duplication of research may mean duplication of unnecessary potential risks to participants and a failure to properly disseminate the findings of research. In 2004, the International Committee of Medical Journal Editors (ICMJE) recommended that journals only consider publishing results of a trial that was prospectively registered.

A new study by Lindsley and colleagues set out to examine just how many RCTs included in a sample of systematic reviews were properly registered, and if so, whether the registration entry was updated with results of the trial. From a group of 618 systematic reviews published within the Cochrane Musculoskeletal, Oral, Skin and Sensory (MOSS) network between 2014 and 2019, a total of 100 eligible reviews were randomly selected across the network's eight groups (30 from the Eyes and Vision group, which provided the pilot data, and ten from each of the remaining seven).

Among a total of 1,432 included trials published since 2000, when the protocol repository ClinicalTrials.gov became available, only 379 (26%) had been registered. Among the 1,177 trials published since 2005, when the ICMJE recommendation first went into effect, the proportion registered was 31%, rising to 38% among those published since 2010. Registered trials had double the median number of patients (120) of non-registered trials (60). About one-third (31%) of the trials published since 2005 included at least one major outcome within the registry record. While trial registration did seem to increase over time, only in two years - 2015 and 2018 - did the proportion of trials that were registered exceed 50%.

Overall, the authors found that while trial registration has become more common since 2005, registered trials still tend to make up a minority of the trials included within systematic reviews in this area. In addition, only about 10% of trials examined had updated the registration record with results related to safety or efficacy. Much room for improvement remains in terms of increasing and incentivizing the prospective registration of trials and the updating of registry records with publicly available results.

Lindsley K, Fusco N, Teeuw H, et al. (2021). Poor compliance of clinical trial registration among trials included in systematic reviews: A cohort study. J Clin Epidemiol 132:79-87. 

Manuscript available here.