Thursday, March 26, 2020

Extremely Serious Research Short: GRADE’s terminology for rating down by three levels

Contributed by Madelin Siedler, 2019/2020 U.S. GRADE Network Research Fellow

Since the inception of GRADE two decades ago, GRADE methodology has needed to evolve along with the arrival of new ways of assessing the evidence. One such evolution has come with the introduction of methods for assessing risk of bias for non-randomized studies, such as the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool and the RoB Instrument for Nonrandomized Studies of Exposures (ROBINS-E).

Because these tools assess the risk of bias in non-randomized studies as if they represent a pragmatic trial, they automatically begin from a lower risk of bias than alternative assessments such as the Newcastle-Ottawa Scale. In GRADE, however, non-randomized studies conventionally start as low certainty of evidence before any rating up or down occurs. This means that while a study assessed with ROBINS-I or ROBINS-E would start as high-quality evidence, it may require a reduction of three levels if very serious risk of bias is present. In other words, a reduction of three levels for a study assessed with ROBINS-I or ROBINS-E would be analogous to a two-level reduction for a non-randomized study assessed with another method.
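To make the arithmetic of rating down concrete, here is a minimal Python sketch. The four level names are GRADE's standard certainty ratings; the function name `rate_down` and the choice to stop at "very low" rather than go lower are assumptions made for illustration, not part of any GRADE tool.

```python
# Illustrative sketch only. LEVELS lists GRADE's four certainty ratings from
# lowest to highest; rate_down is a hypothetical helper, not a GRADEpro API.
LEVELS = ["very low", "low", "moderate", "high"]


def rate_down(start: str, levels_down: int) -> str:
    """Lower a certainty rating by a number of levels, never below 'very low'."""
    index = LEVELS.index(start) - levels_down
    return LEVELS[max(index, 0)]


# Evidence assessed with ROBINS-I or ROBINS-E: starts high, rated down three levels.
with_robins = rate_down("high", 3)

# Evidence assessed with another tool: starts low, rated down two levels.
with_other_tool = rate_down("low", 2)

print(with_robins, "|", with_other_tool)
```

Both paths end at the same rating, which is the sense in which the two reductions are analogous.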

A rating by any other name…

In order to determine what exactly this new three-level reduction should be called, members of the GRADE Working Group conducted a survey of 225 participants recruited via social media, the Guidelines International Network (G-I-N), and other sources. Just over one-third (34.2%) were members of the GRADE Working Group and all respondents had participated in guideline development in some capacity. The results are presented in a newly published article as part of a new “GRADE Notes” series in the Journal of Clinical Epidemiology.

Within the survey, participants were asked to rate the following terms for this novel three-level reduction, from least (1) to most-favored (4):

  • Critically serious
  • Extremely serious
  • Most serious
  • Very, very serious

Respondents' average ranking of terms. 
T. Piggott et al. / Journal of Clinical Epidemiology - (2020)

“Extremely serious” took the lead as the most favorably ranked term with an average score of 3.19, with “critically serious” a close second at 3.12. Respondents found “extremely serious” the most agreeable due to its clarity and the fact that it seemed to “naturally” follow the existing two-level term, “very serious.”

The term “extremely serious” can now be found within the GRADEpro application when rating the certainty of evidence within non-randomized studies while utilizing the ROBINS-I or ROBINS-E instruments.



Piggott T, Morgan RL, Cuello-Garcia CA, Santesso N, Mustafa RA, Meerpohl JJ, Schünemann HJ, GRADE Working Group. GRADE notes: Extremely Serious, GRADE’s Terminology for Rating Down by 3-Levels. Journal of Clinical Epidemiology. 2019 Dec 19.

Manuscript available here on publisher's site.

Tuesday, March 10, 2020

Research Shorts: U.S. Guideline Developers Inconsistently Applying Criteria for Appropriate Evidence Grading

Contributed by Philipp Dahm, MD, MHSc, FACS

Guideline Developers in the United States were Inconsistent in Applying Criteria for Appropriate GRADE Use


Our study was motivated by the anecdotal observation that many US-based organizations appeared to be endorsing the GRADE approach but did not necessarily apply it to the fullest extent. We therefore sought to formally study this issue by applying six published criteria of appropriate GRADE use. We limited our search to guidelines from US-based organizations that were included in the National Guideline Clearinghouse (NGC), which implied that they met certain minimal criteria for evidence-based guidelines. Our search covered January 2011 through June 2018, after which the NGC lost its funding and ceased to exist in that form.

Of the 315 organizations with guideline documents in the database, 135 were from the US and were represented by at least one guideline. Our analysis ultimately included 67 guideline documents from 44 organizations. The vast majority of these guidelines were from professional organizations, mostly related to the field of internal medicine and its subspecialties. With regard to domains for rating the certainty of evidence, only one in 10 was explicit about including all five criteria for downgrading (study limitations, indirectness, inconsistency, imprecision, and publication bias) for a body of evidence from randomized trials and all three domains (large magnitude of effect, dose-response gradient, and direction of residual bias) for rating up a body of evidence from non-randomized studies. Over half of guidelines described explicit consideration of all four central domains (certainty of evidence, balance of benefits to harms, patients’ values and preferences, and resource utilization) for moving from evidence to recommendations. All guidelines included the certainty of evidence and the vast majority also addressed the balance of desirable and undesirable consequences. When comparing guidelines published in 2011-2014 versus 2015-2018, rates of appropriate use were higher for nearly all criteria, but only one main criterion reached statistical significance, namely the reporting of evidence summaries supporting recommendations.

The take-home messages from this study are that one in three US-based organizations developing evidence-based guidelines report the use of GRADE, but that adherence to published criteria is quite inconsistent. As GRADE finds increasing uptake worldwide, continued efforts in training guideline methodologists and panel members will be important to ensure appropriate application of GRADE methodology.


Dixon C, Dixon PE, Sultan S, Mustafa R, Morgan RL, Murad MH, Falck-Ytter Y, Dahm P. Guideline Developers in the United States were Inconsistent in Applying Criteria for Appropriate GRADE Use. Journal of Clinical Epidemiology. 2020 Mar 4.

Wednesday, February 26, 2020

Evidence Foundation Scholar Update

Contributed by Janice Tufte, 2019 Evidence Foundation Scholar & Madelin Siedler, 2019/2020 U.S. GRADE Network Research Fellow

Janice Tufte, an independent consultant and a leader in patient-public partnership initiatives, received a scholarship to attend the tenth GRADE guideline development workshop held in Denver, Colorado in February 2019 (blog post here). As part of her scholarship, Tufte presented to the larger workshop group on the unique opportunities of using patient partners during the development of GRADE guidelines.


Spring 2019 scholarship recipients: Dr. Irbaz bin Riaz (L) and Ms. Janice Tufte (R)

Tufte has worked with the American College of Physicians as a public panel member on multiple guidelines in addition to serving as a panel member of their Clinical Guidelines Committee. In these roles, she has provided input on outcomes of importance and on future topics for guideline development and is listed as a co-author on recent guidelines and guidance statements. She has also presented on the basics of guideline development to groups including fellow ambassadors of the Patient-Centered Outcomes Research Institute (PCORI).

Recently, Tufte co-published a protocol for the development of guidance for multi-stakeholder engagement (MuSE) in health and healthcare guideline development and implementation. The forthcoming guidance will be based on four different systematic reviews, including an examination of the current barriers and opportunities for stakeholder involvement in the guideline development process and the effect of stakeholder engagement on resulting guidelines and their implementation. The project will ultimately provide recommendations for ways to improve stakeholder engagement.

Here, Tufte provides an update on her work to continue to promote patient involvement in guideline development, including the use of GRADEpro software for these purposes.
My first exposure to the GRADE process and subsequent utilization of the GRADEpro platform genuinely captured and kept my attention. I quickly realized how beneficial both the Evidence to Decision and Summary Tables could be to public patients involved with systematic reviews, guideline development, and recommendations. I could view the overall landscape and better comprehend the pertinent information and findings in the EtD and Summary table that I needed as a non-scientist, allowing me to contribute more effectively in a meaningful manner in a panel conversation on evidence, recommendations or judgements.

GRADEpro is a versatile and modifiable online format. We are able to create a foundation based on the specificities and needs unique to our individual question that will then display the evidence and quality syntheses in a neat table as well as a reliable summary. Having diverse stakeholders at the table is important. Encouraging all to bring in the public and patient perspective is my particular favorite niche, and I will continue to promote this no matter the topic area.

I have recently enjoyed working with the new GRADEpro extensions while serving on a guidelines panel, where we have incorporated GRADE infographics. One challenge I have discovered is synthesizing the information down to an easily understandable one-pager for both clinicians and the public.

I am looking forward to working with GRADE in the future, with a goal to better deliver reliable, understandable evidence to all who could benefit.
Stay tuned for future updates regarding Janice’s continued work in promoting patient engagement in guideline development.

If you are interested in learning more about GRADE and attending the workshop as a scholarship recipient, applications for our upcoming workshop in Chicago this October are open. The deadline to apply is July 31, 2020. Details can be found here.

Wednesday, February 19, 2020

Research Shorts: Informative statements to communicate the findings of reviews

Contributed by Madelin Siedler, 2019/2020 U.S. GRADE Network Research Fellow

When authors of systematic reviews utilize the GRADE approach to evaluate the certainty of evidence in their findings, they should present this information in a way that is clear, consistent, and useful to the reader. In a recent article from the GRADE series (GRADE guidelines 26) in the Journal of Clinical Epidemiology, Santesso and colleagues present recommendations for communicating the effect size and certainty of evidence within a systematic review. These statements were informed by years of research, feedback, and discussion, including the qualitative input of around 100 methodology experts and a survey of 110 respondents of diverse backgrounds and levels of GRADE expertise.

The final result was a table of suggested statements organized by the certainty of the effect followed by the size of that effect based on the point estimate. In order to use this tool, systematic review authors will need to first determine thresholds for the size of the effect (i.e., whether the effect on an outcome is trivial, small, moderate, or large, or if there is no effect). This can be accomplished in “full contextualization,” in which the outcome is considered in relation to all other critical outcomes, or “partial contextualization,” in relation to the standalone value of the single outcome.
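As a rough sketch of how such a table might be applied, the Python below builds a statement from a certainty rating and an effect-size judgment. Everything here is an assumption for illustration: the helper name, the example intervention and outcome, and the exact phrasing, which only approximates the style of the suggested statements. The article's own table remains the authority on wording.

```python
# Hypothetical helper; wording approximates the style of the suggested
# statements and is NOT copied from the article's table.
VERB = {
    "high": "results in",
    "moderate": "probably results in",
    "low": "may result in",
}
ADJECTIVE = {"large": "large ", "moderate": "", "small": "slight "}


def informative_statement(intervention, outcome, certainty, size, change="reduction"):
    """Combine certainty (first) and effect size (second) into one sentence."""
    if certainty == "very low":
        return f"The evidence is very uncertain about the effect of {intervention} on {outcome}."
    if size == "trivial":
        effect = "little to no difference in"
    else:
        phrase = f"{ADJECTIVE[size]}{change}"
        article = "an" if phrase[0] in "aeiou" else "a"
        effect = f"{article} {phrase} in"
    return f"{intervention} {VERB[certainty]} {effect} {outcome}."


# "Drug X" and "mortality" are made-up placeholders.
print(informative_statement("Drug X", "mortality", "low", "small"))
print(informative_statement("Drug X", "mortality", "very low", "large"))
```

The size labels passed in are only meaningful once the review authors have set their thresholds, through either full or partial contextualization.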

The suggested statements generated from the table can be used throughout the text of a systematic review, from the abstract to the discussion, and as part of any review type, such as those examining the accuracy of test strategies. The included language is also simple enough to be included as part of a plain language summary or other consumer-facing materials.


Santesso N, Glenton C, Dahm P, Garner P, Akl E, Alper B, Brignardello-Petersen R, Carrasco-Labra A, De Beer H, Hultcrantz M, Kuijpers T, Meerpohl J, Morgan R, Mustafa R, Skoetz N, Sultan S, Wiysonge C, Guyatt G, Schünemann HJ. GRADE guidelines 26: Informative statements to communicate the findings of systematic reviews of interventions. Journal of Clinical Epidemiology. 2019 Nov 9.

Manuscript available here on publisher's site.

Tuesday, February 11, 2020

Don’t Sell Your Guideline Short – Remember to Report! (Part 1)

Contributed by Madelin Siedler, 2019/2020 U.S. GRADE Network Research Fellow

The development of a high-quality, evidence-based clinical guideline is no small feat. It requires significant time and effort from content experts, methodologists, and organizational staff and typically takes one to two years or more from start to finish.

Given the effort and hours that go into guideline development, it’s all too easy - and all too common - for the reporting of the development process of these guidelines to significantly undersell their quality. This is important, because published analyses assessing the quality of guidelines will likely only use what is reported or referenced in the text of the guideline. In other words, guidelines that do not adequately report on the methods they used to develop their recommendations will be under-appraised in the published literature – and this could lead to a gross underestimation of a guideline-developing organization’s work as a whole.

Quality and Reporting Standards: A Brief Review

Over the past decade, a number of standard sets, reporting checklists, and appraisal tools have been published to assist guideline developers in the reporting of their methods and to provide ways for researchers to assess the quality of these guidelines. These standards and methods of appraisal include but are not limited to:
  • The Appraisal of Guidelines for Research and Evaluation (AGREE) II tool (2010)
  • The National Academy of Medicine (formerly the Institute of Medicine [IOM]) Standards for Trustworthy Clinical Practice Guidelines (2011)
  • The Guidelines International Network (G-I-N) Key Components of High-Quality and Trustworthy Guidelines (2012)
  • The World Health Organization (WHO) Handbook for Guideline Development (2nd ed., 2014)
  • The Reporting Items for practice Guidelines in HealThcare (RIGHT) Statement (2017)


Report, or it didn’t happen.

A guideline may be developed using the most water-tight, rigorous methods, but if these methods are not adequately described either in the text of the guideline or in a referenced external text, then an assessor will likely under-appraise the quality of a guideline. To ensure the most accurate appraisal of a guideline possible, guideline developers should consider the following helpful tips:
  • Create a guideline template including boilerplate text that meets as many reporting criteria as possible, such as a general description of the systematic review and recommendations development processes; competing interest statements for all involved authors and guideline panel members; a description of the method used to assess certainty of evidence and grade the strength of recommendations; and a clear table at the beginning of the document listing all clinical questions and resulting recommendations.
  • Maintain an up-to-date, in-depth description of the guideline development process on the website of the guideline-producing organization. Refer to this page specifically in the text of the guideline. This allows both guideline end-users and potential assessors to view the development process in depth without requiring too much space in the guideline document itself. 
  • When in doubt, refer it out. If there are supplemental texts to the guideline that include information related to the development process – such as an underlying systematic review or a list of authors’ conflict of interest disclosures – make sure these documents are clearly referenced in the guideline text and made easily accessible in the online version via hyperlinks. 
  • Don’t make assumptions. Even aspects of the development process that seem obvious, such as whether the guideline is externally reviewed, will likely not be counted in a published quality assessment if they are not explicitly mentioned.
  • Always be specific. Do not make the end-user of a guideline have to guess who the guideline is for, the clinical questions driving the guideline, or the appropriate scenarios in which to employ the recommendations. Utilizing the PICO (Population, Intervention, Comparison, Outcome) format to explicitly describe the clinical questions and resulting recommendations is a failsafe way to ensure your guideline is specific enough to be useful. 


Stay tuned for Part 2, where we provide a list of commonly overlooked items in published guidelines and discuss how to instantly improve the quality assessment of a guideline.

Monday, February 3, 2020

Research Shorts: From test accuracy to patient-important outcomes and recommendations

Contributed by Madelin Siedler, 2019/2020 U.S. GRADE Network Research Fellow

The potential risks and benefits of a screening or diagnostic testing strategy extend beyond the immediate impact and accuracy of the test itself. The result of testing will determine the available next steps and options for follow-up and management, and therefore will affect various patient-important outcomes in addition to potential resource utilization and equity considerations. These downstream consequences, and the certainty of evidence in these consequences, need to be considered when formulating recommendations surrounding testing. In a July 2019 paper published as part 22 of the Journal of Clinical Epidemiology’s GRADE guidelines series, Schünemann and colleagues provide suggestions for assessing certainty of evidence and determining recommendations for diagnostic tests and strategies.

While a collection of randomized controlled trial evidence examining the downstream consequences of various testing strategies is ideal in this scenario, such data are sparse. In lieu of this, guideline authors should develop a framework that includes each possible testing and follow-up treatment scenario, starting with the test in question and ending with patient-important outcomes.


H.J. Schünemann et al. / Journal of Clinical Epidemiology 111 (2019) 69-82

As seen in this USPSTF sample framework, evidence begins with accuracy studies and ends with patient-important end-points.

This will allow the panel to visually link all relevant existing data together and develop clinical questions that are answerable with the evidence at hand. Data on the accuracy of a given test will help inform the expected number of false negatives and positives, which would then lead to potentially important downstream consequences - such as anxiety or a missed diagnosis - in addition to the effects of treating a diagnosed condition. The estimates of these beneficial and harmful potential outcomes should ideally come from a systematic review of evidence which can then be assessed for certainty. 
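The step from test accuracy to expected false negatives and false positives is simple arithmetic, sketched below in Python. The function name and the example sensitivity, specificity, and prevalence are made-up values for illustration; they do not come from the paper.

```python
# Illustrative sketch: expected test results in a hypothetical cohort.
# All input values below are invented for the example.
def expected_results(sensitivity, specificity, prevalence, n=1000):
    """Return expected true/false positives and negatives among n people tested."""
    with_condition = prevalence * n
    without_condition = n - with_condition
    true_pos = sensitivity * with_condition
    true_neg = specificity * without_condition
    return {
        "true positives": round(true_pos),
        "false negatives": round(with_condition - true_pos),   # missed diagnoses
        "true negatives": round(true_neg),
        "false positives": round(without_condition - true_neg),  # e.g. needless anxiety or treatment
    }


# Hypothetical test: 90% sensitivity, 80% specificity, 10% prevalence.
print(expected_results(sensitivity=0.90, specificity=0.80, prevalence=0.10))
```

Each of the four counts can then be linked in the framework to its own downstream consequences and to the evidence on those consequences.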

H.J. Schünemann et al. / Journal of Clinical Epidemiology 111 (2019) 69-82

The authors suggest providing one overall rating of the quality of evidence that takes into account the certainty of the diagnostic, prognostic, and management data that are available. Guideline panels should determine which outcomes of these bodies of evidence are critical and ascribe an overall rating based on the lowest level of certainty of the critical outcomes. 
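The "lowest level among critical outcomes" rule can be sketched in a few lines of Python. The function name, the tuple layout, and the example outcomes are hypothetical and chosen only to show the rule.

```python
# Illustrative sketch of taking the lowest certainty among critical outcomes.
# ORDER ranks GRADE's four certainty levels; everything else is hypothetical.
ORDER = {"very low": 0, "low": 1, "moderate": 2, "high": 3}


def overall_certainty(outcomes):
    """outcomes: iterable of (name, certainty, is_critical) tuples."""
    critical = [certainty for _, certainty, is_critical in outcomes if is_critical]
    if not critical:
        raise ValueError("at least one outcome must be judged critical")
    return min(critical, key=ORDER.__getitem__)


# Made-up example spanning diagnostic and management evidence.
example = [
    ("test accuracy", "moderate", True),
    ("mortality after treatment", "low", True),
    ("anxiety from false positives", "very low", False),  # important, not critical
]
print(overall_certainty(example))
```

In this example the non-critical outcome does not pull the overall rating down, even though its certainty is the lowest.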


Schünemann HJ, Mustafa RA, Brozek J, Santesso N, Bossuyt PM, Steingart KR, Leeflang M, Lange S, Trenti T, Langendam M, Scholten R. GRADE guidelines: 22. The GRADE approach for tests and strategies—from test accuracy to patient-important outcomes and recommendations. Journal of clinical epidemiology. 2019 Jul 1;111:69-82.

Manuscript available here on publisher's site.

Wednesday, January 22, 2020

Research Shorts: Rating the certainty in evidence in the absence of a single estimate of effect

Contributed by Madelin Siedler, 2019/2020 U.S. GRADE Network Research Fellow

When no pooled estimate from a meta-analysis of several studies is available to guide ratings across the GRADE domains, how should one make a final determination of the certainty of evidence using GRADE?


Evidence from a 30,000-foot view

In their 2017 paper published in Evidence-Based Medicine, Murad and colleagues describe methods for applying GRADE when bodies of evidence are either sparse or too disparate to pool. A systematic review, for instance, may only provide a narrative synthesis of the current evidence given these limitations. When a neat estimate of effect presented as part of a tidy forest plot is not available, it is necessary to use one’s best judgment to rate the domains by taking a broader view. In these cases, Murad et al. recommend the following approach:
  • Risk of Bias: Judge the risk of bias across all studies that include the outcome of interest.
  • Inconsistency: Consider the direction and size of the estimates of effect from each study. Generally, do they all tell the same story, or do they vary considerably?
  • Indirectness: Make an overall judgment about the amount of directness or indirectness of the body of evidence, given your specific question (always consider your population, intervention, outcome, and comparator[s] of interest). Generally, are the studies synthesized answering questions similar to yours? Or might the dissimilarities be enough to lower your trust in the estimate of effect as it pertains to your question?
  • Imprecision: Examine the total information size of all studies (number of events for binary outcomes, or number of participants for continuous outcomes) as well as each study’s reported confidence interval for this outcome. If there are fewer than 400 total events or participants, or if the confidence intervals from most studies - or the largest - include no effect, imprecision is likely present.
  • Publication bias: Suspect publication bias if there is a small number of only positive studies, or if data were reported in trial registries but never published.
As always, one may consider rating up the quality of evidence from an observational study if a large magnitude of effect, a dose-response gradient, or plausible residual confounding that would increase the certainty of effect are present in the majority of studies examined.
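Of the domains above, imprecision is the one with explicit rules of thumb, so it lends itself to a short Python sketch. The `Study` type, the function name, and the example numbers are hypothetical; the default null value of 1.0 assumes a ratio measure such as a relative risk. The result is a prompt for judgment, not a replacement for it.

```python
from typing import NamedTuple


class Study(NamedTuple):
    information: int  # events (binary outcome) or participants (continuous outcome)
    ci_low: float
    ci_high: float


def imprecision_likely(studies, null_value=1.0, threshold=400):
    """Flag likely imprecision using the rules of thumb described above."""
    total = sum(s.information for s in studies)
    if total < threshold:
        return True
    includes_null = [s.ci_low <= null_value <= s.ci_high for s in studies]
    most_include_null = sum(includes_null) > len(studies) / 2
    largest = max(studies, key=lambda s: s.information)
    largest_includes_null = largest.ci_low <= null_value <= largest.ci_high
    return most_include_null or largest_includes_null


# Made-up studies reporting relative risks with 95% confidence intervals.
studies = [Study(150, 0.60, 0.95), Study(200, 0.70, 0.98), Study(120, 0.55, 1.10)]
print(imprecision_likely(studies))
```

Here the total information size is 470 and only the smallest study's interval includes no effect, so the sketch does not flag imprecision.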


Murad MH, Mustafa RA, Schünemann HJ, Sultan S, Santesso N. Rating the certainty in evidence in the absence of a single estimate of effect. BMJ Evidence-Based Medicine. 2017 Jun 1;22(3):85-7.

Manuscript available here on publisher's site.

Monday, January 20, 2020

Research Shorts: Assessing the certainty of evidence in the importance of outcomes or values and preferences

Contributed by Madelin Siedler, 2019/2020 U.S. GRADE Network Research Fellow

The rating of outcomes in terms of their importance is a key aspect of GRADE guideline development. So is, of course, the rating of the certainty of evidence that will inform clinical decision-making. However, it is often difficult to rate the certainty of evidence of the importance of outcomes – assuming there is any evidence to draw from at all. In their July 2019 article published in the Journal of Clinical Epidemiology, Zhang and colleagues describe the ways to assess the certainty of a body of evidence used to determine the relative importance of outcomes.



The GRADE domains that present the most challenges when rating the certainty of evidence are inconsistency and imprecision. Assuming there is more than one study, assessment of inconsistency should include judging the amount of variance across studies’ reported importance of outcomes, exploring potential sources for this inconsistency (such as differences in populations or instruments used), and rating down when inconsistency is not explained by these. Imprecision should take into consideration the sample size first. In fact, in cases where there is no available quantitative synthesis, sample size may be the only consideration. In other cases, even when information size meets a pre-defined threshold, the evidence may still be rated down if the confidence intervals for the relative importance of outcomes cross a pre-defined decision-making threshold.


Y. Zhang et al. (2019)/Journal of Clinical Epidemiology

The authors warn against attempts to rate the certainty of evidence in the variability of outcome importance – in other words, how much the perceived importance of any outcome varies from one individual to the next. If both inconsistency and imprecision are ruled out as potential sources of observed variance, then true variability may exist. In these cases, guideline panels should consider the formation of a conditional recommendation based on differences in values and preferences.

The article also provides guidance for assessing publication bias and rating up.


Zhang Y, Coello PA, Guyatt GH, Yepes-Nuñez JJ, Akl EA, Hazlewood G, Pardo-Hernandez H, Etxeandia-Ikobaltzeta I, Qaseem A, Williams Jr JW, Tugwell P. GRADE guidelines: 20. Assessing the certainty of evidence in the importance of outcomes or values and preferences—inconsistency, imprecision, and other domains. Journal of clinical epidemiology. 2019 Jul 1;111:83-93.

Manuscript available here on publisher's site.