Thursday, March 26, 2020

Extremely Serious Research Short: GRADE’s terminology for rating down by three levels

Contributed by Madelin Siedler, 2019/2020 U.S. GRADE Network Research Fellow

Since the inception of GRADE two decades ago, GRADE methodology has needed to evolve along with the arrival of new ways of assessing the evidence. One such evolution has come with the introduction of tools for assessing risk of bias in non-randomized studies, such as the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool and the RoB Instrument for Nonrandomized Studies of Exposures (ROBINS-E).

Because these tools assess the risk of bias in non-randomized studies as if they represent a pragmatic trial, they automatically begin from a lower risk of bias than alternative assessments such as the Newcastle-Ottawa Scale. In GRADE, however, non-randomized studies conventionally start as low certainty of evidence before any rating up or down occurs. This means that while a study assessed with ROBINS-I or ROBINS-E would start as high-quality evidence, it may require a reduction of three levels if very serious risk of bias is present. In other words, a reduction of three levels for a study assessed with ROBINS-I or ROBINS-E would be analogous to a two-level reduction for a non-randomized study assessed with another method.
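The arithmetic of rating down can be made concrete with a small sketch. The snippet below is only an illustration, not part of GRADE or GRADEpro: the function name `rate_down` and the lookup tables are hypothetical, and it assumes that a rating cannot fall below "very low" and that a "serious" concern corresponds to a one-level reduction.

```python
# GRADE certainty levels, ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]

# Assumed mapping from the severity label of a concern to the number
# of levels rated down ("extremely serious" is the three-level term).
RATING_DOWN = {
    "not serious": 0,
    "serious": 1,
    "very serious": 2,
    "extremely serious": 3,
}


def rate_down(start: str, concern: str) -> str:
    """Return the certainty level after rating down from `start`.

    Assumption: the result is floored at "very low", the lowest level.
    """
    index = LEVELS.index(start) - RATING_DOWN[concern]
    return LEVELS[max(index, 0)]


# Non-randomized evidence assessed with ROBINS-I starts at "high".
print(rate_down("high", "extremely serious"))
# Non-randomized evidence assessed with another tool starts at "low".
print(rate_down("low", "very serious"))
```

Under these assumptions both calls end at the same level, which is the sense in which the three-level reduction is analogous to a two-level reduction from a "low" starting point.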

A rating by any other name…

In order to determine what exactly this new three-level reduction should be called, members of the GRADE Working Group conducted a survey of 225 participants recruited via social media, the Guidelines International Network (G-I-N), and other sources. Just over one-third (34.2%) were members of the GRADE Working Group and all respondents had participated in guideline development in some capacity. The results are presented in a newly published article as part of a new “GRADE Notes” series in the Journal of Clinical Epidemiology.

Within the survey, participants were asked to rank the following terms for this novel three-level reduction, from least favored (1) to most favored (4):

  • Critically serious
  • Extremely serious
  • Most serious
  • Very, very serious


Figure: Respondents' average ranking of terms. Source: T. Piggott et al., Journal of Clinical Epidemiology (2020).

“Extremely serious” took the lead as the most favorably ranked term with an average score of 3.19, with “critically serious” a close second at 3.12. Respondents found “extremely serious” the most agreeable due to its clarity and the fact that it seemed to “naturally” follow the existing two-level term, “very serious.”

The term “extremely serious” can now be found within the GRADEpro application when rating the certainty of evidence within non-randomized studies while utilizing the ROBINS-I or ROBINS-E instruments.



Piggott T, Morgan RL, Cuello-Garcia CA, Santesso N, Mustafa RA, Meerpohl JJ, Schünemann HJ, GRADE Working Group. GRADE notes: Extremely Serious, GRADE’s Terminology for Rating Down by 3-Levels. Journal of Clinical Epidemiology. 2019 Dec 19.

Manuscript available here on publisher's site.

Tuesday, March 10, 2020

Research Shorts: U.S. Guideline Developers Inconsistently Applying Criteria for Appropriate Evidence Grading

Contributed by Philipp Dahm, MD, MHSc, FACS

Guideline Developers in the United States were Inconsistent in Applying Criteria for Appropriate GRADE Use


Our study was motivated by the anecdotal observation that many US-based organizations appeared to be endorsing the GRADE approach but did not necessarily apply it to the fullest extent. We therefore sought to formally study this issue by applying six published criteria of appropriate GRADE use. We limited our search to guidelines from US-based organizations that were included in the National Guideline Clearinghouse (NGC), which implied that they met certain minimal criteria for evidence-based guidelines. Our search covered January 2011 through June 2018, after which the NGC lost its funding and ceased to exist in that form.

Among guideline documents from 315 organizations included in the database, 135 organizations were from the US and were represented by at least one guideline. Our analysis ultimately included 67 guideline documents from 44 organizations. The vast majority of these guidelines were from professional organizations, mostly related to the field of internal medicine and its subspecialties. With regard to domains for rating the certainty of evidence, only one in 10 guidelines was explicit about including all five criteria for downgrading (study limitations, indirectness, inconsistency, imprecision, and publication bias) for a body of evidence from randomized trials and all three domains (large magnitude of effect, dose-response gradient, and direction of residual bias) for rating up a body of evidence from non-randomized studies. Over half of the guidelines described explicit consideration of all four central domains (certainty of evidence, balance of benefits and harms, patients' values and preferences, and resource utilization) for moving from evidence to recommendations. All guidelines included the certainty of evidence, and the vast majority also addressed the balance of desirable and undesirable consequences. When comparing guidelines published in 2011-2014 versus 2015-2018, rates of appropriate use were higher in the later period for nearly all criteria, but only one main criterion reached statistical significance, namely the reporting of evidence summaries supporting recommendations.

The take-home messages from this study are that one in three US-based organizations developing evidence-based guidelines report the use of GRADE, but that adherence to published criteria is quite inconsistent. As GRADE finds increasing uptake worldwide, continued efforts in training guideline methodologists and panel members will be important to ensure appropriate application of GRADE methodology.


Dixon C, Dixon PE, Sultan S, Mustafa R, Morgan RL, Murad MH, Falck-Ytter Y, Dahm P. Guideline Developers in the United States were Inconsistent in Applying Criteria for Appropriate GRADE Use. Journal of Clinical Epidemiology. 2020 Mar 4.