Friday, December 18, 2020

Reviews of Screening Interventions Often Fail to Include Relevant Harms

Any given intervention comes with a set of potential desirable and undesirable effects, and proper consideration should be given to both in the context of clinical decision-making. However, our knowledge about potential undesirable effects (or "harms") of an intervention depends on the availability of the evidence, just as it does with the potential benefits. A recent systematic review of reviews for screening interventions suggests that the way evidence for harms is synthesized may not follow the same rigor and depth as for an intervention's potential desirable effects, limiting our ability to thoroughly weigh the two against one another when making clinical decisions to inform screening behaviors.

In the January 2021 issue of the Journal of Clinical Epidemiology, Johansson and colleagues systematically searched and screened 47 Cochrane reviews, making note of those that reported including potential harms as outcomes within the search strategy, even if no available evidence was ultimately found. Overdiagnosis was only included in 15% of the 39 reviews in which Johansson and colleagues deemed it a potentially relevant outcome; overtreatment was mentioned in 16% of eligible reviews. The inclusion of secondary harm outcomes in potentially eligible reviews ranged from 7% (incidental findings) to 91% (all-cause mortality). While psychosocial consequences were discussed as a potential outcome in a majority (64%) of eligible reviews, the data for this outcome were often not synthesized.

Overall, reviews were less likely to meta-analyze or assess the risk of bias for evidence around harms than for benefits. Further, two-thirds (67%) of summary of findings tables did not include any harms as outcomes, and 42% of abstracts and 58% of plain language summaries did not mention any harms.

The authors conclude that these findings demonstrate a need for a "broad collaboration" to develop reporting guidelines and core outcome sets that will ensure the more thorough and rigorous reporting of harms outcomes in screening studies. Through a consensus process involving a diverse set of stakeholders including clinicians, methodologists, policymakers, and medical ethicists, improved standards can be set for the reporting of all outcomes of screening interventions that are of potential relevance to patients.

Johansson M, Borys F, Peterson H, et al. 2021. Addressing the harms of screening - A review of outcomes in Cochrane reviews and suggestions for next steps. J Clin Epidemiol 129:68-73.

Manuscript available from the publisher's website here. 

Monday, December 14, 2020

Evidence Foundation Scholar Update: Christian Kershaw

Dr. Christian Kershaw, a health policy analyst with CGS Administrators, LLC, attended the fall 2019 GRADE Guideline Development Workshop free of charge as an Evidence Foundation scholar. As part of the scholarship, Dr. Kershaw submitted and then presented on a proposal for reducing bias in healthcare, focusing on how to build and lead cross-functional teams (blog post here). We followed up with Dr. Kershaw at one year post-workshop to see what's happened since her attendance.

"In my work as a health policy analyst for a Medicare Fee-for-Service contractor, I use my research background to help with the evaluation of scientific literature on products being considered for coverage by Medicare. My team was created because of a call for transparency in how Medicare makes coverage decisions. I received the Evidence Foundation scholarship to the Fall 2019 GRADE conference and attended along with my team members to learn GRADE methodology. Our goal was to determine if implementation of GRADE methodology would standardize our literature evaluation process for coverage decisions," said Dr. Kershaw.

"At the conference I presented on the benefits of establishing cross-functional teams. By joining forces with team members with a heterogeneous set of skills and backgrounds, we can leverage individual strengths and encourage innovation to reach a common goal. In my work, I collaborate with MDs, RNs, and policy experts with a well-rounded knowledge base of clinical standards, Medicare processes, and coverage policies. After learning GRADE methodology, we implemented the use of GRADE to improve the transparency and standardization of our process for writing coverage policies. our team has now completed two GRADE workshops, and we are constantly working to improve our use of this methodology. We have found that the use of GRADE helps our cross-functional team improve our ability to systematically make coverage determinations based on scientific evidence."

Stay tuned for future updates from other past Evidence Foundation scholars like Dr. Kershaw and the exciting work they are doing to improve the application of GRADE methodology and evidence-based medicine.

If you are interested in learning more about GRADE and attending the workshop as a scholarship recipient, applications for our upcoming workshop next May are now open. The deadline to apply is February 28, 2021. Details can be found here. 

Tuesday, December 8, 2020

No Single Definition of a Rapid Review Exists, but Several Common Themes Emerge

"The only consensus around [a rapid review] definition," write Hamel and colleagues in a review published in the January 2021 issue of the Journal of Clinical Epidemiology, "is that a formal definition does not exist."

In their new review, Hamel et al. sifted through 216 rapid reviews and 90 methodological articles published between 2017 and 2019 to better understand the existing definitions and use of the term "rapid review," identifying eight common themes among them all.

The figure below from the publication shows the relative usage of these themes throughout the relevant identified articles.

In summary of all definitions examined in the review, the authors suggest the following broad definition of a rapid review: "a form of knowledge synthesis that accelerates the process of conducting a traditional systematic review through streamlining or omitting a variety of methods to produce evidence in a resource-efficient manner."

To complicate matters further, Hamel and colleagues also found that reviews meeting these general criteria may not always go by the term "rapid." For instance, the term "restricted review" fits many of these same parameters, but is not necessarily defined by the amount of time from inception to publication. The lack of an agreed-upon definition of a "rapid review" may ultimately hamper authors and potential end-users of these products, as the accepted legitimacy of such reviews may depend upon a common understanding of their standards and methodological frameworks. In addition, the range of rigor and specific protocols continues to vary widely between products labeled as "rapid reviews." Until there is a broader consensus on the definition of a rapid review and what, exactly, it entails, this working definition and associated themes provide insight into the current state of the art.

Check out our related post on the two-week systematic review here.

Hamel C, Michaud A, Thuku M, Skidmore B, Stevens A, Nussbaumer-Streit B, and Garritty C. (2020). Defining rapid reviews: A systematic scoping review and thematic analysis of definitions and defining characteristics of rapid reviews. J Clin Epidemiol 129: 74-85.

Manuscript available from the publisher's website here

Thursday, December 3, 2020

Assessing the Reliability of Recently Developed Risk of Bias Tools for Non-Randomized Studies

Risk of bias is one of the five domains to be considered when assessing the certainty of evidence across a body of studies, and is the only domain which must first be assessed on the individual study level. While several risk of bias assessment tools exist for non-randomized studies (NRS; or observational trials), two of the most recently introduced are the Risk of Bias in Non-Randomized Studies of Interventions (ROBINS-I, developed in 2016) and the Risk of Bias instrument for NRS of Exposures (ROB-NRSE, developed in 2018). Assessment of the risk of bias in a systematic review on which a guideline is based should ideally be conducted independently by at least two reviewers. Given this scenario, how likely is it that the two reviewers' assessments will agree sufficiently with one another?

In a recently published paper by Jeyaraman and colleagues, a multi-center group of collaborators assessed both the inter-rater reliability (IRR) and interconsensus reliability (ICR) of these tools based on a previously published cross-sectional study protocol. The seven reviewers had a median of 5 years of experience assessing risk of bias, and two pairs of reviewers assessed risk of bias using each tool. IRR was used to assess reliability within pairs, while ICR assessed reliability between the pairs. The time burden was also assessed by recording the amount of time required to assess each included study and to come to a consensus. For the overall assessment of bias, IRR was rated as "poor" (Gwet's agreement coefficient of 0%) for the ROBINS-I tool and "slight" (11%) for the ROB-NRSE tool, whereas the ICR was rated as "poor" for both ROBINS-I (7%) and ROB-NRSE (0%). The average evaluator time burden was over 48 minutes for the ROBINS-I tool and almost 37 minutes for the ROB-NRSE.
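
For readers unfamiliar with Gwet's agreement coefficient, the sketch below shows how the unweighted two-rater version (AC1) can be computed. It is an illustration only: the function name, the four-level rating scale, and the ratings are hypothetical, and the paper may have used a different variant of the coefficient.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, scale):
    """Unweighted Gwet's AC1 for two raters who judged the same items.

    `scale` lists every category the raters could have chosen,
    including any that happen not to appear in the ratings.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    if len(scale) < 2:
        raise ValueError("the rating scale needs at least two categories")

    n = len(rater_a)
    q = len(scale)

    # Observed agreement: share of items given the same category by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: based on each category's average marginal proportion.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = 0.0
    for category in scale:
        pi = (counts_a[category] + counts_b[category]) / (2 * n)
        p_chance += pi * (1 - pi)
    p_chance /= q - 1

    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical overall risk-of-bias judgments for six studies.
SCALE = ["low", "moderate", "serious", "critical"]
rater_a = ["low", "moderate", "serious", "serious", "critical", "moderate"]
rater_b = ["low", "serious", "serious", "moderate", "critical", "moderate"]

print(round(gwet_ac1(rater_a, rater_b, SCALE), 3))  # 23/41, about 0.561
```

A coefficient of 1 means the two raters agreed on every study; values near 0, like those reported for these tools, mean agreement was little better than chance.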

The authors note that overall, ROBINS-I tended to have a better IRR as well as ICR, both of which may be due in part to poorer reporting quality in exposure studies. In addition, simplification of related guidance documents for applying the tool and increased training for reviewers looking to use the ROBINS-I and ROB-NRSE tools to assess risk of bias in non-randomized studies may improve agreement considerably while cutting down on the time required to apply the tool correctly to each individual study.

Jeyaraman MM, Rabbani R, Copstein L, Robson RC, Al-Yousif N, Pollock M, ... & Abou-Setta AM. (2020). Methodologically rigorous risk of bias tools for nonrandomized studies had low reliability and high evaluator burden. J Clin Epidemiol 128:140-147.

Manuscript available from the publisher's website here.

Wednesday, November 25, 2020

Diagnostic Test Accuracy Meta-Analyses Are Often Missing Information Required for Reproducibility

Reproducibility of results is considered a key tenet of the scientific process. When results of a study are reproduced by others using the same protocol, there is less chance that the original results observed were due to human or random error. Testing the reproducibility of evidence syntheses (e.g., meta-analyses) is just as important as for individual trials.

In a paper published earlier this month, Stegeman and Leeflang undertook the task of testing the reproducibility of meta-analyses of diagnostic test accuracy. The authors identified 51 eligible meta-analyses published in January 2018. In 19 of these, sufficient information was provided in the text of the study to reproduce the 2x2 tables of the individual studies included; in the remaining 32, only estimates were provided in the text. In 17 of these 32, the authors located primary data to attempt reproducibility. When attempting to reproduce the meta-analyses of the 51 identified papers, reproducibility was only achieved 28% of the time; none of the 17 papers for which 2x2 tables were not provided were reproducible.
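
To see why the 2x2 tables matter, consider the minimal sketch below. It shows how sensitivity and specificity follow from a study's 2x2 table, and how a table can be rebuilt from reported estimates only when the group sizes are also known. The class, the function, and the counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one diagnostic accuracy study (index test vs. reference)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def rebuild_table(sensitivity, specificity, n_diseased, n_healthy):
    """Back-calculate a 2x2 table from reported estimates and group sizes."""
    tp = round(sensitivity * n_diseased)
    tn = round(specificity * n_healthy)
    return TwoByTwo(tp=tp, fp=n_healthy - tn, fn=n_diseased - tp, tn=tn)


# Hypothetical study: 50 participants with the condition, 100 without.
original = TwoByTwo(tp=45, fp=10, fn=5, tn=90)
rebuilt = rebuild_table(original.sensitivity, original.specificity, 50, 100)
print(rebuilt == original)
```

If a review reports only the estimates, without the counts or group sizes, this back-calculation is not possible and a reader must return to the primary studies.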

Only 14 (27%) of the 51 articles provided full search terms. In nearly half (25) of the included reviews, at least one of the full texts of included references could not be located; in 12, at least one title or abstract could not be located. Overall, of the 51 included reviews, only one was deemed fully reproducible by providing a full protocol, 2x2 tables, and the same summary estimates as the authors.

The authors conclude with a call for increased prospective registration of protocols and improved reporting of search terms and methods. The application of the 2017 PRISMA statement for diagnostic test accuracy is a helpful tool for any aspiring author of a diagnostic test accuracy meta-analysis to improve the reporting and reproducibility of results.

Stegeman I. and Leeflang M.M.G. (2020). Meta-analyses of diagnostic test accuracy could not be reproduced. J Clin Epidemiol 127:161-166.

Manuscript available at the publisher's website here

Friday, November 20, 2020

Practical Tips for Finding and Assessing Patient Survey Data

An essential part of translating a body of evidence into a clinical recommendation within the GRADE framework is the consideration of patients' values and preferences. Not only should the likely treatment preferences and values placed on outcomes among the patient population be considered; if there is likely a great amount of variability within these, this may also influence the ultimate strength of recommendation.

Guideline panels and public health decision-makers may use self-reported patient survey data to better understand the range of patient values and preferences when formulating recommendations or policies. However, like all sources of evidence, patient surveys may be at risk for specific sources of bias which can ultimately affect the results. What should decision-makers look out for when applying patient survey data to a recommendation for care? In a recently published paper, Santesso and colleagues propose a practical guide for finding, interpreting, and applying patient data to better inform healthcare decision-making.

Because 97% of published surveys have been found to use the words "survey" or "questionnaire" in the title, the authors suggest using these terms in title, abstract, and topic fields when conducting a search for relevant data. When assessing the risk of bias of a given survey, decision-makers should ask whether the population was adequately representative of the patient population in question, taking care to consider the use of random sampling and the potential impact of nonresponse. A survey should also be assessed for whether it measures the intended constructs adequately. Survey authors should report the variability around reported measures whenever possible, and these data can be used to judge the overall variability in patient values and preferences. Finally, decision-makers should take care to discern how directly the survey data applies to the patient population in question; the table of survey respondent characteristics is a useful place from which to draw judgments of directness.

Using these helpful and practical points of guidance, guideline panel members and clinical decision-makers can better inform their retrieval, critical appraisal, and application of patient survey data to important healthcare questions, ultimately resulting in more informed guidelines and policies.

Santesso N, Akl E, Bhandari M, Busse JW, Cook DJ, Greenhalgh T, Muti P, Schünemann H, and Guyatt G. (2020). A practical guide for using a survey about attitudes and behaviors to inform health care decision making. J Clin Epidemiol 128:93-100.

Manuscript available from the publisher's website here. 

Monday, November 16, 2020

Evidence Foundation Welcomes Four Scholars in First Virtual Workshop

In late October, the U.S. GRADE Network held its thirteenth GRADE Guideline Development Workshop. Like any of the twelve workshops before it, there was much learning, discussion, and networking to be shared. However, unlike any workshop in the past, it was fully online.

Among the 45 attendees who participated in offices and living rooms from Brazil to Cyprus were four participants who attended the workshop free of charge as recipients of the Evidence Foundation scholarship. During a virtual Evening with the Fall 2020 Evidence Foundation Scholars, these four bright minds presented briefly on a proposal or current project designed to reduce bias in healthcare.

Dr. Stavros Antoniou, Chair of the European Association for Endoscopic Surgery Guidelines Subcommittee, discussed the tripartite Guideline Assessment Project (GAP) aimed at developing an extension of the AGREE II tool for surgical guidelines. In an exploratory analysis published earlier in 2018 (GAP I), Antoniou and colleagues assessed 67 surgical guidelines and reported that development of more than one guideline per year, the presence of a guideline committee, and the use of GRADE were associated with higher scores in AGREE II. Second, the group explored the reliability, internal consistency, and unidimensionality of the AGREE II tool when applied to surgical guidelines (GAP II). The group is now in the process of using the Delphi process to identify and finalize items for the surgical extension based on stakeholder input, pilot-testing the instrument, and assessing its validity (GAP III). Of the workshop, Dr. Antoniou noted, "participating in the GRADE Guideline Workshop as a scholar was an inspirational experience. It was fascinating to be trained by world-renowned experts, who have embraced us with true interest and conveyed their passion with quality in guideline development."

Jung Min Han, PharmD, MS, manages the development of guidelines for the American Academy of Dermatology. Her presentation reviewed her current project to update the organization's 2016 guidelines on the management of acne vulgaris using the GRADE framework. Ms. Han discussed the plan to organize two working groups, one to review and update the nine clinical questions from the previous guidelines, and the other to add additional new questions as needed. An updated search would then be run for the first set of questions to identify any newly published evidence since the original guidelines were developed; simultaneously, a novel systematic search would be conducted for the second group of questions. New recommendations would then be drafted following the GRADE methodology. Ms. Han stated, "The GRADE Workshop has trained me to confidently use GRADE in different scenarios where head-to-head data from randomized controlled trials are not available. The workshop was very well-structured with a concrete theme and a mix of lectures, small and large group discussions, meet the experts Q&A sessions, and real-world examples that challenged trainees in many ways."

Dr. Georgios Schoretsanitis of Zucker Hillside Hospital in Glen Oaks, New York presented on his work developing guidelines for therapeutic drug monitoring to optimize and tailor treatment for psychotherapeutic medications. Beginning in 2017, a series of recommendations for reference ranges for two commonly prescribed antipsychotic medications was developed, followed this year by an international joint consensus statement on blood levels to optimize antipsychotic treatment in clinical practice. "For long I have been interested in conducting systematic reviews and meta-analyses," said Dr. Schoretsanitis. "Attending the GRADE Guideline Workshop organized by the US GRADE Network gave me exactly what I was looking for: a unique chance to essentially deepen my knowledge on major methodological aspects during stimulating lectures by experts that have set the tone in the field. It was an intense experience far beyond acquiring knowledge, which I highly suggest to every methodologist."

Dr. Zeinab Hosseini, a Saskatchewan Health Research post-doctoral fellow at the University of Saskatchewan, discussed her work examining the impact of exercise interventions on osteoporosis. Because gender and sex affect the prognosis and management of the disease, guidelines that consider these differences are needed, she said. As part of her research under the advisement of Dr. Phil Chilibeck, she hopes to contribute further understanding in the field related to gender- and sex-specific considerations for exercise recommendations in patients with osteoporosis, and to help inform future guideline recommendations on this topic. "The US GRADE Network Workshop was an amazing opportunity for me as a post-doctoral fellow in health proving insight on how to think as a health researcher from early stages of research up to knowledge translation and dissemination and how to provide evidence-based recommendations to inform the public considering situations where the literature is scarce," said Dr. Hosseini. "There are top women and men scientists on the training panel who respond to questions using their experiences as member on different panels, which I think is unique."

The USGN facilitators pose for a virtual group photo with the four fall 2020 Evidence Foundation scholars.

The Evidence Foundation thanks all four scholars for attending and contributing their engagement and expertise to our 2020 fall workshop.

If you are interested in applying for a scholarship to future GRADE workshops, more details can be found here. Please note that the deadline for applications to our next workshop in Chicago, Illinois will be February 28, 2021.

Friday, October 30, 2020

U.S. Guideline-Producing Organizations Show Some Promise, Room for Improvement in their Application of GRADE

As many as one-third of guideline-producing health organizations in the United States report using the GRADE framework, but exactly how closely these organizations follow the key tenets of GRADE - such as using evidence summaries of each identified outcome to inform the overall certainty of evidence, and linking this certainty to a strength of recommendation - is a matter of debate.

In a study by Dixon and colleagues published earlier this year in the Journal of Clinical Epidemiology, the authors set out to evaluate the use of GRADE in U.S.-based guidelines published between 2011 and 2018 and available in the National Guideline Clearinghouse. Assessing up to three of the most recent guidelines from each of 135 identified U.S.-based organizations, the authors used several criteria to examine how closely each of the 67 resulting guidelines adhered to core GRADE concepts, including:

  • defining the certainty of evidence,
  • explicitly considering the GRADE domains when assessing the certainty of evidence, and
  • consistently defining the strength of resulting recommendations as strong or weak/conditional.

While most (89.6%) defined the certainty of evidence in a manner consistent with GRADE, only 10.4% explicitly reported examining certainty through all eight GRADE criteria. Only 13.4% of guidelines assessing the certainty of evidence sourced from non-randomized trials reported assessing the potential reasons to upgrade the certainty of evidence (i.e., large magnitude of effect, dose-response gradient, and residual confounding). Finally, only about half (53.7%) provided an evidence profile or summary of findings table describing the assessments, and while reporting of the certainty of evidence and the balance between desirable and undesirable effects was most common (100% and 97%, respectively), explicit consideration of resource use and patients' values and preferences were also fairly common (73.1% and 77.6%, respectively). The use of GRADE in line with the authors' established criteria appeared to grow somewhat more frequent over time, indicating a general trend toward proper use of GRADE.

Figure from Dixon et al. shows the relative reporting frequency of the various GRADE criteria for assessing certainty of evidence in years 2011-14 versus 2015-18, suggesting a trend for improved reporting over time.

The authors conclude that continued training of guideline developers and dissemination of education on the appropriate application of GRADE should further improve adherence, including the explicit consideration of all eight domains for assessing the certainty of evidence and of all aspects that inform the translation of this evidence into clinical recommendations.

Dixon C, Dixon PE, Sultan S, Mustafa R, Morgan RL, Murad MH, Falck-Ytter Y, and Dahm P. (2020). Guideline developers in the United States were inconsistent in applying criteria for appropriate Grading of Recommendations, Assessment, Development and Evaluation use. J Clin Epidemiol 124:193-199.

Manuscript available at the publisher's website here.

Monday, October 19, 2020

Existing Tools to Assess the Quality of Prevalence Reviews are Variable, with Some Missing Key Elements

Prevalence studies allow us to better understand the extent and impact of a health issue, guiding priority-setting for health care interventions, research, and clinical guidelines. While established tools for assessing the quality of guidelines, systematic reviews, and original research on interventions exist, no clear option has emerged as a way to assess the quality and risk of bias in prevalence research. The several tools that have been proposed, write the authors of a new systematic review of these instruments, are not without limitations.

Migliavaca and colleagues sifted through a total of 1,690 unique references, ending with a total of 30 tools that were either created for the direct purpose of assessing prevalence studies (n = 8) or were adaptable to this aim (n = 22). A grand total of 710 items from all of the tools were then combined into 119 items assessing similar constructs under six general domains: Population and Setting, Condition Measurement, Statistics, Manuscript Writing and Reporting, Study Protocols and Methods, and Nonclassified (e.g., importance of the study, applicability of results).

The authors conclude that there was great variability among the tools assessed; further, several tools left out key elements that could affect the quality of a study, such as the representativeness of a sample, total sample size, or how the condition was assessed. In addition, some tools fail to distinguish between assessments of whether the measure is valid, reliable, reproducible, or unbiased - differences that the authors of this review argue are important enough to warrant separate items in the development of a new tool. Although the authors suggest that a new, more comprehensive tool will improve the assessment of prevalence studies in the future, they identify the Joanna Briggs Institute Prevalence Critical Appraisal Tool as the best of what's currently available (downloadable from a list of JBI checklists here).

Migliavaca, C.B., Stein, C., Colpani, V., Munn, Z., Falavigna, M., and the Prevalence Estimates Reviews - Systematic Review Methodology Group (PERSyst). (2020). J Clin Epidemiol 127:59-68.

Manuscript available at the publisher's website here.

Tuesday, October 13, 2020

Equity Harms Related to COVID-19 Policies: Slowing the Spread Without Increasing Inequity

Since COVID-19 was first declared a pandemic in March of this year, numerous policies around the world have implemented some degree of lockdown, slashing social events and gatherings, shuttering once-bustling businesses and changing the face of the global economy. While the lockdowns in place were likely necessary to reduce the infection rate and resulting morbidity and mortality associated with the coronavirus, there are potentially undesirable consequences of these policies that affect measures of equity. In a new publication, Glover and colleagues present a framework for considering these effects and weighing them against the benefits of slowing the spread.

The work builds on a novel combination of two existing frameworks. First, the Lorenc and Oliver framework lays out five potential harms of public health interventions which require mitigation: direct health harms, psychological harms, equity harms, group and social harms, and opportunity costs. Second, the PROGRESS-Plus health equity framework provides a list of 11 general categories that can affect measures of equity: place of residence, race, occupation, gender/sex, religion, education, socioeconomic status, social capital, sexual orientation, age, and disability. The individual components of each framework are used as a lens to examine the other. The resulting matrix of 55 potential sources of inequity related to the COVID-19 pandemic and its resulting public health policies provides an exemplary approach to considering all aspects of any large-scale public health intervention and the impact its implementation may have on inequity.
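
The 55 cells come from crossing the five harms with the 11 equity categories. A minimal sketch of that cross-tabulation follows; the variable names are our own, and the single example entry is an illustrative placement rather than one taken from the paper.

```python
from itertools import product

# The five harms from the Lorenc and Oliver framework, as listed above.
HARMS = [
    "direct health harms",
    "psychological harms",
    "equity harms",
    "group and social harms",
    "opportunity costs",
]

# The 11 PROGRESS-Plus categories, as listed above.
PROGRESS_PLUS = [
    "place of residence",
    "race",
    "occupation",
    "gender/sex",
    "religion",
    "education",
    "socioeconomic status",
    "social capital",
    "sexual orientation",
    "age",
    "disability",
]

# One cell per (harm, category) pair, each holding a list of noted concerns.
matrix = {(harm, category): [] for harm, category in product(HARMS, PROGRESS_PLUS)}

# Illustrative entry only; which cell a concern belongs in is a judgment call.
matrix[("direct health harms", "occupation")].append(
    "essential workers are more likely to be exposed to the virus"
)

print(len(matrix))  # 55
```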

Key to the authors' resulting framework is the concept that both the policy responses to the pandemic and the nature of the pandemic itself are potential sources of inequity. For instance, individuals in lower-income occupations are also typically considered essential workers, and are less likely to have a safety net that would allow them to choose not to work or a job that is compatible with working remotely. Thus, the existing systemic inequities are exacerbated by the fact that they are now more likely to be exposed to the virus by continuing to go to work outside the home. However, policymakers can help reduce the impact of their policies on these sources of inequity - as well as ones caused by lockdown policies more directly - by considering mitigation strategies when implementing these policies (for example, by mandating improved sanitation, personal protective equipment, and social distancing for workers in vulnerable occupations). The figure below from the publication provides an overview of the relationship between the pandemic, policy responses and their resulting inequities, and potential points of intervention.

The framework also allows for a more nuanced consideration of context in efforts to reduce the spread of coronavirus. Policies that are highly effective and viable in higher-income countries or areas with greater population density, for instance, may not be as beneficial in low- and middle-income countries and may even result in greater inequity. As with any intervention of any scale, the potential harms must be weighed against the desirable effects, and the context of the given intervention is key. This framework allows for consideration of a wider range of impacts when attempting to reduce illness and mortality in the age of a pandemic.

Glover, R.E., van Schalkwyk, M.C.I., Akl, E.A., Kristjannson, E., Lofti, T., Petkovic, J., ... & Welch, V. (2020). A framework for identifying and mitigating the equity harms of COVID-19 policy interventions. J Clin Epidemiol 128:35-48.

Manuscript available from the publisher's website here. 

Tuesday, October 6, 2020

New Study Examines the Impact of Abbreviated vs. Comprehensive Search Strategies on Resulting Effect Estimates

It's common practice - indeed, it's widely recommended - that systematic reviewers search multiple databases in addition to alternative sources of data such as the grey literature to ensure that no relevant studies are left out of analysis. However, meta-research on whether this theory holds up in practice is mainly limited to examinations of recall - in other words, reporting how many potentially relevant studies are picked up by an abbreviated search method as opposed to one that's more extensive. What's missing from this body of research, write Ewald and colleagues in a newly published study, is that recall studies compare items retrieved in absolute terms without considering the final weight or importance of each individual study - variables which will ultimately affect the direction, magnitude, and precision of the resulting effect estimate. Since larger studies with more cachet are likely to have the greatest impact on the final estimate and certainty of evidence - and these studies are more likely to be picked up in even an abbreviated search - the added value of utilizing more extensive search strategies on a meta-analysis is left unclear.

To examine the impact of the extensiveness of a search strategy on resulting findings and certainty of evidence, the authors randomly selected 60 Cochrane reviews from a range of disciplines for which certainty of evidence assessments and summaries of findings were available. Thirteen reviews did not report at least one binary outcome, leaving a total of 47 for analysis. They then replicated these reviews' search strategies in addition to conducting 14 abbreviated searches for each review, limited to a single database (e.g., MEDLINE only) or a combination of just two or three (e.g., MEDLINE and Embase only). Finally, meta-analyses were replicated for each of these scenarios, leaving out studies that would not have been picked up in the various abbreviated search strategies.

Searching only one database led to a loss of at least one trial in half of the reviews, and a loss of two trials in one-quarter of them. As may be expected, the use of additional databases reduced the loss of information. Overall, however, the direction and significance of the resulting effect estimates remained unchanged in a majority of the cases, as shown in Figure 1 of the paper.

The use of abbreviated searches did, however, introduce some amount of imprecision, typically increasing standard error by around 1.02 to 1.06-fold. The inclusion of multiple versus a single database did not clearly appear to improve precision compared to a comprehensive search.
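A small fold increase in standard error is what one would expect when an abbreviated search misses only a less precise trial. The sketch below illustrates this with a fixed-effect, inverse-variance meta-analysis, where the pooled standard error is one over the square root of the summed weights. The standard errors are invented for illustration and are not data from the paper, and the paper's own models may differ from this simplified one.

```python
import math


def pooled_se(study_ses):
    """Standard error of a fixed-effect, inverse-variance pooled estimate."""
    weights = [1.0 / se ** 2 for se in study_ses]
    return 1.0 / math.sqrt(sum(weights))


# Hypothetical standard errors (log odds ratio scale) for five trials.
# The last, least precise trial stands in for one that only a
# comprehensive search would have found.
comprehensive = [0.10, 0.12, 0.15, 0.20, 0.30]
abbreviated = comprehensive[:-1]

se_comprehensive = pooled_se(comprehensive)
se_abbreviated = pooled_se(abbreviated)
inflation = se_abbreviated / se_comprehensive

print(f"Pooled SE, comprehensive search: {se_comprehensive:.4f}")
print(f"Pooled SE, abbreviated search:   {se_abbreviated:.4f}")
print(f"Fold increase in SE:             {inflation:.3f}")
```

With these made-up inputs, the missed trial carries little weight, so the pooled standard error grows by only about 1.02-fold.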

The authors note that these findings are particularly applicable to authors of potential rapid reviews and guidelines, where a consideration of trade-offs between speed and thoroughness is of great importance. Rapid reviewers should be aware that limiting the search strategy may change the direction of an effect estimate or render an effect estimate incalculable in up to one in seven instances, but this should be weighed against the benefits of a quicker time to the dissemination of findings, especially during emergent health crises where time is of the essence.

Ewald, H., Klerings, I., Wagner, G., Heise, T.L., Dobrescu, A.I., Armijo-Olivo, S., ... & Hemkens, L.G. (2020). Abbreviated and comprehensive literature searches led to identical or very similar effect estimates: A meta-epidemiological study. J Clin Epidemiol 128:1-12.

Manuscript available from publisher's website here.  

Wednesday, September 30, 2020

Four Questions to Ask Before Replicating a Systematic Review

Just as with individual research trials, the replication of a systematic review can shed new light on an existing topic or help further solidify our assessment of the certainty of a body of evidence. However, duplication of efforts that is done unintentionally or without deliberate consideration of methodology (e.g., how similar or different the new review will be in terms of evidence searching, inclusion, and synthesis) is wasteful. How is one to know when the replication of a systematic review is appropriate and warranted?

A new consensus checklist recently published by Tugwell and colleagues in BMJ provides guidance on when - and when not - to conduct a systematic review replication. Driven by a six-person executive team, the checklist was informed by input from methodologists, including experts in fields ranging from clinical epidemiology to guideline development and health economics, as well as knowledge users - those involved in the funding, commissioning, and development of systematic reviews. Two patients were involved in the development team and an additional 17 patient and public representatives were consulted for input via survey.

The process culminated in the drafting of the checklist in a face-to-face setting, with an original 12 proposed items solidified into a final four. The items ask whether replication of the systematic review is of high priority (e.g., whether replication results will be expected to guide policymakers or be of relevance to stakeholders); whether there are certain methodological concerns (such as search design, scope of PICOs, etc.) that will be clarified or improved with a replication; whether the implementation of the replication's findings would be expected to have a sizable positive or negative impact on the population or individual level; and whether resources (e.g., time, money) spent on replication would not be better spent on conducting a new review to answer a novel question.

The ultimate decision of whether or not to replicate should be informed by the answers to these questions, the authors note, and left up to contextualized judgment rather than a quantitative threshold. Further, some of the items may be of higher or lower relevancy depending on the stakeholders for a specific review topic, and "middle-ground" solutions, such as repeating only the parts of a systematic review in need of replication, should be considered individually. The authors plan to test the usability, acceptability, and usefulness of this newly proposed tool with relevant end-users.

Tugwell, P., Welch, V.A., Karunananthan, S., Maxwell, L.J., Akl, E.A., Avey, M.T., ... & White, H. 2020. When to replicate systematic reviews of interventions: Consensus checklist. BMJ 370:m2864. 

Manuscript available from the publisher's website here. 

Thursday, September 24, 2020

Pre-Print of PRISMA 2020 Updated Reporting Guidelines Released

Since their publication in 2009, the PRISMA guidelines have become the standard for reporting in systematic reviews and meta-analyses. Now, 11 years later, the PRISMA checklist has received a facelift for 2020 that incorporates the methodological advances that have taken place over the intervening years.

In a recently released pre-print, Page and colleagues describe their approach to designing the new and improved PRISMA. Sixty reporting documents were reviewed to identify any new items deserving of consideration and 110 systematic review methodologists and journal editors were surveyed for feedback. The new PRISMA 2020 draft was then developed based on discussion at an in-person meeting and iteratively revised based on co-author input and a sample of 15 experts.

The result is an expanded, 27-item checklist replete with elaboration of the purpose for each item, a sub-checklist specifically for reporting within the abstract, and revised flow diagram templates for both original and updated systematic reviews. Here are some of the major changes and additions to be aware of:

  • Recommendation to present search strategies for all databases instead of just one.
  • Recommendation that authors list "near-misses," or studies that met many but not all inclusion criteria, in the results section.
  • Recommendation to assess certainty of synthesized evidence.
  • New item for declaration of Conflicts of Interest.
  • New item to indicate whether data, analytic code, or other materials have been made publicly available.

Page, M., McKenzie, J., Bossuyt, P., Boutron, I., Hoffmann, T., Mulrow, C., ... & Moher, D. 2020. The PRISMA 2020 Statement: An updated guideline for reporting systematic reviews.

Pre-print available from MetaArXiv here. 

Friday, September 18, 2020

WHO Guidelines are Considering Health Equity More Frequently, but Reporting of Judgments is Often Incomplete

The GRADE evidence-to-decision (EtD) framework was developed as a way to more explicitly and transparently consider the implications of clinical recommendations, such as their potential positive or negative impacts on health equity. A new analysis of World Health Organization (WHO) guidelines published between 2014 and 2019 - over half (54%) of which used the EtD framework - examines the consideration of health equity in the guidelines' resulting recommendations.

Dewidar and colleagues found that the guidelines utilizing the EtD framework were more likely to be addressing health issues in socially disadvantaged populations (42% of those developed with the EtD versus 24% of those without). What's more, the use of the EtD framework has risen over time, from 10% of guidelines published in 2016 (the year of the EtD's introduction) to 100% of those published within the first four months of 2019. Use of the term "health equity" increased to a similar degree over this period.

Just over one-third (38%) of recommendations were judged to increase or probably increase health equity, while 15% selected the judgment "Don't know/uncertain" and 8% provided no judgment. Just over one-quarter (28%) of the recommendations utilizing the EtD framework provided evidence for the judgment. When detailed judgments were provided, they were more likely to discuss the potential impacts of place of residence and socioeconomic status and less likely to explicitly consider gender, education, race, social capital, occupation, or religion.

The authors conclude that while consideration of the potential impacts of recommendations on health equity has increased considerably in recent years, reporting of these judgments is still often incomplete. Reporting which published research evidence or additional considerations were used to make a judgment, as well as considering the various PROGRESS factors (Place, Race, Occupation, Gender, Religion, Education, Socioeconomic status, and Social capital), will likely improve the transparency of recommendations in future guidelines where health equity impacts are of concern.

Dewidar, O., Tsang, P., León-Garcia, M., Mathew, C., Antequera, A., Baldeh, T., ... & Welch, V. 2020. Over half of WHO guidelines published from 2014 to 2019 explicitly considered health equity issues: A cross-sectional survey. J Clin Epidemiol 127:125-133.

Manuscript available from the publisher's website here.

Monday, September 14, 2020

Timing and Nature of Financial Conflicts of Interest Often Go Unreported, Systematic Survey Finds

The proper disclosure and management of financial Conflicts of Interest (FCOI) within the context of a published randomized controlled trial is vital to alerting the reader to the sources of funding for the research and other financial factors that may influence the design, conduct, or reporting of the trial.

A recently published cross-sectional survey by Hakoum and colleagues examining the nature of FCOI reporting in a sample of 108 published trials found that 99% of these reported individual author disclosures, while only 6% reported potential sources of FCOI at the institutional level. Individual authors reported a median of 2 FCOIs. Among the 2,972 FCOIs reported by 806 individuals, the greatest proportions came from personal fees other than employment income (50%) and from grants (34%). Further, of those disclosing individual FCOI, a large majority (85%) were provided by private-for-profit entities. Notably, only one-third (33%) of these disclosures included the timing of the funding in relation to the trial, 17% reported the relationship between the funding source and the trial, and just 1% reported the monetary value.

Using a multivariate regression, the authors found that the reporting of FCOI by individual authors was positively associated with nine factors, most strongly with the authors being from an academic institution (OR: 2.981; 95% CI: 2.415 – 3.680), with the funding coming from an entity other than private-for-profit (OR: 2.809; 95% CI: 2.274 – 3.470), and the first author’s affiliation being from a low- or middle-income country (OR: 2.215; 95% CI: 1.512 – 3.246).


More explicit and complete reporting of FCOIs, the authors conclude, may improve readers’ level of trust in the results of a published trial and in the authors presenting them. To improve the nature and transparency of FCOI reporting, researchers may consider disclosing details related to the funding’s source, including the timing of the funding in relation to the conduct and publication of the trial, the relationship between the funding source and the trial, and the monetary value of the support.

Hakoum, M.B., Noureldine, H., Habib, J.R., Abou-Jaoude, E.A., Raslan, R., Jouni, H., ... & Akl, E.A. (2020). Authors of clinical trials seldom reported details when declaring their individual and institutional financial conflicts of interest: A cross-sectional survey. J Clin Epidemiol 127:49-58.

Manuscript available from the publisher's website here

Tuesday, September 8, 2020

Assessing Health-Related Quality of Life Improvement in the Modern Anticancer Therapy Era

Recent breakthroughs in anticancer therapies such as small-molecule drugs and immunotherapies have made improvements in Health-Related Quality of Life (HRQOL) possible among cancer patients over the course of treatment. In a recent paper published in the Journal of Clinical Epidemiology, Cottone and colleagues are the first to propose a framework for assessing the change in HRQOL over time in these patients: Time to HRQOL Improvement (TTI), and Time to Sustained HRQOL Improvement (TTSI).

In the proposed framework, TTI is based on the time to the “first clinically meaningful improvement occurring in a given scale or in at least one among different scales” – for instance, a minimal important difference (MID) of 5 points on the European Organization for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire – Core 30 (QLQ-C30). The authors suggest utilizing the first posttreatment score as the baseline measurement for monitoring improvements over time. “Sustained improvement” was defined as the first improvement that is not followed by a deterioration that meets or exceeds the MID.
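The two definitions can be sketched for a single patient and a single scale as follows. This is a minimal illustration, not the authors' code. The function names and the example scores are hypothetical, and one detail is an assumption: a "deterioration" is taken here to mean any later score that is worse than the score at the improvement visit by at least the MID, which the summary above does not specify.

```python
from typing import Optional, Sequence

MID = 5.0  # minimal important difference, as in the QLQ-C30 example


def time_to_improvement(times: Sequence[float], scores: Sequence[float],
                        baseline: float, mid: float = MID,
                        higher_is_better: bool = True) -> Optional[float]:
    """Time of the first visit that improves on baseline by >= mid, else None."""
    sign = 1.0 if higher_is_better else -1.0
    for t, s in zip(times, scores):
        if sign * (s - baseline) >= mid:
            return t
    return None  # no improvement observed for this patient


def time_to_sustained_improvement(times: Sequence[float], scores: Sequence[float],
                                  baseline: float, mid: float = MID,
                                  higher_is_better: bool = True) -> Optional[float]:
    """Time of the first improvement with no later deterioration >= mid, else None."""
    sign = 1.0 if higher_is_better else -1.0
    for i, (t, s) in enumerate(zip(times, scores)):
        if sign * (s - baseline) < mid:
            continue  # not an improvement at this visit
        # Assumption: deterioration is measured against the improved score.
        if all(sign * (s - later) < mid for later in scores[i + 1:]):
            return t
    return None


# Hypothetical follow-up visits (months) and scores on a functioning scale.
visit_months = [1, 3, 6, 9, 12]
visit_scores = [52, 60, 53, 62, 61]
tti = time_to_improvement(visit_months, visit_scores, baseline=50)
ttsi = time_to_sustained_improvement(visit_months, visit_scores, baseline=50)
print(tti, ttsi)
```

In this invented example the improvement at month 3 is followed by a 7-point drop, so it does not count as sustained, and the sustained improvement is dated to month 9. For symptom scales such as fatigue, where lower scores are better, `higher_is_better=False` flips the direction.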


The use of Kaplan-Meier curves and Cox proportional hazards models is inappropriate for these outcomes, the authors argue, as these methods do not account for possible competing events, such as disease progression, toxicity, or the possibility of an earlier improvement in another scale when multiple scales are used. They propose the use of the Fine-Gray model for the evaluation of TTI and TTSI and pilot it with a case study of 124 newly diagnosed chronic myeloid leukemia patients undergoing first-line treatment with nilotinib.
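The problem with competing events can be seen in a toy calculation. The sketch below does not fit a Fine-Gray regression model. It only computes the nonparametric cumulative incidence (the Aalen-Johansen estimator), which is the quantity that model targets, and compares it with a naive "1 minus Kaplan-Meier" estimate that treats competing events as censoring. The data and event coding are invented for illustration.

```python
def cumulative_incidence(times, events, cause=1):
    """Aalen-Johansen cumulative incidence of `cause`.

    events: 0 = censored, 1 = event of interest, 2 = competing event.
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    overall_surv = 1.0  # probability of being free of ANY event so far
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_any = n_here = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            e = data[i][1]
            n_here += 1
            d_any += e != 0
            d_cause += e == cause
            i += 1
        if d_any:
            cif += overall_surv * d_cause / n_at_risk
            overall_surv *= 1.0 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_here
    return curve


def naive_one_minus_km(times, events, cause=1):
    """1 - Kaplan-Meier, wrongly treating competing events as censored."""
    recoded = [1 if e == cause else 0 for e in events]
    # With a single event type, the estimator above reduces to 1 - KM.
    return cumulative_incidence(times, recoded, cause=1)


# Hypothetical data: 1 = HRQOL improvement, 2 = progression, 0 = censored.
months = [2, 3, 4, 5, 6, 8, 9, 10]
status = [1, 2, 1, 2, 0, 1, 2, 0]
aj_final = cumulative_incidence(months, status)[-1][1]
naive_final = naive_one_minus_km(months, status)[-1][1]
print(f"Aalen-Johansen: {aj_final:.3f}  naive 1-KM: {naive_final:.3f}")
```

In this invented data set the naive estimate is higher than the competing-risks estimate, because patients who progressed are treated as if they could still go on to improve.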

Time to Improvement (TTI) and Time to Sustained Improvement (TTSI) can be used to elucidate differences in HRQOL responses to treatment based on baseline characteristics. Here, the figure shows TTSI in fatigue scores based on hemoglobin level at baseline.

Using this model, the authors found that improvements in fatigue scores appeared more quickly than those in physical functioning when measuring scores from baseline (pre-treatment), but when the first post-treatment score was used as the baseline, the differences between improvement rates in fatigue and physical functioning diminished. Additionally, a lower baseline hemoglobin level was associated with earlier sustained improvements in fatigue.


While the proposed method of evaluating TTI and TTSI has some limitations, such as lower statistical power than other ways of tracking changes in HRQOL over time, it also has notable strengths. In particular, this method can be used to elucidate differences between treatment approaches that show similar survival outcomes so that the approach with shorter TTI and TTSI can be favored.

Cottone, F., Collins, G.S., Anota, A., Sommer, K., Giesinger, J.M., Kieffer, J.M., ... & Efficace, F. (2020). Time to health-related quality of life improvement analysis was developed to enhance evaluation of modern anticancer therapies. J Clin Epidemiol 127:9-18.

Manuscript available from publisher's website here. 

Wednesday, September 2, 2020

A New Tool for Assessing the Credibility of Effect Modification Cometh: Introducing the ICEMAN

Effect modification goes by many other names: “subgroup effect,” “statistical interaction,” and “moderation,” to name a few. Regardless of what it’s called, the existence of effect modification in the context of an individual study means that the effect of an intervention varies between individuals based on an attribute such as age, sex, or severity of underlying disease. Similarly, a systematic review may aim to identify effect modification between individual studies based on their setting, year of publication, or methodological differences (often called a “subgroup analysis”).

As many as one-quarter of randomized controlled trials (RCTs) and meta-analyses examine their findings for potential evidence of effect modification, according to a paper by Schandelmaier and colleagues published in the latest edition of CMAJ. However, it is not uncommon for claims of effect modification to be later proved spurious, which may negatively affect the quality of care in those subgroups of patients. Potential sources of these claims range from simple random chance to issues with selective reporting and misguided application of statistical analyses.

In “Development of the Instrument to assess the Credibility of Effect Modification Analyses (ICEMAN) in randomized controlled trials and meta-analyses,” the authors present a novel tool for evaluating the credibility of a claimed effect modification. While several sets of criteria have been developed in the past for this purpose, the ICEMAN is the first to be based on a rigorous development process and refined with formal user testing.


First, the authors conducted a systematic survey of the literature to ensure a comprehensive understanding of the previously proposed criteria for evaluating effect modification. Thirty sets were identified, none of which adequately reflected the authors’ conceptual framework. Second, an expert panel of 15 members was randomly selected from a list of 40 experts identified through the systematic survey. These experts then pared down the initial list of 36 candidate criteria to 20 required and eight optional items. After developing a manual for its use, the authors tested the instrument among a diverse group of 17 potential users, including authors of Cochrane reviews and RCTs and journal editors, using a semi-structured interview technique.

Schandelmaier, S., Briel, M., Varadhan, R., Schmid, C.H., Devasenapathy, N., Hayward, R.A., Gagnier, J., ... & Guyatt, G.H. 2020. Development of the Instrument to assess the Credibility of Effect Modification Analyses (ICEMAN) in randomized controlled trials and meta-analyses. CMAJ 192:E901-906.

Manuscript available at the publisher's website here

Wednesday, August 26, 2020

Rapid, Up-to-Date Evidence Synthesis in the Time of COVID

In emergent situations with sparse and rapidly evolving bodies of research, evidence synthesis programs must be able to adapt to a shortened timeline to provide clinicians with the best available evidence for decision-making. (See our previous posts on rapid systematic review and guideline development.) But perhaps no health crisis in the modern era has made this clearer than the coronavirus disease 2019 (COVID-19) pandemic.

Recently, Murad and colleagues published a framework detailing a four-pillar program through which they have been able to synthesize evidence related to the COVID-19 pandemic. This system has been tried and tested within the Mayo Clinic, a multi-state academic center with more than 1.2 million patients per year.


Launched within two weeks of the World Health Organization’s declaration of COVID-19 as a pandemic, Mayo Clinic’s evidence synthesis program consisted of four major components:

  • What is New?: an automatically generated list of COVID-19-related studies published within the last three days and categorized into topic areas such as diagnosis or prevention
  • Repository of Studies: a running list of previously published studies since the first case report of COVID-19, including those that move from the “What is New?” list after three days’ time
  • Rapid Reviews: reviews published within three to four days in response to pressing clinical questions from those on the frontlines and utilizing the study repository. To facilitate evidence synthesis, studies are often screened and selected by a single reviewer and evidence is rarely meta-analyzed.
  • Repository of Reviews: a collection of reviews including those developed at Mayo and elsewhere, identified in twice-weekly searches and through a list of predetermined websites. To supplement knowledge, some reviews included indirect evidence borrowed from studies of other coronaviruses or respiratory infections, when appropriate.

Within one month of the framework’s establishment, the team had conducted seven in-house rapid reviews and had indexed more than 100 newly published reviews into a database housing over 2,000 total.

The authors conclude that while an intensive system such as this may not be feasible in smaller health systems, cross-collaboration and sharing of knowledge can allow for informed and up-to-date clinical care that adapts in the face of a rapidly changing landscape of evidence.

Murad, M.H., Nayfeh, T., Suarez, M.U., Seisa, M.O., Abd-Rabu, R., Farah, M.H.E..., & Saadi, S.M. 2020. A framework for evidence synthesis programs to respond to a pandemic. Mayo Clin Proc 95(7):1426-1429.

Manuscript available at the publisher's website here.

Friday, August 14, 2020

New Elaboration of CONSORT Items Aims to Improve the Reporting of Deprescribing Trials

Deprescribing is the act of withdrawing a treatment prescription from patients for whom a medication has become inappropriate or in whom the risks may now outweigh the benefits. However, trials examining the effects of deprescribing are often complex and multi-faceted, and reporting of these trials can miss important aspects such as patient selection and length of follow-up. 

A recently published paper by Blom et al. used a multistep process to develop a reporting guideline for deprescribing trials based on a systematic review of this body of research, paying close attention to those aspects that most commonly went unreported. The result was an elaboration of the Consolidated Standards of Reporting Trials (CONSORT) statement, with the addition of items reviewed by a panel of 14 experts in areas ranging from pharmacology and geriatric medicine to statistics and reporting guidelines. The process, which ended with a one-day face-to-face meeting to approve the elaborated items, also took into account the Template for Intervention Description and Replication (TIDieR) checklist to ensure that a comprehensive list was created.

The panel determined that all items of the original CONSORT checklist are applicable to deprescribing trials, but that certain items required further detail. The CONSORT items that required the most attention with regard to deprescribing studies included the following:

  • description of trial design
  • participant selection 
  • detailed information that would allow replication of the intervention studied
  • pre-specification of primary and secondary outcomes
  • discussion of adverse events and harms, including those related to drug withdrawal
  • defined periods of recruitment and follow-up


In addition to improving the quality of reporting in deprescribing trials, the authors also recommend increasing the amount of dedicated funds available for deprescribing studies, which are currently scarce and not incentivized by common streams of research funding.

Blom, J.W., Muth, C., Glasziou, P., McCormack, J.P., Perera, R., Poortvliet, R.K.E., ... & Knottnerus, J.A. 2020. Describing deprescribing trials better: An elaboration of the CONSORT statement. J Clin Epidemiol 127:87-95.

Manuscript available from the publisher's website here.

Thursday, August 6, 2020

New Systematic Review Suggests Nonconcordance with COI Disclosure to Reporting Databases is Widespread, but Methodological Quality of Studies is Variable

Disclosure of conflict of interest (COI) is a major point of concern in the development of guidelines as well as original research papers. Over the years, multiple studies have aimed to elucidate just how closely the disclosures of individual authors track with their reported COI in open databases. A new systematic review of 27 such studies, recently published online in the Journal of Clinical Epidemiology, compiles the findings of these studies into some eyebrow-raising statistics while also taking a look at the methodological quality of these studies.


In their review, El-Rayess and colleagues found that although the methodological quality of studies assessing the concordance between authors’ COI disclosures within papers and public databases varied widely, a median of 81.2% of authors across 20 studies had “nonconcordant” disclosures (ranging from 41.8% to 98.6% across all studies), and more than half of these (43.4% of all authors) were “completely nonconcordant” (ranging from 15% to 89.5% across all studies). What’s more, among seven studies that analyzed company reporting on the individual level, between 23.1% and 85.4% of companies did not report their payments to authors.

For the five studies that analyzed disclosures on the study rather than the individual author level, all found at least some degree of discordance between in-study disclosures and database reports. The rate of nonconcordant disclosures among these studies ranged from 6% to 92.6%.


The authors note that ulterior motives of authors are just one potential explanation for the high observed rate of nonconcordant COI disclosure and reporting. Vague instructions and parameters set by journals during the article submission process may undermine efforts to transparently report any and all potential sources of conflict, be they financial, intellectual or otherwise. In addition, the authors found that studies of COI reporting that tended to have higher methodological quality also tended to report lower estimates of nonconcordance, meaning that the overall combined estimates may be artificially inflated – for instance, due to some studies not making a distinction about the relevancy of potential COI sources to the topic of the articles analyzed. The authors note potential sources of nondirectional error as well, such as how differences in COI categories between in-paper disclosures and reference databases were handled, which additionally lowers confidence in the current estimate.

In sum, the recent review by El-Rayess et al. points out that authors’ COI disclosures in their published works often seem to be at odds with publicly available reports of these relationships; however, the overall degree of nonconcordance is still uncertain. Those looking to complete future analyses of COI disclosure policies may want to use this paper as a roadmap to improving our certainty in the actual magnitude of the issue.

El-Rayess, H., Khamis, A.M., Haddad, S., Ghaddara, H.A., Hakoum, M., Ichkhanian, Y., Bejjani, M., & Akl, E.A. (2020). Assessing concordance of financial conflicts of interest disclosures with payments' databases: A systematic survey of the health literature. J Clin Epidemiol 127:19-28.

Manuscript available at the publisher's website here.