Wednesday, November 25, 2020

Diagnostic Test Accuracy Meta-Analyses Are Often Missing Information Required for Reproducibility

Reproducibility of results is considered a key tenet of the scientific process. When the results of a study can be reproduced by others using the same protocol, there is less chance that the original findings were due to human or random error. Testing the reproducibility of evidence syntheses (e.g., meta-analyses) is just as important as testing that of individual trials.

In a paper published earlier this month, Stegeman and colleagues set out to test the reproducibility of meta-analyses of diagnostic test accuracy. The authors identified 51 eligible meta-analyses published in January 2018. In 19 of these, the text provided sufficient information to reproduce the 2x2 tables of the individual included studies; the remaining 32 provided only summary estimates. For 17 of these 32, the authors located primary data with which to attempt reproduction. Overall, the meta-analyses of the 51 identified papers were reproducible only 28% of the time, and none of the 17 papers for which 2x2 tables were not provided could be reproduced.
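Reproducing a diagnostic accuracy meta-analysis hinges on recovering each study's 2x2 table, since the reported accuracy estimates are derived from its four cells. As a minimal sketch (not from the paper; the counts below are hypothetical), this is the calculation that becomes impossible when a review reports only summary estimates:

```python
# Minimal sketch: deriving diagnostic accuracy estimates from a 2x2 table.
# Cell counts are hypothetical, for illustration only.

def accuracy_from_2x2(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from a diagnostic 2x2 table."""
    sensitivity = tp / (tp + fn)  # true positives / all with the condition
    specificity = tn / (tn + fp)  # true negatives / all without the condition
    return sensitivity, specificity

# Hypothetical study: 90 true positives, 10 false negatives,
# 20 false positives, 80 true negatives
sens, spec = accuracy_from_2x2(tp=90, fp=20, fn=10, tn=80)
print(sens, spec)  # 0.9 0.8
```

Working backward from a reported sensitivity/specificity pair to the underlying cell counts is generally not possible without the sample sizes, which is why missing 2x2 tables blocked reproduction.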


Only 14 (27%) of the 51 articles provided full search terms. In nearly half (25) of the included reviews, at least one of the full texts of included references could not be located; in 12, at least one title or abstract could not be located. Overall, of the 51 included reviews, only one was deemed fully reproducible by providing a full protocol, 2x2 tables, and the same summary estimates as the authors.

The authors conclude with a call for increased prospective registration of protocols and improved reporting of search terms and methods. The 2017 PRISMA statement for diagnostic test accuracy is a helpful tool for any aspiring author of a diagnostic test accuracy meta-analysis seeking to improve the reporting and reproducibility of results.

Stegeman I. and Leeflang M.M.G. (2020). Meta-analyses of diagnostic test accuracy could not be reproduced. J Clin Epidemiol 127:161-166.

Manuscript available at the publisher's website here.

Friday, November 20, 2020

Practical Tips for Finding and Assessing Patient Survey Data

An essential part of translating a body of evidence into a clinical recommendation within the GRADE framework is the consideration of patients' values and preferences. Panels should consider not only the likely treatment preferences and the values patients place on outcomes, but also the variability in these; substantial variability may itself influence the ultimate strength of a recommendation.

Guideline panels and public health decision-makers may use self-reported patient survey data to better understand the range of patient values and preferences when formulating recommendations or policies. However, like all sources of evidence, patient surveys may be at risk for specific sources of bias which can ultimately affect the results. What should decision-makers look out for when applying patient survey data to a recommendation for care? In a recently published paper, Santesso and colleagues propose a practical guide for finding, interpreting, and applying patient data to better inform healthcare decision-making.


Because 97% of published surveys have been found to use the words "survey" or "questionnaire" in the title, the authors suggest using these terms in the title, abstract, and topic fields when searching for relevant data. When assessing the risk of bias of a given survey, decision-makers should ask whether the sample was adequately representative of the patient population in question, taking care to consider the use of random sampling and the potential impact of nonresponse. A survey should also be assessed for whether it adequately measures the intended constructs. Survey authors should report the variability around reported measures whenever possible; these data can be used to judge the overall variability in patient values and preferences. Finally, decision-makers should take care to discern how directly the survey data apply to the patient population in question; the table of survey respondent characteristics is a useful place from which to draw judgments of directness.
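As a hedged sketch of how the title-screening advice might be operationalized on exported bibliographic records (the field names and records here are hypothetical, not from the paper):

```python
# Illustrative sketch: flagging records whose title, abstract, or keyword
# fields contain "survey" or "questionnaire". Field names are assumptions.

TERMS = ("survey", "questionnaire")

def matches(record):
    """True if any search term appears in the record's text fields."""
    text = " ".join(
        record.get(field, "") for field in ("title", "abstract", "keywords")
    ).lower()
    return any(term in text for term in TERMS)

records = [
    {"title": "A survey of patient preferences in asthma care"},
    {"title": "A randomized trial of inhaled corticosteroids"},
]
hits = [r["title"] for r in records if matches(r)]
print(hits)  # only the first record matches
```

In practice this filtering would be done with database field tags (e.g., title/abstract limits) rather than post hoc, but the logic is the same.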

Using these helpful and practical points of guidance, guideline panel members and clinical decision-makers can better inform their retrieval, critical appraisal, and application of patient survey data to important healthcare questions, ultimately resulting in more informed guidelines and policies.

Santesso N, Akl E, Bhandari M, Busse JW, Cook DJ, Greenhalgh T, Muti P, Schünemann H, and Guyatt G. (2020). A practical guide for using a survey about attitudes and behaviors to inform health care decision making. J Clin Epidemiol 128:93-100.

Manuscript available from the publisher's website here. 

Monday, November 16, 2020

Evidence Foundation Welcomes Four Scholars in First Virtual Workshop

In late October, the U.S. GRADE Network held its thirteenth GRADE Guideline Development Workshop. Like the twelve workshops before it, there was much learning, discussion, and networking to be shared. However, unlike any workshop in the past, it was fully online.

Among the 45 attendees who participated in offices and living rooms from Brazil to Cyprus were four participants who attended the workshop free of charge as recipients of the Evidence Foundation scholarship. During a virtual Evening with the Fall 2020 Evidence Foundation Scholars, these four bright minds presented briefly on a proposal or current project designed to reduce bias in healthcare.

Dr. Stavros Antoniou, Chair of the European Association for Endoscopic Surgery Guidelines Subcommittee, discussed the tripartite Guideline Assessment Project (GAP) aimed at developing an extension of the AGREE II tool for surgical guidelines. In an exploratory analysis published in 2018 (GAP I), Antoniou and colleagues assessed 67 surgical guidelines and reported that development of more than one guideline per year, the presence of a guideline committee, and the use of GRADE were associated with higher AGREE II scores. Second, the group explored the reliability, internal consistency, and unidimensionality of the AGREE II tool when applied to surgical guidelines (GAP II). The group is now in the process of using the Delphi process to identify and finalize items for the surgical extension based on stakeholder input, pilot-testing the instrument, and assessing its validity (GAP III). Of the workshop, Dr. Antoniou noted, "participating in the GRADE Guideline Workshop as a scholar was an inspirational experience. It was fascinating to be trained by world-renowned experts, who have embraced us with true interest and conveyed their passion with quality in guideline development."

Jung Min Han, PharmD, MS, manages the development of guidelines for the American Academy of Dermatology. Her presentation reviewed her current project to update the organization's 2016 guidelines on the management of acne vulgaris using the GRADE framework. Ms. Han discussed the plan to organize two working groups: one to review and update the nine clinical questions from the previous guidelines, and the other to add new questions as needed. An updated search would then be run for the first set of questions to identify any evidence published since the original guidelines were developed; simultaneously, a novel systematic search would be conducted for the second group of questions. New recommendations would then be drafted following GRADE methodology. Ms. Han stated, "The GRADE Workshop has trained me to confidently use GRADE in different scenarios where head-to-head data from randomized controlled trials are not available. The workshop was very well-structured with a concrete theme and a mix of lectures, small and large group discussions, meet the experts Q&A sessions, and real-world examples that challenged trainees in many ways."

Dr. Georgios Schoretsanitis of Zucker Hillside Hospital in Glen Oaks, New York presented on his work developing guidelines for therapeutic drug monitoring to optimize and tailor treatment with psychotherapeutic medications. Beginning in 2017, a series of recommendations for reference ranges for two commonly prescribed antipsychotic medications was developed, followed this year by an international joint consensus statement on blood levels to optimize antipsychotic treatment in clinical practice. "For long I have been interested in conducting systematic reviews and meta-analyses," said Dr. Schoretsanitis. "Attending the GRADE Guideline Workshop organized by the US GRADE Network gave me exactly what I was looking for: a unique chance to essentially deepen my knowledge on major methodological aspects during stimulating lectures by experts that have set the tone in the field. It was an intense experience far beyond acquiring knowledge, which I highly suggest to every methodologist."

Dr. Zeinab Hosseini, a Saskatchewan Health Research post-doctoral fellow at the University of Saskatchewan, discussed her work examining the impact of exercise interventions on osteoporosis. Because gender and sex affect the prognosis and management of the disease, guidelines that consider these differences are needed, she said. As part of her research under the advisement of Dr. Phil Chilibeck, she hopes to contribute further understanding in the field related to gender- and sex-specific considerations for exercise recommendations in patients with osteoporosis, and to help inform future guideline recommendations on this topic. "The US GRADE Network Workshop was an amazing opportunity for me as a post-doctoral fellow in health, providing insight on how to think as a health researcher from the early stages of research up to knowledge translation and dissemination, and how to provide evidence-based recommendations to inform the public considering situations where the literature is scarce," said Dr. Hosseini. "There are top women and men scientists on the training panel who respond to questions using their experiences as members on different panels, which I think is unique."

The USGN facilitators pose for a virtual group photo with the four fall 2020 Evidence Foundation scholars.

The Evidence Foundation thanks all four scholars for attending and contributing their engagement and expertise to our 2020 fall workshop.

If you are interested in applying for a scholarship to a future GRADE workshop, more details can be found here. Please note the deadline for applications to our next workshop in Chicago, Illinois is February 28, 2021.

Friday, October 30, 2020

U.S. Guideline-Producing Organizations Show Some Promise, Room for Improvement in their Application of GRADE

As many as one-third of guideline-producing health organizations in the United States report using the GRADE framework, but exactly how closely these organizations follow the key tenets of GRADE - such as using evidence summaries of each identified outcome to inform the overall certainty of evidence, and linking this certainty to a strength of recommendation - is a matter of debate.

In a study by Dixon and colleagues published earlier this year in the Journal of Clinical Epidemiology, the authors set out to evaluate the use of GRADE in U.S.-based guidelines published between 2011 and 2018 and available in the National Guidelines Clearinghouse. Assessing up to three of the most recent guidelines from each of 135 identified U.S.-based organizations, the authors used several criteria to examine how closely each of the 67 resulting guidelines adhered to core GRADE concepts, including:

  • defining the certainty of evidence,
  • explicitly considering the GRADE domains when assessing the certainty of evidence, and
  • consistently defining the strength of resulting recommendations as strong or weak/conditional.
While most (89.6%) defined the certainty of evidence in a manner consistent with GRADE, only 10.4% explicitly reported examining certainty through all 8 GRADE criteria. Only 13.4% of guidelines assessing the certainty of evidence from non-randomized studies reported assessing the potential reasons to upgrade the certainty of evidence (i.e., large magnitude of effect, dose-response gradient, and residual confounding). Finally, only about half (53.7%) provided an evidence profile or summary of findings table describing the assessments, and while reporting of the certainty of evidence and the balance between desirable and undesirable effects was most common (100% and 97%, respectively), explicit consideration of resource use and of patients' values and preferences was also fairly common (73.1% and 77.6%, respectively). The use of GRADE in line with the authors' established criteria appeared to grow somewhat more frequent over time, indicating a general trend toward proper use of GRADE.

Figure from Dixon et al. shows the relative reporting frequency of the various GRADE criteria for assessing certainty of evidence in years 2011-14 versus 2015-18, suggesting a trend for improved reporting over time.

The authors conclude that continued training of guideline developers and dissemination of education on the appropriate application of GRADE should further improve adherence, including the explicit consideration of all eight domains for assessing the certainty of evidence and of all aspects that inform the translation of this evidence into clinical recommendations.

Dixon C, Dixon PE, Sultan S, Mustafa R, Morgan RL, Murad MH, Falck-Ytter Y, and Dahm P. (2020). Guideline developers in the United States were inconsistent in applying criteria for appropriate Grading of Recommendations, Assessment, Development and Evaluation use. J Clin Epidemiol 124:193-199.

Manuscript available at the publisher's website here.

Monday, October 19, 2020

Existing Tools to Assess the Quality of Prevalence Reviews are Variable, with Some Missing Key Elements

Prevalence studies allow us to better understand the extent and impact of a health issue, guiding priority-setting for health care interventions, research, and clinical guidelines. While established tools for assessing the quality of guidelines, systematic reviews, and original research on interventions exist, no clear option has emerged as a way to assess the quality and risk of bias in prevalence research. The several tools that have been proposed, write the authors of a new systematic review of these instruments, are not without limitations.

Migliavaca and colleagues sifted through 1,690 unique references, ending with a total of 30 tools that were either created for the direct purpose of assessing prevalence studies (n = 8) or were adaptable to this aim (n = 22). A grand total of 710 items from all of the tools were then consolidated into 119 items assessing similar constructs under six general domains: Population and Setting, Condition Measurement, Statistics, Manuscript Writing and Reporting, Study Protocols and Methods, and Nonclassified (e.g., importance of the study, applicability of results).


The authors conclude that there was great variability among the tools assessed; several left out key elements that could affect the quality of a study, such as the representativeness of the sample, total sample size, or how the condition was assessed. Moreover, some tools fail to distinguish between assessments of whether a measure is valid, reliable, reproducible, or unbiased - differences that the authors of this review argue are important enough to warrant separate items in the development of a new tool. Although the authors suggest that a new, more comprehensive tool will improve the assessment of prevalence studies in the future, they identify the Joanna Briggs Institute Prevalence Critical Appraisal Tool as the best of what's currently available (downloadable from a list of JBI checklists here).

Migliavaca, C.B., Stein, C., Colpani, V., Munn, Z., Falavigna, M., and the Prevalence Estimates Reviews - Systematic Review Methodology Group (PERSyst). (2020). J Clin Epidemiol 127:59-68.

Manuscript available at the publisher's website here.

Tuesday, October 13, 2020

Equity Harms Related to Covid-19 Policies: Slowing the Spread Without Increasing Inequity

Since COVID-19 was first declared a pandemic in March of this year, governments around the world have implemented some degree of lockdown, slashing social events and gatherings, shuttering once-bustling businesses, and changing the face of the global economy. While these lockdowns were likely necessary to reduce the infection rate and the resulting morbidity and mortality associated with the coronavirus, such policies have potentially undesirable consequences for measures of equity. In a new publication, Glover and colleagues present a framework for considering these effects and weighing them against the benefits of slowing the spread.

The work builds on a novel combination of two existing frameworks. First, the Lorenc and Oliver framework lays out five potential harms of public health interventions which require mitigation: direct health harms, psychological harms, equity harms, group and social harms, and opportunity costs. Second, the PROGRESS-Plus health equity framework provides a list of 11 general categories that can affect measures of equity: Place of residence, Race, Occupation, Gender/sex, Religion, Education, Socioeconomic status, Social capital, sexual orientation, age, and disability. Each framework's individual components are used as a lens to examine the other. The resulting matrix of 55 potential sources of inequity related to the COVID-19 pandemic and its resulting public health policies provides an exemplary approach to considering all aspects of any large-scale public health intervention and the impact its implementation may have on inequity.
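The 55-cell matrix is simply the cross-product of the two frameworks, which can be sketched as:

```python
# Sketch of the combined framework: crossing the five Lorenc & Oliver
# harm categories with the eleven PROGRESS-Plus equity factors yields
# the paper's 55 potential sources of inequity.
from itertools import product

HARMS = [
    "direct health harms", "psychological harms", "equity harms",
    "group and social harms", "opportunity costs",
]
PROGRESS_PLUS = [
    "place of residence", "race", "occupation", "gender/sex", "religion",
    "education", "socioeconomic status", "social capital",
    "sexual orientation", "age", "disability",
]

matrix = list(product(HARMS, PROGRESS_PLUS))
print(len(matrix))  # 55 cells, each a harm/equity-factor pairing to examine
```

Each cell is a prompt for analysis (e.g., "psychological harms" crossed with "occupation"), not a quantitative score.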

Key to the authors' resulting framework is the concept that both the policy responses to the pandemic and the nature of the pandemic itself are potential sources of inequity. For instance, individuals in lower-income occupations are also typically considered essential workers, and are less likely to have a safety net that would allow them to choose not to work or a job that is compatible with working remotely. Thus, existing systemic inequities are exacerbated: these workers are now more likely to be exposed to the virus by continuing to go to work outside the home. However, policymakers can help reduce the impact of their policies on these sources of inequity - as well as ones caused by lockdown policies more directly - by considering mitigation strategies when implementing these policies (for example, by mandating improved sanitation, personal protective equipment, and social distancing for workers in vulnerable occupations). The figure below from the publication provides an overview of the relationship between the pandemic, policy responses and their resulting inequities, and potential points of intervention.


The framework also allows for a more nuanced consideration of context in efforts to reduce the spread of coronavirus. Policies that are highly effective and viable in higher-income countries or areas with greater population density, for instance, may not be as beneficial in low- and middle-income countries and may even result in greater inequity. As with any intervention of any scale, the potential harms must be weighed against the desirable effects, and the context of the given intervention is key. This framework allows for consideration of a wider range of impacts when attempting to reduce illness and mortality in the age of a pandemic.

Glover, R.E., van Schalkwyk, M.C.I., Akl, E.A., Kristjansson, E., Lotfi, T., Petkovic, J., ... & Welch, V. (2020). A framework for identifying and mitigating the equity harms of COVID-19 policy interventions. J Clin Epidemiol 128:35-48.

Manuscript available from the publisher's website here. 

Tuesday, October 6, 2020

New Study Examines the Impact of Abbreviated vs. Comprehensive Search Strategies on Resulting Effect Estimates

It's common practice - indeed, it's widely recommended - that systematic reviewers search multiple databases in addition to alternative sources of data such as the grey literature to ensure that no relevant studies are left out of an analysis. However, meta-research on whether this theory holds up in practice is mainly limited to examinations of recall - in other words, reporting how many potentially relevant studies are picked up by an abbreviated search method as opposed to a more extensive one. What's missing from this body of research, write Ewald and colleagues in a newly published study, is that recall studies compare items retrieved in absolute terms without considering the final weight or importance of each individual study - variables which ultimately affect the direction, magnitude, and precision of the resulting effect estimate. Since larger studies with more cachet are likely to have the greatest impact on the final estimate and certainty of evidence - and these studies are more likely to be picked up in even an abbreviated search - the added value of more extensive search strategies for a meta-analysis is unclear.

To examine the impact of the extensiveness of a search strategy on resulting findings and certainty of evidence, the authors randomly selected 60 Cochrane reviews from a range of disciplines for which certainty of evidence assessments and summaries of findings were available. Thirteen reviews did not report at least one binary outcome, leaving a total of 47 for analysis. They then replicated these reviews' search strategies in addition to conducting 14 abbreviated searches for each review, such as limiting to a single database (e.g., MEDLINE only) or a combination of just two or three (e.g., MEDLINE and Embase). Finally, the meta-analyses were replicated for each of these scenarios, leaving out studies that would not have been picked up by the various abbreviated search strategies.

Searching only one database led to a loss of at least one trial in half of the reviews, and a loss of two trials in one-quarter of them. As may be expected, the use of additional databases reduced the loss of information. Overall, however, the direction and significance of the resulting effect estimates remained unchanged in a majority of the cases, as shown in Figure 1 from the paper, below.


The use of abbreviated searches did, however, introduce some amount of imprecision, typically increasing standard error by around 1.02 to 1.06-fold. The inclusion of multiple versus a single database did not clearly appear to improve precision compared to a comprehensive search.
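To see what a 1.06-fold standard-error inflation means in practice, consider a hypothetical pooled odds ratio (the numbers below are assumptions for illustration, not figures from the paper):

```python
# Worked illustration: how a 1.06-fold increase in standard error widens
# a 95% confidence interval around a (hypothetical) pooled log odds ratio.
import math

log_or = math.log(0.75)      # hypothetical pooled effect, OR = 0.75
se_full = 0.10               # hypothetical SE from the comprehensive search
se_abbrev = se_full * 1.06   # upper end of the reported inflation range

def ci_95(log_effect, se):
    """95% CI on the odds-ratio scale."""
    lo = math.exp(log_effect - 1.96 * se)
    hi = math.exp(log_effect + 1.96 * se)
    return round(lo, 3), round(hi, 3)

print(ci_95(log_or, se_full))    # comprehensive search
print(ci_95(log_or, se_abbrev))  # abbreviated search: slightly wider CI
```

The interval widens only modestly at this level of inflation, which is consistent with the authors' finding that direction and significance were usually preserved.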

The authors note that these findings are particularly applicable to authors of rapid reviews and guidelines, where a consideration of trade-offs between speed and thoroughness is of great importance. Rapid reviewers should be aware that limiting the search strategy may change the direction of an effect estimate or render it incalculable in up to one in seven instances, but this should be weighed against the benefits of a quicker time to the dissemination of findings, especially during emergent health crises where time is of the essence.

Ewald, H., Klerings, I., Wagner, G., Heise, T.L., Dobrescu, A.I., Armijo-Olivo, S., ... & Hemkens, L.G. (2020). Abbreviated and comprehensive literature searches led to identical or very similar effect estimates: A meta-epidemiological study. J Clin Epidemiol 128:1-12.

Manuscript available from the publisher's website here.