Wednesday, April 21, 2021
In Studies of Patients at High Risk of Death, More Explicit Reporting of Functional Outcomes is Needed
- In the methods, rather than only in tables or supplementary material, explicitly state the sample population from which the functional outcomes were drawn, whether it is survivors-only or another type of analysis.
- If a survivors-only analysis is used, the authors should report the baseline characteristics of the groups analyzed and transparently discuss this as a limitation within the discussion section.
- If all randomized participants are analyzed regardless of mortality, authors should report the assumptions upon which these analyses are based; for instance, if death is one outcome ranked among others in a worst-rank analysis, the justification for the ranking of outcomes should be discussed in the methods, and the implications of these decisions included in the discussion section.
Thursday, April 8, 2021
A key tenet underlying the GRADE framework is that the certainty of the available research evidence is a central factor in clinical decision-making. But what if little to no published research exists on which to base a recommendation? At the end of the day, clinicians, patients, policymakers, and others will still need to make a decision, and will look to a guideline for direction. Thankfully, there are other options to pursue within the context of a systematic review or guideline that ensure that as much of the available evidence is presented as possible, even if it comes from less traditional or direct sources.
A new project conducted by the Evidence-based Practice Center (EPC) Program of the Agency for Healthcare Research and Quality (AHRQ) developed guidance for supplementing a review of evidence when the available research evidence is sparse or insufficient. This guidance was based on a three-pronged approach, including:
- a literature review of articles that have defined and dealt with insufficient evidence,
- a convenience sample of recent systematic reviews conducted by EPCs that included at least one outcome for which the evidence was rated as insufficient, and
- an audit of technical briefs from the EPCs, which tend to be developed when a given topic is expected to yield little to no published evidence and which often contain supplementary sources of information such as grey literature and expert interviews.
The resulting guidance suggests five strategies for supplementing a review when the evidence is sparse:
- Reconsider eligible study designs: broaden your search to capture a wider variety of published evidence, such as cohort or case studies.
- Summarize evidence outside the prespecified review parameters: use indirect evidence that does not perfectly match the PICO of your topic in order to better contextualize the decision being presented.
- Summarize evidence on contextual factors (factors other than benefits/harms): these include key aspects of the GRADE Evidence-to-Decision framework, such as patient values and preferences and the acceptability, feasibility, and cost-effectiveness of a given intervention.
- Consider modeling if appropriate, and if expertise is available: if possible, certain types of modeling can help fill in the gaps and make useful predictions for outcomes in lieu of real-life research.
- Incorporate health system data: "real-world" evidence such as electronic health records and registries can supplement more mechanistic or explanatory RCTs.
Friday, April 2, 2021
As opposed to an "explanatory" or "mechanistic" randomized controlled trial (RCT), which seeks to examine the effect of an intervention under tightly controlled circumstances, "pragmatic" or "naturalistic" trials study interventions and their outcomes when used in more real-world, generalizable settings. One example of such a study might include the use of registry data to examine interventions and outcomes as they occur in the "real world" of patient care. However, there are currently few standards for identifying, reporting, and discussing the results of such "pragmatic RCTs." A new paper by Nicholls and colleagues aims to provide an overview of the current landscape of this methodological genre.
The authors searched for and synthesized 4,337 trials using keywords such as "pragmatic," "real world," "registry based," and "comparative effectiveness" to better map an understanding of how pragmatic trials are presented in the RCT literature. Overall, only about 22% (964) of these trials were identified as "pragmatic" RCTs in the title, abstract, or full text; about half of these (55%) used this term in the title or abstract, while the remaining 45% described the work as a pragmatic trial only in the full text.
About 78.1% (3,368) of the trials indicated that they were registered. However, only about 6% were indexed in PubMed as a pragmatic trial, and only 0.5% were labeled with the MeSH topic of Pragmatic Clinical Trial. Target enrollment in these trials was a median of 440 participants (interquartile range [IQR]: 244 - 1,200), while the actual achieved accrual was a median of 414 (IQR: 216 - 1,147). The largest trial included 933,789 participants; the smallest enrolled 60.
Overall, pragmatic trials were more likely to be centered in North America and Europe and to be funded by non-industry sources. Behavioral, rather than drug- or device-based, interventions were most common in these trials. Not infrequently, the trials were mislabeled or contained erroneous data in their registration information. The fact that only about half of the sample was clearly labeled as "pragmatic" means that such trials may go undetected by search strategies less sensitive than the one the authors used.
Authors of pragmatic trials can improve the quality of the field by clearly labelling their work as such and by registering their trials and ensuring that registered data are accurate and up-to-date. The authors also suggest that taking a broader view of what constitutes a "pragmatic RCT" also generates questions regarding proper ethical standards when research is conducted on a large scale with multiple lines of responsibility. Finally, the mechanisms used to obtain consent in these trials should be further examined in light of the finding that many pragmatic trials fail to achieve goals set for participant enrollment.
Manuscript available from publisher's web site here.
Nicholls SG, Carroll K, Hey SP, et al. (2021). A review of pragmatic trials found a high degree of diversity in design and scope, deficiencies in reporting and trial registry data, and poor indexing. J Clin Epidemiol (ahead of print).
Monday, March 15, 2021
While the use of blinding is a hallmark of placebo-controlled trials, whether the blinding was successful - i.e., whether or not participants were able to figure out the treatment condition to which they had been assigned - isn't always tested, nor are the results of these tests always reported. The measurement of the success of blinding in trials is controversial and not uniformly performed, and the item has been dropped from subsequent versions of the CONSORT reporting items for trials. According to a recent discussion of the pros and cons of measuring the success of blinding, only 2% to 24% of trials perform or report these types of tests.
As Webster and colleagues explain, the arguments for and against measuring the success of blinding include the following:
- the success (or failure) of blinding in a placebo-controlled trial can introduce a source of bias that affects the results.
- while the effect of blinding itself may be small, these small effects could still result in changes to policy or practice
- there are documented instances in which the failure to properly blind (for instance, providing participants with a sour-tasting Vitamin C condition versus a sweet lactose "placebo") led to an observed effect (for instance, on preventing or treating the common cold) whereas there was no effect in the subgroup of participants who were successfully blinded.
- At times, a break in blinding can lead to conclusions in the opposite direction. For instance, physicians who are unblinded may assume that the patients with better outcomes received a drug widely supposed to be "superior," when in fact, the opposite occurred.
- In some cases, a treatment with dramatically superior results can result in unblinding, even when the treatment conditions were identical - but that doesn't necessarily mean the blinding was a failure or could have been prevented, given the dramatic differences in outcomes.
- If the measurement of blinding is performed at the wrong time - such as before the completion of the trial - participants may become suspicious and this in itself could potentially confound treatment effects.
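Where such tests are performed, the result is often summarized per treatment arm; below is a minimal sketch of one such summary (a simplified form of Bang's blinding index). The guess counts are hypothetical, for illustration only.

```python
# Illustrative only: a simplified per-arm blinding index computed from
# participants' guesses about their treatment assignment.
# All counts below are hypothetical.

def blinding_index(n_correct, n_incorrect, n_dont_know):
    """Return a value in [-1, 1]: 0 suggests successful blinding,
    1 complete unblinding, -1 systematic opposite guessing."""
    total = n_correct + n_incorrect + n_dont_know
    return (n_correct - n_incorrect) / total

# Hypothetical guesses from the active-treatment arm of a trial:
bi_active = blinding_index(n_correct=70, n_incorrect=20, n_dont_know=10)
print(round(bi_active, 2))  # 0.5 -> many participants guessed correctly
```

A value near 0, as when correct and incorrect guesses balance out, is consistent with intact blinding, though as noted above, dramatic treatment effects can produce high values even when blinding procedures themselves were sound.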
Tuesday, March 9, 2021
To guide the formulation of clinical recommendations, GRADE relies on the use of direct or, if necessary, indirect evidence from peer-reviewed publications as well as the gray literature. However, in some cases, no such evidence may be found even after an extensive search has been conducted. A new paper - part of the informal GRADE Notes series in the Journal of Clinical Epidemiology - relays the results of piloting an "expert evidence" approach and provides key suggestions when using it.
As opposed to simply asking the panel members of a guideline to base their recommendations on informal opinion, the expert evidence approach systematizes this process by eliciting the extent of their experience with certain clinical scenarios through quantitative survey methods. In this example, at least 50% of the panel members were free of conflicts of interest, with various countries and specialties represented. While members were not required to base their answers on patient charts, the authors suggest that doing so could further increase the rigor of the survey.
As a result of the survey, the recommendations put forward reflected a cumulative 12,000 cases of experience. Because the members felt that at least some recommendation was necessary to help guide care - where the alternative would be to provide no recommendation at all - the guideline helped to fill a gap while flagging the current lack of high-quality published evidence for several clinical questions, which may help guide the production of higher-quality evidence and recommendations in the future. Importantly, the authors note that using a survey to facilitate the formulation of recommendations avoided a pitfall of "consensus-based" approaches to guideline development, which can often end up simply reflecting the opinions of those with the loudest voices.
Mustafa RA, Cuello Garcia CA, Bhatt M, et al. (2021). How to use GRADE when there is "no" evidence? A case study of the expert evidence approach. J Clin Epidemiol, in press.
Manuscript available from the publisher's website here.
Wednesday, March 3, 2021
When meta-analyzing data from studies examining the incidence of rare events - or from studies with a small sample size or short follow-up period - it is not uncommon to come across a study with 0 events of the outcome of interest. In fact, approximately one-third of a random sample of 500 Cochrane reviews contained at least one zero-events study.
Zero-events studies are typically categorized as single-arm (0 events reported in just one group) or double-arm (0 events reported in both groups). While some statistical software automatically discards double-arm zero-events studies from a meta-analysis, this is not ideal, because these data still add useful information regarding the overall effect of an intervention. Ideally, meta-analyses could include a pooled event count that may be zero in one arm, both arms, or neither, with various single-arm and double-arm zero-events studies potentially contributing to this final effect. Thus, in a recently published article, Xu and colleagues propose a more detailed framework for approaching zero-events studies in the context of a meta-analysis.
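To see concretely why zero cells complicate pooling, here is a minimal sketch of inverse-variance pooling of log odds ratios with the common 0.5 continuity correction applied to zero cells. The study counts are invented, and this correction is one conventional workaround rather than the framework the authors propose:

```python
import math

# Sketch: inverse-variance pooling of log odds ratios, adding 0.5 to every
# cell of a study's 2x2 table whenever any cell is zero (a common, if
# imperfect, continuity correction). All study counts are hypothetical.

def log_or_with_correction(e1, n1, e2, n2):
    """Log odds ratio and its variance for events/total in two arms."""
    a, b = e1, n1 - e1   # treatment arm: events, non-events
    c, d = e2, n2 - e2   # control arm: events, non-events
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, var

studies = [(0, 100, 3, 100),   # single-arm zero-events study
           (2, 150, 5, 150),
           (1, 80, 4, 80)]

weights, estimates = [], []
for e1, n1, e2, n2 in studies:
    lo, v = log_or_with_correction(e1, n1, e2, n2)
    weights.append(1 / v)      # inverse-variance weight
    estimates.append(lo)

pooled = sum(w * x for w, x in zip(weights, estimates)) / sum(weights)
print(f"pooled OR ~ {math.exp(pooled):.2f}")
```

Note that under this correction a double-arm zero-events study contributes a log odds ratio near zero with a very large variance, so it receives almost no weight - one reason the authors argue for more principled handling.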
The authors describe six classifications as follows, with the degree of difficulty when meta-analyzing generally increasing from 1 to 6:
Thursday, February 25, 2021
While the GRADE framework is used by over 100 health organizations to assess the certainty of evidence and guide the formulation of clinical recommendations, its use in the field of nutrition for these purposes is still sparse. A recent examination of all systematic reviews using GRADE in the ten highest-impact nutrition journals over the past five years provides insight and suggestions for moving the field forward in the use of GRADE for evidence assessment in systematic reviews of nutritional interventions.
Werner and colleagues identified 800 eligible systematic reviews, 55 (6.9%) of which used GRADE, and 47 (5.9%) of which rated the certainty of evidence specific to different outcomes. The number of these reviews using GRADE increased year-to-year, from two in 2015 to 23 in 2019. Reviews claiming to use a modification of GRADE were excluded from analysis.
There were 811 identified cases of downgrading the certainty of evidence and 31 cases of upgrading. Reviews of randomized controlled trials had a mean of 1.6 domains downgraded per outcome, while reviews of non-randomized studies had a mean of 2.1. In about 6.5% of upgrading cases, this was done for unclear reasons not in line with GRADE guidance, such as upgrading for low risk of bias, narrow confidence intervals, or very low p-values. Reviews of non-randomized studies were more likely than those of randomized studies to have outcomes downgraded for imprecision and inconsistency, and less likely to have downgrades for publication bias.
The authors conclude that while the use of GRADE in systematic reviews of nutritional interventions has grown over recent years based on this sample, continued education and training of nutrition researchers and experts can help improve the spread and quality of the application of GRADE to assess the certainty of evidence in this discipline.
Werner SS, Binder N, Toews I, et al. (2021). The use of GRADE in evidence syntheses published in high-impact-factor nutrition journals: A methodological survey. J Clin Epidemiol, in press.
Manuscript available here.
Friday, February 19, 2021
Registration of Trials Included in Systematic Reviews Has Improved Over Time, but Remains Under 50% for Most Years
The prospective registration of a randomized controlled trial (RCT) can reduce bias by clearly laying out the methods to be used before the research is conducted and data analyzed. Registration can also help limit unintentional duplication of efforts, which can be seen as an ethical charge, as duplication of research may mean duplication of unnecessary potential risks to participants and a failure to properly disseminate the findings of research. In 2004, the International Committee of Medical Journal Editors (ICMJE) recommended that journals only consider publishing results of a trial that was prospectively registered.
A new study by Lindsley and colleagues set out to examine just how many RCTs included in a sample of systematic reviews were properly registered, and if so, whether the registration entry was updated with results of the trial. From a group of 618 systematic reviews published within the Cochrane Musculoskeletal, Oral, Skin and Sensory (MOSS) network between 2014 and 2019, a total of 100 eligible reviews were randomly selected across the network's eight groups (30 from the Eyes and Vision group, which provided the pilot data, and ten from each of the remaining seven).
Among a total of 1,432 included trials published since 2000, when the trial registry clinicaltrials.gov became available, only 379 (26%) had been registered. Of the 1,177 trials published since 2005, when the ICMJE recommendation first went into effect, the proportion of registered trials increased to 31%, and then to 38% for those published since 2010. Registered trials had double the median number of participants (120 vs. 60) compared with non-registered trials. About one-third (31%) of the trials published since 2005 included at least one major outcome within the registry record. While trial registration did increase over time, only during two years - 2015 and 2018 - did the proportion of registered trials exceed 50%.
Overall, the authors found that while trial registration has become more common since 2005, it still tends to make up the minority of trials included within systematic reviews of this area. In addition, only about 10% of trials examined had updated the registration record with results related to safety or efficacy. Much room for improvement remains in terms of increasing and incentivizing the prospective registration of trials and update of information with publicly available results.
Lindsley K, Fusco N, Teeuw H, et al. (2021). Poor compliance of clinical trial registration among trials included in systematic reviews: A cohort study. J Clin Epidemiol 132:79-87.
Manuscript available here.
Friday, February 12, 2021
Scoping reviews provide an avenue for the exploration, description, and dissemination of a body of evidence before a more systematic review is undertaken. As such, they can help clarify how research on a certain topic has been defined and conducted, in addition to identifying common issues and knowledge gaps - all of which can go on to inform a more effective approach to systematically reviewing the literature.
The Joanna Briggs Institute (JBI) has provided guidance on the conduct of scoping reviews since 2013. While developing the latest version published in 2020, the group identified the most common challenges and posed some solutions for those looking to develop a scoping review.
Key challenges included:
- a lack of people trained in methodology unique to scoping reviews (helpful resources can be found on the JBI Global page and elsewhere).
- how to decide when a scoping review is appropriate (hint: they should never be done in lieu of a systematic review if the intention is to provide recommendations)
- deciding which type of review is most appropriate (this online tool can help)
- knowing how much and what type of data to extract - for instance, making determinations between "mapping" of concepts around particular areas, populations, or methodologies and conducting a qualitative thematic analysis
- reporting results effectively, such as with an evidence gap map
- resisting the urge to overstate conclusions and provide recommendations for practice
- a lack of editors and peer reviewers adequately trained to critically appraise scoping reviews (the PRISMA extension for scoping reviews - PRISMA-ScR - provides a checklist for proper conduct and reporting).
Monday, February 1, 2021
Patient versions of guidelines (PVGs) can provide crucial information about diagnoses and management options to patients in clear, plain language and can help guide shared decision-making between patients and their providers to improve the quality of care. However, the construction and reporting of PVGs is variable in terms of quality and content. Now, a new extension of the Reporting Tool for Practice Guidelines in Health Care - the RIGHT-PVG - aims to standardize the development of such documents.
Development of the RIGHT-PVG involved 17 experts from around the world with experience in guideline development, patient communication, epidemiology, and clinical practice. First, an initial list of items was generated from common themes in a sample of 30 PVGs. Then, four organizational guidance documents for the development of PVGs were identified and used to refine the initial criteria. Two rounds of a modified Delphi consultation were used to further pare and refine checklist items from an original list of 45, with all panelist feedback anonymized.
Final items included within the RIGHT-PVG fell under four main categories:
- Basic information: items 1-3 include the reporting of title and copyright, contact information, and a general summary of the PVG's key points.
- Background: items 4-6 include a general introduction to the topic at hand, information about the scope and target audience of the document, and a link to the original guideline off of which the PVG is based.
- Recommendations: items 7 and 8 comprise the meat of the PVG: what is the guideline recommending, for whom, and what are the potential desirable and undesirable effects of the intervention?
  - Recommendations should be easily identifiable via boxing, shading/coloring, or bold type.
  - The strength of each recommendation should be included along with a transparent reporting of the certainty of the evidence behind it.
  - Easy-to-understand symbols can be used to denote the differences between strong and more conditional recommendations.
- Other information: items 9-12 recommend the inclusion of suggested questions for the reader to ask their provider; a glossary of terms and abbreviations; information about how the guideline was funded; and disclosure of any relevant conflicts of interest.
Tuesday, January 26, 2021
The biggest benefit of a meta-analysis is that it allows multiple studies' findings to be pooled into a single effect estimate, raising the statistical power of the test and potentially raising our certainty in the effect estimate in turn. However, a single estimate may be misleading if there is significant heterogeneity (or inconsistency, in GRADE terminology) among the individual studies. One study, for instance, may point to a potential harm of an intervention while the others in the same meta-analysis suggest a benefit; this study may vary from the others in important ways regarding its population, the performance of the intervention, or even the study design itself. A brief primer on heterogeneity newly published by Cordero and Dans details how it can be identified and managed to improve the way implications of a meta-analysis are presented and applied.
Of eyeballs and I²: detecting heterogeneity
Identifying the presence of heterogeneity among a group of pooled studies may be as simple as visually inspecting a forest plot for confidence intervals that show poor overlap or discordance in their estimate of effects (i.e., some showing a likely benefit while others showing a likely harm).
However, some statistical analyses can also provide more nuanced and objective measures of potentially worrisome heterogeneity:
- the Q statistic, which tests the null hypothesis that no heterogeneity is present and provides a p-value for this likelihood (however, large p-values should not necessarily be interpreted as the absence of heterogeneity).
- I² is a measure based on the Q statistic and can be interpreted generally as the proportion of total variability within the sample that is due to differences between studies. The larger the I², the greater the likelihood of "real" heterogeneity. A 95% confidence interval surrounding the estimate should be presented when using I² to detect heterogeneity.
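Both statistics can be computed directly from each study's effect estimate and standard error. A minimal sketch, using invented numbers rather than data from any real meta-analysis:

```python
# Sketch of Cochran's Q and the I-squared statistic from per-study
# effect estimates and standard errors (all numbers are hypothetical).

def q_and_i_squared(effects, std_errs):
    weights = [1 / se**2 for se in std_errs]        # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled)**2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of variability beyond what chance (df) would predict
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Five hypothetical log risk ratios with their standard errors:
effects = [-0.3, -0.1, 0.05, -0.25, 0.4]
ses = [0.12, 0.15, 0.10, 0.20, 0.11]
q, i2 = q_and_i_squared(effects, ses)
print(f"Q = {q:.1f}, I² = {i2:.0f}%")
```

Here the last study's estimate points in the opposite direction from most of the others, which drives Q well above its degrees of freedom and yields a high I² - the numerical analogue of "eyeballing" poorly overlapping confidence intervals on a forest plot.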
Wednesday, January 20, 2021
It is not uncommon for a health guideline to compare two or more interventions against one another. However, while sophisticated statistical approaches such as network meta-analyses allow us to compare these interventions head-to-head in terms of specified health outcomes, they do not take other important aspects of clinical decision-making into account, such as patient values and preferences, resource use, and equity considerations. A new paper from Piggott and colleagues aims to provide initial suggestions for using the GRADE evidence to decision (EtD) framework when choosing which of multiple interventions to recommend.
The authors identified a need for more direction when undertaking a multiple intervention comparison (MC) approach while working on recently released guidelines for the European Commission Initiative on Breast Cancer in which multiple screening intervals were compared against one another. Based on this experience, the group drafted a flexible yet transparency-minded framework to help guide similar efforts in the future, which was then added as a module in GRADE's official guideline development software, GRADEpro.
The new module was pilot-tested for feasibility with several additional guidelines. The module allows the user to select and then compare multiple pairwise comparisons against one another (for instance, with one column for "Intervention 1 vs. Comparator 1" and "Intervention 2 vs. Comparator 2"). A five-star system is used to judge various components of the EtD, such as cost effectiveness, for each individual intervention and comparator, whereas a column on the right-hand side allows the user to input the relative importance of these components in decision-making.
Finally, the user can review all judgments across interventions and summatively recommend the most favorable intervention(s) overall.
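Purely as an illustration of how star judgments and importance weights might be combined across comparisons - GRADEpro's actual behavior may differ, and every criterion, star rating, and weight below is invented:

```python
# Hypothetical example: aggregating five-star EtD judgments across
# multiple interventions using user-assigned importance weights.

stars = {  # criterion -> stars (1-5) per intervention (all invented)
    "health effects":     {"screen yearly": 4, "screen biennially": 5},
    "cost effectiveness": {"screen yearly": 2, "screen biennially": 4},
    "acceptability":      {"screen yearly": 3, "screen biennially": 3},
}
importance = {"health effects": 3, "cost effectiveness": 2, "acceptability": 1}

def weighted_score(option):
    """Importance-weighted mean of star ratings for one intervention."""
    total = sum(importance[c] * per_opt[option] for c, per_opt in stars.items())
    return total / sum(importance.values())

for option in ["screen yearly", "screen biennially"]:
    print(option, round(weighted_score(option), 2))
```

The point of the sketch is simply that making both the star judgments and their relative importance explicit lets a panel see why one intervention comes out ahead overall, rather than arriving at that conclusion opaquely.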
Piggott T, Brozek J, Nowak A, et al. (2021). Using GRADE evidence to decision frameworks to choose from multiple interventions. J Clin Epidemiol 130:117-124.
Manuscript available from the publisher's website here.
Thursday, January 14, 2021
Need for Speed Pt. II: Combining Automation and Crowdsourcing to Facilitate Systematic Review Screening Process
Last year, we discussed a 2017 article detailing the ways that machine learning and automation can potentially expedite the typically lengthy process of a systematic review. Now, a new study published in the February 2021 issue of the same journal describes recent efforts to apply a combination of machine learning and crowdsourcing to improve the item screening process in particular.
Clark and colleagues combined machine and human efforts to facilitate the screening of potentially relevant randomized controlled trials (RCTs) for a Cochrane review using a modified version of Cochrane's Screen4Me program. First, the Cochrane-built "RCT Classifier" was used to automatically sift through all items, discarding them as "Not an RCT" or marking them as potentially relevant ("Possible RCT"). Then, crowd-sourcing was used to further identify eligible RCTs from the latter group.
In addition to having all participants complete a mandatory training module before contributing to the crowdsourced screening efforts, the model also improves accuracy by using an "agreement algorithm" which requires, for instance, that each item receive four consecutive votes in agreement (either for exclusion or inclusion) before achieving a final classification.
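The agreement rule just described can be sketched in a few lines; the four-vote threshold follows the example above, while the vote labels and implementation details are assumptions for illustration:

```python
# Sketch of a "consecutive agreeing votes" classification rule, as in the
# agreement algorithm described above. Threshold and labels are assumed.

def classify(votes, needed=4):
    """Return 'include'/'exclude' once `needed` consecutive votes agree;
    return None if unresolved (e.g., to be sent to a third resolver)."""
    streak, last = 0, None
    for vote in votes:
        streak = streak + 1 if vote == last else 1
        last = vote
        if streak >= needed:
            return vote
    return None

print(classify(["exclude"] * 4))                    # exclude
print(classify(["include", "exclude", "include"]))  # None -> needs resolution
```

Requiring a run of agreeing votes, rather than a simple majority, makes a final classification harder to reach by chance, at the cost of routing more contested items to a human resolver.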
The authors then compared the sensitivity and specificity of this system against those of a review completed from the same search using the gold standard of all-human, independent, and duplicate screening methods. They also calculated the crowd's autonomy, defined as the proportion of records that required a third reviewer for resolution. To increase the information gleaned, the authors allowed records to be re-introduced into the system and re-screened by different screeners (a "second batch" screening).
Screeners had 100% sensitivity in both batches, meaning that all potentially relevant items were correctly identified. Specificity - the proportion of correctly discarded non-relevant items - was 80.71% in the first batch but decreased to 62.43% the second time around. Autonomy was 24.6%, meaning just under a quarter of all items required resolution during their first time through the system. When reintroduced, this number increased to 52.9%, though the authors suggest this number may have decreased if the study were continued.
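These performance measures fall out directly from a screening confusion matrix; the counts below are hypothetical, not the study's data:

```python
# Screening performance metrics from hypothetical confusion-matrix counts
# (tp = relevant items kept, fn = relevant items missed, etc.).

def screening_metrics(tp, fn, tn, fp, resolved_by_third, total):
    sensitivity = tp / (tp + fn)           # relevant items correctly kept
    specificity = tn / (tn + fp)           # irrelevant items correctly discarded
    autonomy = resolved_by_third / total   # items needing a third reviewer,
                                           # per the definition used above
    return sensitivity, specificity, autonomy

sens, spec, auto = screening_metrics(tp=50, fn=0, tn=800, fp=200,
                                     resolved_by_third=250, total=1050)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}, autonomy {auto:.1%}")
```

As in the study, perfect sensitivity (no relevant items missed) can coexist with more modest specificity, since wrongly retained items are caught later at full-text review while wrongly excluded items are lost for good.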
The authors conclude that although the machine aspect of this method - the RCT identifier - only contributed about 9.1% of the workload, the effectiveness of human crowdsourcing to facilitate the screening process was encouraging. Notably, the 100% sensitivity rate in both batches demonstrates that crowdsourcing is unlikely to wrongfully exclude relevant items from a systematic review. Furthermore, the use of a third resolver - as opposed to automatically assigning all conflicting items to the "potential RCT" group - ultimately contributed substantially to the reduction in workload.
Noel-Storr A, Dooley G, Affengruber L, and Gartlehner G. (2021). Citation screening using crowdsourcing and machine learning produced accurate results: Evaluation of Cochrane's modified Screen4Me process. J Clin Epidemiol 130:23-31.
Manuscript available from the publisher's website here.
Friday, January 8, 2021
New Guideline Participation Tool Lays Out Roles and Responsibilities for New and Returning Guideline Group Members
Guideline development groups should contain a multidisciplinary panel of experts and key stakeholders to ensure the quality, relevance, and ultimate implementation of resulting recommendations. However, there are few tools in existence to ensure the effective participation of panel members when drafting guidelines, and preparing panel members with little to no previous experience in guideline development can be an especially daunting task. A new paper published in next month's issue of the Journal of Clinical Epidemiology aims to provide a tool to guide these efforts, with a specific focus on guidelines developed using the GRADE framework.
To develop the tool, Piggott and colleagues first established a draft tool that included 61 items based on a previously published systematic review of guideline development handbooks. They then conducted a series of ten key informant interviews comprising both past and prospective guideline development group members to narrow the tool down to three major themes: selection of participants, guideline group process, and tool format. The resulting 33-item Guideline Participant Tool (GPT) was then validated in a survey of 26 guideline group members from various societies including WHO and the American Society of Hematology (ASH). The tool itself breaks the process of guideline participation into three major time windows:
- Before (Preparations): 12 items including clarifying objectives and one's role within the group and familiarizing oneself with the guideline development methodology to be used.
- During (Meetings): 15 items including avoiding undue interruptions, adhering to the specified methodology, and referring to the PICO question at hand as a way to stay on task.
- After (Follow-up): 6 items including maintaining proper confidentiality of information discussed, reviewing meeting minutes to identify any discrepancies in a timely fashion, and assisting with the promotion, dissemination, and evaluation of the guideline as requested.
According to the authors, "Most participants found that the tool is most useful before guideline group meetings explaining what to expect at each phase. Participants thought that the tool was useful beforehand as a reference for orienting themselves to the structure of meetings, understanding the guideline development process, and what might be required of them. Respondents agreed that the tool serves as a reference for them to stay on track with the required tasks and to support structuring the process of guideline development."
The authors go on to suggest that the tool be used as required reading for all group members ahead of their participation on a panel.
Piggott T, Baldeh T, Akl EA, et al. (2021). Supporting effective participation in health guideline development groups: The Guideline Participant Tool. J Clin Epidemiol 130:42-48.
Manuscript available from the publisher's website here.