Wednesday, July 1, 2020

A Not-So-Non-Event?: New Systematic Review Finds Exclusion of Studies with No Events from a Meta-Analysis Can Affect Direction and Statistical Significance of Findings

Studies with no events in either arm have been considered non-informative within a meta-analytical context, and thus have been left out of these analyses. A new systematic review of 442 such meta-analyses, however, reports that this practice may actually affect the resulting conclusions.

In the July 2020 issue of the Journal of Clinical Epidemiology, Xu and colleagues report their study of meta-analyses of binary outcomes in which at least one included study had no events in either arm. The authors reanalyzed the data from 442 such meta-analyses taken from the Cochrane Database of Systematic Reviews, using modeling to determine the effect of reincorporating the excluded studies.

The authors found that in 8 (1.81%) of the 442 meta-analyses, inclusion of the previously excluded studies changed the direction of the pooled odds ratio (“direction flipping”). In 12 (2.71%) of the meta-analyses, the pooled odds ratio (OR) changed by more than the predetermined threshold of 0.2. Additionally, in 41 (9.28%) of the meta-analyses, the statistical significance of the findings changed when assuming a p < 0.05 threshold (“significance flipping”). In most of these 41 meta-analyses, the excluded (“non-event”) studies made up between 5 and 30% of the total sample size. About half of these alterations widened the confidence interval, while in the other half, the incorporation of non-event studies narrowed it.

The figure above from Xu et al. shows the proportion of studies reporting no events within the meta-analyses that showed a substantial change in p value when these studies were included. The proportion of the total sample tended to cluster between 5 and 30%.

Post hoc simulation studies confirmed the robustness of these findings, and also found that exclusion of studies with no events preferentially affected the pooled ORs of studies that found no effect (OR = 1), whereas a large magnitude of effect was protective against these changes. The opposite was found for the effect of excluding studies with no events on the resulting p values (i.e., large magnitudes of effects were more likely to be affected whereas conclusions of no effect were protected).
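To illustrate the mechanics at play, here is a toy sketch with made-up numbers, using a simple fixed-effect model with a 0.5 continuity correction rather than the more sophisticated models Xu et al. employed. It shows how reincorporating a double-zero study pulls a pooled OR toward the null:

```python
import math

def pooled_or(studies, cc=0.5):
    """Fixed-effect inverse-variance pooled odds ratio.

    studies: list of (events_treatment, n_treatment, events_control, n_control).
    A continuity correction `cc` is added to every cell of any 2x2 table
    containing a zero, so that zero-event studies can contribute.
    """
    num, den = 0.0, 0.0
    for a, n1, c, n2 in studies:
        b, d = n1 - a, n2 - c
        if 0 in (a, b, c, d):
            a, b, c, d = a + cc, b + cc, c + cc, d + cc
        log_or = math.log((a * d) / (b * c))
        var = 1 / a + 1 / b + 1 / c + 1 / d  # variance of the log OR
        num += log_or / var
        den += 1 / var
    return math.exp(num / den)

with_events = [(12, 100, 6, 100), (9, 80, 5, 80)]  # hypothetical trials
double_zero = [(0, 50, 0, 50)]                     # a study with no events

print(round(pooled_or(with_events), 3))                # zero-event study excluded
print(round(pooled_or(with_events + double_zero), 3))  # zero-event study included
```

Under this correction, a double-zero study contributes an OR of 1 with low weight, so its inclusion shrinks the pooled estimate toward no effect; the closer the pooled OR already sits to 1, the easier it is for that shift to flip its direction.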

In sum, though a common practice in meta-analysis, the exclusion of studies with no events in either arm may affect the direction, magnitude, or statistical significance of the resulting conclusions in a small but non-negligible number of analyses.

Xu, C., Li, L., Lin, L., Chu, H., Thabane, L., Zou, K., & Sun, X. Exclusion of studies with no events in both arms in meta-analysis impacted the conclusions. J Clin Epidemiol, 2020; 123: 91-99.

Manuscript available from the publisher's website here. 

Friday, June 26, 2020

CONSORTing with Incorrect Reporting?: Most Publications Aren’t Using Reporting Guidelines Appropriately, New Systematic Review Finds

Reporting guidelines such as PRISMA for systematic reviews and meta-analyses and CONSORT for randomized controlled trials are often touted as a way to improve the thoroughness and transparency of reporting in academic research. Although these guidelines are intended to improve the reporting of research, a new systematic review of a random sample of different publication types found that in many cases they were cited incorrectly: as a guide to the design and conduct of the research itself, as a means of assessing the quality of published research, or for an unclear purpose.

In the review published earlier this month, Caulley and colleagues worked with an experienced librarian to devise a systematic search strategy that would pick up any publication citing one of four major reporting guideline documents from inception to 2018: ARRIVE (used in in vivo animal research), CHEERS (used in health economic evaluations), CONSORT (used in randomized controlled trials), and PRISMA (used in systematic reviews and meta-analyses). Then, a random sample of 50 of each publication type was reviewed independently by two authors for its citation of the reporting guideline.

Overall, only 39% of the 200 reviewed items correctly stated that the guidelines were followed in the reporting of the study, whereas an additional 41% incorrectly cited the guidelines, usually by stating that they informed the design or conduct of the research. Finally, in 20% of the reviewed items, the intended purpose of the cited reporting guidelines was unclear.

Examples of appropriate, inappropriate, and unclear use of reporting guidelines provided by Caulley et al.

Between publication types, RCTs were the most likely to appropriately cite the use of CONSORT guidelines (64%), versus 42% of economic evaluations correctly citing CHEERS, 28% of systematic reviews and meta-analyses appropriately discussing the use of PRISMA, and just 22% of in vivo animal research studies correctly citing ARRIVE.

In addition, the appropriate use of the reporting guidelines did not appear to increase as time elapsed since the publication of those guidelines.

The authors suggest that improved education about the appropriate use of these guidelines – such as the web-based interventions and tools that are available to those looking to use CONSORT - may improve their correct application in future publications.

Caulley, L., Catalá-López, F., Whelan, J., Khoury, M., Ferraro, J., Cheng, W., ... & Moher, D. Reporting guidelines of health research studies are frequently used inappropriately. J Clin Epidemiol, 2020; 122: 87-94.

Manuscript available from the publisher's website here. 

Tuesday, June 23, 2020

Need for Speed: Documenting the Two-Week Systematic Review

In a recent post, we summarized a 2017 article describing the ways in which automation, machine learning, and crowdsourcing can be used to increase the efficiency of systematic reviews, with a specific focus on making living systematic reviews more feasible.

In a new publication in the May 2020 edition of the Journal of Clinical Epidemiology, Clark and colleagues incorporated automation to attempt a systematic review that took no longer than two weeks from search design to manuscript submission, for a moderately sized search yielding 1,381 deduplicated records and eight ultimately included studies.

Spoiler alert: they did it. (In just 12 calendar days, to be exact).

Systematic Review, but Make it Streamlined

Clark et al. utilized some form of computer-assisted automation at almost every point in the project, including:
  • Using the SRA word frequency analyzer to identify key terms that would be most helpful inclusions in a search strategy
  • Using hotkeys (custom keystroke shortcuts) within the SRA Helper tool to more quickly screen items and search pre-specified databases for full texts
  • Using RobotReviewer to assist in risk of bias evaluation by searching for certain key phrases within each document

However, machines were only part of the solution. The authors also note the decidedly more human-based solutions that allowed them to proceed at an efficient clip, such as:
  • Daily, focused meetings between team members
  • Blocking off “protected time” for each team member to devote to the project
  • Planning for deliberation periods, such as decisions on screening conflicts, to occur immediately after screening so as to reduce the amount of time and energy devoted to “mental reload” and review of one’s previous decisions for context

All told, the final accepted version of the manuscript took 71 person-hours to complete – a far cry from a recently published average of 881 person-hours among conventionally conducted reviews.

Clark and colleagues discuss key facilitators and barriers to their approach as well as provide suggestions for technological tools to further improve the efficiency of SR production.

Clark, J., Glasziou, P., Del Mar, C., Bannach-Brown, A., Stehlik, P., & Scott, A.M. A full systematic review was completed in 2 weeks using automation tools: A case study. J Clin Epidemiol, 2020; 121: 81-90.

Manuscript available from the publisher's website here.

Thursday, June 18, 2020

It’s Alive! Pt. III: From Living Review to Living Recommendations

In recent posts, we’ve discussed how living systematic reviews (LSRs) can help improve the currency of our understanding of the evidence, as well as the efficiency with which the evidence is identified and synthesized through novel crowdsourcing and machine learning techniques. In the fourth and final installment of the 2017 series on LSRs, Akl and colleagues apply the LSR approach to the concept of a living clinical practice guideline.

As the figure below from the paper demonstrates, while simply updating an entire guideline more frequently (Panel B) reduces the number of out-of-date recommendations (symbolized by red stars) at any given time, it comes with a serious trade-off: namely, the high amount of effort and time required to continuously update the entire guideline. Turning certain recommendations into "living" models helps solve this dilemma between currency and efficiency.

Rather than a full update of an entire guideline and all of the recommendations therein, a living guideline uses each recommendation as a separate unit of update. Recommendations that are eligible to make the transition from “traditionally updated” to “living” include those that are a current priority for healthcare decision-making, for which the emergence of new evidence may change clinical practice, and for which new evidence is being generated at a quick rate.

The Living Guideline Starter Pack

Each step of a recommendation’s formation must make the transition to “living,” including:
  • A living systematic review
  • Living summary tables, such as Evidence Profiles and Evidence-to-Decision tables
    • Online collaborative table-generating software such as GRADEpro can be used to keep these up-to-date with the emergence of newly relevant evidence
  • A living guideline panel who can remain “on-call” to contribute to updates of recommendations with relatively short notice when warranted
  • A living pool of peer-reviewers who can review and provide feedback on updates with a quick turnaround time
  • A living publication platform, such as an online version that links back to archived versions, as well as “pushes” new versions to practice tools at the point of care.

Additional Resources
Further information and support for the development of LSRs, including updated official guidance, is provided on the Cochrane website.

Akl, E.A., Meerpohl, J.J., Elliott, J., Kahale, L.A., Schünemann, H.J., and the Living Systematic Review Network. Living systematic reviews: 4. Living guideline recommendations. J Clin Epidemiol, 2017; 91: 47-53.

Manuscript available from the publisher's website here. 

Monday, June 15, 2020

It’s Alive! Pt. II: Combining Human and Machine Effort in Living Systematic Reviews

Systematic review development is known to be a labor-intensive endeavor that requires a team of researchers dedicated to the task. The development of a living systematic review (LSR) that is continually updated as newly relevant evidence becomes available presents additional challenges. However, as Thomas and colleagues write in the second installment of the 2017 series on LSRs in the Journal of Clinical Epidemiology, we can make the process quicker, easier, and more efficient by harnessing the power of machine learning and “microtasks.”

Suggestions for improvements in efficiency can be categorized as either automation (incorporation of machine learning/replacement of human effort) or crowdsourcing (distribution of human effort across a broader base of individuals).

A diagram from Thomas et al. (2017) describes the "push" model of evidence identification that can help keep Living Systematic Reviews current without the need for repeated human-led searches.

From soup to nuts, opportunities for the incorporation of machine learning into the LSR development process include:

  • Continuous, automatic searches that “push” new potentially relevant studies out to human reviewers
  • Exclusion of ineligible citations through automatic text classification, reducing the number of items that require human screening with over 99% sensitivity
  • Crowdsourcing of study identification and "microtask" screening efforts such as Cochrane Crowd, which at the time of this blog’s writing had resulted in over 4 million screening decisions from over 17,000 contributors 
  • Automated retrieval of full text versions of included documents
  • Machine-based extraction of relevant data, graphs and tables from included documents
  • Machine-assisted risk of bias assessment
  • Template-based reporting of important items
  • Statistical thresholds that flag when a change of conclusions may be warranted

As technology in this field progresses, the traditionally duplicated stages of screening and data extraction may even be taken on by a computer-human pair, combining the ease and efficiency of automation with the “human touch” and high-level discernment that algorithms still lack.

Thomas, J., Noel-Storr, A., Marshall, I., Wallace, B., McDonald, S., Mavergames, C., ... & the Living Systematic Review Network. Living systematic reviews: 2. Combining human and machine effort. J Clin Epidemiol, 2017; 91: 31-37.

Manuscript available from publisher's website here. 

Wednesday, June 10, 2020

It’s Alive! Pt. I: An Introduction to Living Systematic Reviews

As research output continues to rise, the systematic reviews charged with comprehensively identifying and synthesizing this growing body of evidence are becoming out-of-date more quickly. In addition, the formation of a systematic review team can be a lengthy process, and institutional memory of the project is lost when teams are disbanded after publication.

One solution to this problem is the concept of a living systematic review, or LSR. In the first installment of a 2017 series in the Journal of Clinical Epidemiology, Elliott and colleagues introduce the concept of an LSR and provide general guidance on their format and production.

What is a Living Systematic Review (LSR)?
An LSR has a few key components:
  • Based on a regularly updated search run with an explicit and pre-established frequency (at least once every six months) to identify any potentially relevant recent publications. 
  • Utilize standard systematic review methodology (different from a rapid review)
  • Most useful for specific topics:
    • that are of high importance to decision-making,
    • for which the certainty of evidence is low or very low (meaning our certainty of the effect may likely change with the incorporation of new evidence), and
    • for which new evidence is being generated often.

A figure from Elliott et al. (2017) provides an overview of the LSR development process, from protocol to regular searching and screening and incorporation and publication of new evidence.

LSRs from End to End
An LSR can either be started from scratch with the intention of regular screening and updating of evidence – in which case the protocol should specify these planned methods – or based upon an existing up-to-date systematic review, in which case the protocol should be amended to reflect these changes.

Due to their nature, the publication of LSRs requires the use of an online platform with linking mechanisms (such as CrossRef) or with explicit versions (such as the Cochrane database) that can be updated as soon as new evidence is incorporated.

When the certainty of evidence reaches a higher level, or if the generation of new evidence substantially slows, an LSR may be discontinued in favor of traditional approaches to updating.

Additional Resources
Further information and support for the development of LSRs, including updated official guidance, is provided on the Cochrane website.

Elliott, J.H., Synnot, A., Turner, T., Simmonds, M., Akl, E.A., McDonald, S., ... & Thomas, J. Living systematic reviews: 1. Introduction - the why, what, when, and how. J Clin Epidemiol, 2017; 91: 23-30.

Manuscript available from the publisher's website here.

Friday, June 5, 2020

Research Revisited: 2014’s “Guidelines 2.0: Systematic Development of a Comprehensive Checklist for a Successful Guideline Enterprise”

While several checklists for the development and appraisal of specific guidelines had been developed by 2014, no thorough, systematic resource had yet been published to inform the actual day-to-day operations of a guideline development program. Noticing this need, Schünemann and colleagues pooled their professional experiences and contacts in the field, in addition to conducting a systematic search for self-styled “guidelines for guidelines” and other guideline development handbooks, manuals, and protocols. The reviewers, in duplicate, extracted the key stages and processes of guideline development from each of these documents and compiled them together.

The result was the G-I-N/McMaster Guideline Development Checklist: an 18-topic, 146-item soup-to-nuts comprehensive manual spanning each part and process of a guideline development program, from budgeting and planning for a program to the development of actual guidelines to their dissemination, implementation, evaluation, and updating.

An overview of the steps and parties involved in the G-I-N/McMaster guideline development checklist.

The checklist also provides hyperlinks to tried-and-true online resources for many of these aspects, such as tips for funding a guideline program, tools for project management, topic selection criteria, and guides for patient and caregiver representatives.

Schünemann HJ, Wiercioch W, Etxeandia I, Falavigna M, Santesso N, Mustafa R, Ventresca M, et al. Guidelines 2.0: Systematic development of a comprehensive checklist for a successful guideline enterprise. CMAJ, 2014; 186(3): E123-E142.

Manuscript available for free here.

Tuesday, June 2, 2020

Research Shorts: Use of GRADE for the Assessment of Evidence about Prognostic Factors

In addition to questions of interventions and diagnostic tests, GRADE can also be used to assess the certainty of evidence when it comes to prognostic factors. In part 28 of the Journal of Clinical Epidemiology’s GRADE series published earlier this year, Foroutan and colleagues provide guidance for applying GRADE to a body of evidence of prognostic factors.

The Purpose of Prognostic Studies

GRADE may be applied to a body of evidence, separated by individual prognostic factors instead of outcomes, for one of two reasons. The first is a non-contextualized setting, such as when the certainty of evidence surrounding prognostic factors is being evaluated for application within research planning and analysis (e.g., determining which factors are best to use when stratifying for randomization). The second is a contextualized setting, when the certainty of evidence surrounding prognostic factors is used to help inform clinical decisions.

Establishing the Certainty of Evidence

Unlike when grading the certainty of evidence of an intervention, when assessing prognostic evidence, the overall certainty for observational studies starts out as HIGH. This is because the patient population in observational studies is likely to be more representative than in RCTs, where eligibility criteria may place artificial restrictions on the characteristics of patients. Certainty may then be rated down based on the five traditional domains:
  • Risk of bias tools and instruments such as QUality In Prognosis Studies (QUIPS) and Prediction model Risk Of Bias ASsessment Tool (PROBAST) may be helpful here. When teasing out the effect of each potential factor, consider utilizing some form of multivariate analysis that accounts for dependence between several different prognostic factors.
  • Inconsistency can be examined via visual tests of the variability between individual point estimates and the overlap of confidence intervals; statistical tests such as I² are likely to be less helpful, as they can often be inflated when large studies lead to particularly narrow CIs. As always, potential explanations for any observed heterogeneity should be considered a priori.
  • Imprecision will depend on whether the setting is contextualized, in which case it will depend on the relationship between the confidence interval and the previously set clinical decision threshold, or non-contextualized, in which case the threshold will most likely represent the line of no effect.
  • Indirectness should be based on a comparison of the PICOs for the clinical question at hand, and those addressed in the meta-analyzed studies.
  • Publication bias can be assessed via visually exploring a funnel plot or the use of appropriately applied statistical tests.
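To make the point about inflated I² concrete, here is a small sketch with invented effect estimates and standard errors (not data from the paper): the same spread of point estimates yields I² = 0% when the studies are small and imprecise, but a very high I² when the studies are large and the CIs correspondingly narrow:

```python
def i_squared(effects, std_errors):
    """I^2 heterogeneity statistic (as a percentage) from per-study
    effect estimates and their standard errors, via Cochran's Q."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled)**2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

effects = [0.10, 0.18, 0.25]  # identical spread of point estimates in both cases

print(i_squared(effects, [0.10, 0.10, 0.10]))  # small studies, wide CIs
print(i_squared(effects, [0.02, 0.02, 0.02]))  # large studies, narrow CIs
```

The heterogeneity among the point estimates is the same in both calls; only the precision of the studies changes, which is why visual inspection of the estimates and their CI overlap can be more informative than I² alone.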

Foroutan F, Guyatt G, Zuk V, Vandvik PO, Alba AC, Mustafa R, Vernooij R, et al. GRADE guidelines 28: Use of GRADE for the assessment of evidence about prognostic factors: Rating certainty in identification of groups of patients with different absolute risks. J Clin Epidemiol, 2020; 121: 62-70.

Manuscript available from the publisher's website here. 

Thursday, May 28, 2020

Sink or SWiM? When and How to Use Narrative Synthesis in Lieu of Meta-Analysis

The terms “systematic review” and “meta-analysis” often go hand-in-hand. However, there are other ways to synthesize and present the findings of a systematic review that do not entail statistical pooling of the data. This is referred to as narrative synthesis or Synthesis Without Meta-Analysis (SWiM), and a recent webinar presented by Cochrane (viewable for free here) provided the definition, potential uses, and pitfalls to watch for when considering the use of narrative synthesis within a systematic review.

What is narrative synthesis/SWiM?

Narrative synthesis or Synthesis Without Meta-analysis (SWiM) is an approach used to describe quantitatively reported data from studies identified within a systematic review in a way that does not quantitatively pool or meta-analyze the data. Narrative synthesis is not the same as a narrative review, which is an unsystematic approach to gathering studies.

A narrative synthesis adds value to the literature by providing information about what the studies on a certain topic say as a whole, as opposed to simply summarizing the findings from individual studies one-by-one. Whereas a meta-analysis is useful in that it provides an overall estimate of the size of an effect of an intervention, a narrative synthesis allows the reviewer to organize, explore, and consider the ways that the findings from several studies are connected to one another, as well as how they differ – and the potential moderators that define these relationships. Thus, its focus is on the existence, nature, and direction of an effect, rather than its size.

When is it appropriate to perform a narrative synthesis/SWiM?

There are several reasons why narrative synthesis/SWiM may be used when reporting the findings of a systematic review.
  • There are not enough data to calculate standardized effect sizes. Meta-analyzing outcomes that are reported using different scales requires the standardization of these data. However, in certain fields, authors of studies may be less likely to report all of the elements required to calculate a standardized effect size, such as measures of variance, and contacting the authors may not yield the needed data. To exclude these studies outright, however, and meta-analyze only the studies in which all the needed data are reported, may under- or misrepresent the entire body of evidence.
  • There is substantial heterogeneity among included studies. Notable inconsistency between studies with regard to their effect sizes and direction (statistical heterogeneity), study design (methodological heterogeneity), or clinical differences surrounding the PICO may render a quantitative meta-analysis of little utility, especially if only a small number of studies are to be analyzed together. However, it’s important to ask yourself whether the heterogeneity is truly of enough concern to preclude meta-analysis. PICO elements should be carefully considered a priori as to which are similar enough to be pooled and which require their own analysis.

What are some common errors made in narrative syntheses/SWiMs?
There are a few common pitfalls to watch out for when deciding to report your synthesis without quantitative meta-analysis.
  • Not transparently reporting that a narrative synthesis was used when data could not be/were not meta-analyzed
  • Not reporting the methods used for narrative synthesis in detail
  • Not referring to methodological guidance when describing the decision to perform a narrative synthesis
  • Not providing clear links between the data and the synthesis, such as via tables or charts used to report the same data as in the text.

By improving the reporting and presentation of these items within a systematic review, end-users will be better able to understand the reasons why a narrative synthesis was conducted, and ultimately utilize the findings.

Guidance for the reporting of narrative synthesis, or SWiMs, can be found by using the new SWiM reporting guideline checklist here.

We recently reported on GRADE guidance for assessing the certainty of evidence in such circumstances as when a narrative synthesis is presented. More here.

Friday, May 22, 2020

Research Shorts: Assessing the Certainty of Diagnostic Evidence, Pt. II: Inconsistency, Imprecision, and Publication Bias

Earlier this week, we discussed a recent publication in the GRADE series in the Journal of Clinical Epidemiology that provides guidance for assessing risk of bias and indirectness across a body of evidence of diagnostic test accuracy. In this post, we’ll follow up with continued guidance (published in Part II) for the rest of the GRADE domains.


Inconsistency

Unexplained inconsistency should be evaluated separately for the findings on test specificity and test sensitivity. When a meta-analysis is available, both the visual and quantitative markers of inconsistency can be used in a similar fashion to a meta-analysis of intervention studies. If differences between studies related to any of the PICO elements are suspected as an explanation for observed heterogeneity, exploration via subgroup analyses may be appropriate.


Imprecision

Again, imprecision of a test’s sensitivity and specificity should be evaluated separately. As with assessments of interventional evidence, evaluation of imprecision across a body of test accuracy studies entails the consideration of the width of the confidence interval as well as the number of events (specifically, the number of patients with the disease and the number of positive tests for sensitivity, and the number of patients without the disease and the number of negative tests for specificity).

In contextualized settings, when one end of the confidence interval may lead to the use of the testing strategy while the other end would not, imprecision is likely present. It may be helpful to set, a priori, a threshold that the confidence interval should not cross in order for the test to have sufficient value.
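As an illustration of that check (with hypothetical counts and an assumed threshold, not figures from the guidance), the sketch below computes a Wilson 95% CI for sensitivity and asks whether it crosses a pre-set decision threshold:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score confidence interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical body of evidence: 45 true positives among 50 diseased patients
sens_lo, sens_hi = wilson_ci(45, 50)
threshold = 0.80  # assumed pre-specified minimum useful sensitivity

print(round(sens_lo, 3), round(sens_hi, 3))
# If the CI crosses the threshold, rating down for imprecision may be warranted
print(sens_lo < threshold < sens_hi)
```

Here one end of the interval sits below the threshold and the other above it, so under this (assumed) decision threshold the evidence would be judged imprecise.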

Publication Bias

The use of traditional funnel plot assessments (e.g., Egger’s or Begg’s test) on a body of test accuracy studies is more likely to result in undue suspicion of publication bias than when applied to a body of therapeutic studies. While other sophisticated statistical assessments are available (e.g., Deeks’ test, trim and fill), systematic review and health technology assessment (HTA) authors may choose to base a judgment of publication bias on the knowledge of the existence of unpublished studies. If studies published by for-profit entities or those with precise estimates claiming high test accuracy despite small sample sizes exist, publication bias may also be suspected.

Upgrading the Certainty of Evidence ("Rating Up")

As with an assessment of interventional evidence, there may be reasons to upgrade the certainty of evidence in the face of highly convincing links between the use of a test and the likelihood and/or magnitude of an observed outcome. The diagnostic test accuracy equivalent of a dose-response gradient – the Receiver Operating Characteristic, or ROC, curve – may be used to assess this potential upgrader.

Schünemann H, Mustafa RA, Brozek J, Steingart KR, Leeflang M, Murad MH, Bossuyt P, et al. GRADE guidelines 21 pt. 2. Test accuracy: inconsistency, imprecision, publication bias, and other domains for rating the certainty of evidence and presenting it in evidence profiles and summary of findings tables. J Clin Epidemiol 2020 Feb 10. pii: S0895-4356(19)30674-2. doi: 10.1016/j.jclinepi.2019.12.021. [Epub ahead of print].

Manuscript available here on the publisher's site.

Tuesday, May 19, 2020

Research Shorts: Assessing the Certainty of Diagnostic Evidence, Pt. I: Risk of Bias and Indirectness

Systematic reviews or health technology assessments (HTAs) that examine the body of evidence on diagnostic procedures can - and should - transparently assess and report the overall certainty of evidence as part of their findings. In the two-part, 21st installment of the GRADE guidance series published in the Journal of Clinical Epidemiology, Schünemann and colleagues provide methods for approaching the first two major domains of the GRADE approach: risk of bias and indirectness.

While there are certainly differences between methods for assessing the certainty of evidence of diagnostic tests as opposed to interventions, the fundamental parts of GRADE remain unchanged:

Make Clinical Questions Clear via PICOs

It is paramount to clearly define the purpose or role of a diagnostic test and to see the test in light of its potential downstream consequences for making subsequent treatment decisions. As with a review of an intervention, a review of a diagnostic test should be built upon questions that define the Population, Intervention (the “index test” being assessed), Comparator (the “reference” test representing the current standard of care), and Outcomes (PICOs).

Prioritize Patient-Important Outcomes

Outcomes should be relevant to the population at hand. As such, the ideal study design to generate this evidence for outcomes related to test accuracy is a randomized controlled trial with a test-retest format that directly investigates the downstream effects of a testing strategy on outcomes in the population at hand, seen in Figure 1A below.

However, this is often not available. In this case, test accuracy would be used as a surrogate outcome, and test accuracy studies such as those in Figure 1B can be linked to additional evidence that examines the effect of downstream consequences of test results on patient-important outcomes. (More on that in a March 2020 blog post, here.)

Assessing Risk of Bias in Test Accuracy Studies

There are several important factors to consider when assessing a body of test accuracy studies for risk of bias. Potential issues with regard to risk of bias include:
  • Populations that differ from those intended to receive the test (e.g., in terms of disease risk)
  • Failure to compare the test in question to an independent reference/standard test in all enrolled patients (e.g., by using only a composite test)
  • Lack of blinding when ascertaining test results

The QUADAS-2 tool can be used to guide assessment of bias in these studies.

Use PICO to Guide Assessment of Indirectness

Lastly, as when evaluating intervention studies, indirectness can be assessed by determining whether the Population, Index test, Comparator/reference test, and Outcomes match those in the clinical question.

Schünemann H, Mustafa RA, Brozek J, Steingart KR, Leeflang M, Murad MH, Bossuyt P, et al. GRADE guidelines 21 pt. 1: Study design, risk of bias, and indirectness in rating the certainty across a body of evidence for test accuracy. J Clin Epidemiol 2020 Feb 12. pii: S0895-4356(19)30673-0. doi: 10.1016/j.jclinepi.2019.12.020. [Epub ahead of print].

Manuscript available here on publisher’s site.

Thursday, May 14, 2020

Research Shorts: Calculating Absolute Effects for Time-to-Event Outcomes

Time-to-event (TTE) data provide information about whether a specific event occurs as well as the amount of time that passes before its occurrence. As such, TTE analyses can be particularly useful in the development of guidelines in fields such as oncology, where various diagnosis and treatment options can change the time-course of a disease and its consequences. A methodological systematic review of cancer-related systematic reviews, however, found that review authors often struggled to appropriately apply TTE data in terms of their absolute effect. A 2019 paper by Skoetz and colleagues provides guidance for applying these types of data to calculate absolute effects in the development of systematic reviews and guidelines.

Direct calculation of absolute effect
If the TTE data come from studies with a fixed length of follow-up and individual participant data are available, a timepoint at which all participant data are complete should be used to construct a 2x2 table, and the absolute effect calculated directly from it. Most of the time, however, the absolute effect will not be directly calculable. This is the case for studies with staggered participant entry, variable lengths of follow-up, and no timepoint at which all individual participant data are captured. In this scenario, the absolute effect can be estimated from the pooled hazard ratio and an assumed baseline risk, or a risk difference can be calculated in the usual way if events are rare.
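As a sketch of the direct approach, using hypothetical event counts at a timepoint where all participants have complete follow-up, the 2x2-table calculation reduces to a simple risk difference:

```python
# Sketch: direct absolute effect from a 2x2 table at a timepoint where
# all participants have complete follow-up. Counts are hypothetical.

def absolute_effect(events_tx, n_tx, events_ctrl, n_ctrl):
    """Risk difference (absolute effect) from a 2x2 table."""
    risk_tx = events_tx / n_tx
    risk_ctrl = events_ctrl / n_ctrl
    return risk_tx - risk_ctrl

# e.g., 30/200 events with treatment vs. 50/200 with control
rd = absolute_effect(30, 200, 50, 200)
print(f"Risk difference: {rd:.3f}")
print(f"{abs(rd) * 1000:.0f} fewer events per 1000 patients")
```

A negative risk difference here indicates fewer events in the treatment group; multiplying by 1000 yields the "per 1000 patients" framing commonly used in Summary of Findings tables.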

Indirect calculation of absolute effect
To estimate baseline risk when calculating an absolute effect size from a hazard ratio, it is important to use the best available estimate of the baseline risk in the population at hand. While data reported in individual clinical trials may be used, consider that trial populations may have an artificially inflated risk (by enrolling patients at higher-than-average risk) or a risk lower than that of the true population (by excluding patients with comorbidities). It is therefore preferable to obtain a baseline risk estimate from large observational studies at low risk of bias conducted in the population of interest. Using this type of data to estimate baseline risk is also more likely to support higher certainty in the estimated effect, depending on the size of the study.

If these options are not suitable, data from the survival curves of control groups within studies at low risk of bias may be used. If possible, utilize data from a middle time-point.

No matter how absolute effects are calculated, it is important to clearly and transparently report this information, including:
  • reporting how the baseline risks were estimated
  • using the same numbers consistently – e.g., whether reporting the number of patients with events or those who remain event-free
  • uniformly choosing one specific time point based on the studies used.
Because absolute effects are more easily understood and used within shared decision-making, these estimates should be provided within the abstract as well as the Summary of Findings table or Evidence Profile.

This figure provides an example of how absolute risk based on time-to-event data can be meaningfully communicated in a patient-facing graphic.

The paper provides further guidance on determining the certainty of evidence using TTE data, calculating absolute effects for events such as mortality, providing graphical representations of the absolute effect, and calculating corresponding numbers needed to treat and median survival times to further aid decision-making.
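As a small illustration of the last of these, the number needed to treat follows directly from an absolute risk difference; the RD value below is hypothetical:

```python
# Sketch: number needed to treat (NNT) from an absolute risk difference.
# The RD used here is hypothetical.
import math

def nnt(risk_difference):
    """NNT = 1 / |risk difference|, rounded up to a whole patient."""
    return math.ceil(1 / abs(risk_difference))

# e.g., an RD of 79 fewer events per 1000 patients
print(nnt(-0.079))
```

Rounding up is the conventional choice, since the NNT is interpreted as a whole number of patients who must be treated to prevent one additional event.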

Skoetz N, Goldkuhle M, van Dalen EC, et al. GRADE guidelines 27: How to calculate absolute effects for time-to-event outcomes in summary of findings tables and Evidence Profiles. J Clin Epidemiol 2020;118:124-131.

Manuscript available from publisher’s website here.  

Monday, May 11, 2020

Adventures in Protocol Publication

As most reading this will know, a systematic review is no small feat. But while the complete project can feel intimidating at times, a well-planned systematic review is broken into enough small parts to make each one feel manageable – and a lot like an accomplishment in itself. One such step that more and more authors are choosing to take is the publication of their protocol in a peer-reviewed journal.

As an Evidence Foundation fellow, I have had the unique opportunity to lead the development of a systematic review and critical appraisal of physical activity guidelines in collaboration with members of the U.S. GRADE Network. After nearly 18 months of work, I’m happy to report that the first draft of the manuscript has been written – but I was given a sweet taste of this accomplishment earlier on when my protocol was published in January. Here’s what I learned through the process.

Reasons to Publish a Systematic Review Protocol (For the Good of Science)
·      Just as with clinical trials, the publication of a protocol for a systematic review alerts other researchers in the field to the work being conducted, thus reducing duplication of efforts.
·      Defining the goals and processes to be used in the systematic review before it’s conducted (a priori) likely reduces bias.
·      According to a 2017 study comparing reviews with and without a published protocol, reviews with published protocols were more likely to be thorough and transparent in their reporting of methods in the resulting review. (However, this may just be because those who are likely to publish a protocol are also more likely to be generally thorough and transparent… but if that’s the case, which side would you like to be on?)

Reasons to Publish a Systematic Review Protocol (For Your Own Good)
·      Set yourself up for success. Submitting a protocol to a peer-reviewed journal gives you an opportunity to resolve any issues and automatically improve the quality of your final review manuscript before you even press “submit.” That means less work at the end of the day, and likely a shorter time window from submission of your final review to its publication. For instance, my reviewers asked that I further elaborate and clarify the history and importance of physical activity guidelines, which ultimately strengthened the introduction to my SR.
·      Save yourself room. Going in-depth in your published protocol means you can spend less space on the methods section of your final review, leaving you with more room for the meat of the paper: the results and discussion sections. Simply discuss your methods more briefly and cite your published protocol for further reading (and, lest I forget to mention, citing yourself is the ultimate power move).
·      Grow your CV. By getting their protocol published, a young researcher can add a precious first-author citation to their vitae. These don’t grow on trees, and publishing a protocol is like a two-for-one deal.
·      Stay accountable. Publishing your protocol for the world to see may be just the motivation you need to finish the task – and quickly, now that everyone’s waiting to see the results!

Reasons Not to Publish a Protocol (and Just Stick to PROSPERO Instead)
·      Financial burden. Publishing is not usually a cheap endeavor, and unless you have additional support, charges and fees may be better spent on the final review.
·      Opportunity cost. Honestly consider how much additional time and psychic bandwidth it may take you to get a protocol published, from the drafting to the revisions and everything in between (like editing every reference with a fine-toothed comb). Is it time that you’d rather spend on working on the review?
·      Longer time to publish. As per the above, it’s possible that the work of publishing a protocol may protract the entire process. That same 2017 study found that the median time from the search to submission of a review for which a protocol had been published was 325 days, and 578 days to publication of the final document. This stands in contrast to the matched reviews for which a protocol was not published, which only took a median of 122 days to submission and 358 days to publication.

A (By All Means Non-Exhaustive) List of Places to Publish a Systematic Review Protocol
·      BMJ Open
·      Cochrane Database of Systematic Reviews
·      Environment International
·      JBI Database of Systematic Reviews and Implementation Reports
·      Medicine
·      Systematic Reviews

If you’re adequately convinced after weighing the costs and benefits, dust off your PRISMA-P checklist (heads up: the journals above will need you to show how you’ve fulfilled each criterion) and get writing.