Friday, June 26, 2020

CONSORTing with Incorrect Reporting?: Most Publications Aren’t Using Reporting Guidelines Appropriately, New Systematic Review Finds


Reporting guidelines such as PRISMA for systematic reviews and meta-analyses and CONSORT for randomized controlled trials are often touted as a way to improve the thoroughness and transparency of reporting in academic research. However, although these guidelines are intended to improve the reporting of research, a new systematic review of a random sample of publications found that in many cases they were cited incorrectly: as a guide for the design and conduct of the research itself, as a tool for assessing the quality of published research, or for an unclear purpose.

In the review published earlier this month, Caulley and colleagues worked with an experienced librarian to devise a systematic search strategy that would pick up any publication citing one of four major reporting guideline documents from inception to 2018: ARRIVE (used in in vivo animal research), CHEERS (used in health economic evaluations), CONSORT (used in randomized controlled trials), and PRISMA (used in systematic reviews and meta-analyses). A random sample of 50 publications of each type was then reviewed independently by two authors for how the reporting guideline was cited.

Overall, only 39% of the 200 reviewed items correctly stated that the guidelines were followed in the reporting of the study, whereas an additional 41% incorrectly cited the guidelines, usually by stating that they informed the design or conduct of the research. Finally, in 20% of the reviewed items, the intended purpose of the cited reporting guidelines was unclear.

Examples of appropriate, inappropriate, and unclear use of reporting guidelines provided by Caulley et al. Click to enlarge.

Between publication types, RCTs were the most likely to appropriately cite the use of CONSORT guidelines (64%), versus 42% of economic evaluations correctly citing CHEERS, 28% of systematic reviews and meta-analyses appropriately discussing the use of PRISMA, and just 22% of in vivo animal research studies correctly citing ARRIVE.

Appropriate, Inappropriate, and Unclear Use of Reporting Guidelines, by Publication Type. Click to enlarge.

In addition, the appropriate use of the reporting guidelines did not appear to increase as time elapsed since the publication of those guidelines.

The authors suggest that improved education about the appropriate use of these guidelines – such as the web-based interventions and tools available to those looking to use CONSORT – may improve their correct application in future publications.

Caulley, L., Catalá-López, F., Whelan, J., Khoury, M., Ferraro, J., Cheng, W., ... & Moher, D. Reporting guidelines of health research studies are frequently used inappropriately. J Clin Epidemiol, 2020; 122: 87-94.

Manuscript available from the publisher's website here. 

Tuesday, June 23, 2020

Need for Speed: Documenting the Two-Week Systematic Review

In a recent post, we summarized a 2017 article describing the ways in which automation, machine learning, and crowdsourcing can be used to increase the efficiency of systematic reviews, with a specific focus on making living systematic reviews more feasible.

In a new publication in the May 2020 edition of the Journal of Clinical Epidemiology, Clark and colleagues incorporated automation in an attempt to complete a systematic review in no longer than two weeks, from search design to manuscript submission, for a moderately sized search yielding 1,381 deduplicated records and eight ultimately included studies.

Spoiler alert: they did it. (In just 12 calendar days, to be exact).

Systematic Review, but Make it Streamlined

Clark et al. utilized some form of computer-assisted automation at almost every point in the project, including:
  • Using the SRA word frequency analyzer to identify key terms that would be most useful to include in a search strategy (a sketch of the underlying idea appears after this list)
  • Using hotkeys (custom keystroke shortcuts) within the SRA Helper tool to more quickly screen items and search pre-specified databases for full texts
  • Using RobotReviewer to assist in risk of bias evaluation by searching for certain key phrases within each document
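To give a feel for the first item, here is a minimal sketch of what a word frequency analyzer does: tally the terms appearing across records already known to be relevant so that high-frequency domain terms can be considered for the search strategy. The sample records, stopword list, and tokenization below are illustrative assumptions, not the SRA tool's actual implementation.

```python
# Minimal sketch of word-frequency analysis for search-strategy design;
# the stopword list and sample records are illustrative assumptions.
import re
from collections import Counter

STOPWORDS = {"the", "and", "for", "with", "from", "of", "in", "a", "an", "to", "on"}

def frequent_terms(texts, top_n=20):
    """Count informative tokens across a set of known-relevant records."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(top_n)

if __name__ == "__main__":
    sample_records = [
        "Telehealth interventions for chronic disease management: a randomized trial",
        "Remote monitoring and telehealth in heart failure care",
        "Effectiveness of telehealth consultations in rural primary care",
    ]
    for term, count in frequent_terms(sample_records):
        print(f"{count:2d}  {term}")
```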
However, machines were only part of the solution. The authors also note the decidedly more human-based solutions that allowed them to proceed at an efficient clip, such as:
  • Daily, focused meetings between team members
  • Blocking off “protected time” for each team member to devote to the project
  • Planning for deliberation periods, such as resolving screening conflicts, to occur immediately after screening, reducing the time and energy spent on “mental reload” and revisiting one’s previous decisions for context
Time Distribution of 12-Day Systematic Review by Task. Click to enlarge.

All told, the final accepted version of the manuscript took 71 person-hours to complete – a far cry from a recently published average of 881 person-hours among conventionally conducted reviews.

Clark and colleagues discuss key facilitators and barriers to their approach as well as provide suggestions for technological tools to further improve the efficiency of SR production.

Clark, J., Glasziou, P., Del Mar, C., Bannach-Brown, A., Stehlik, P., & Scott, A.M. A full systematic review was completed in 2 weeks using automation tools: A case study. J Clin Epidemiol, 2020; 121: 81-90.

Manuscript available from the publisher's website here.

Thursday, June 18, 2020

It’s Alive! Pt. III: From Living Review to Living Recommendations

In recent posts, we’ve discussed how living systematic reviews (LSRs) can help improve the currency of our understanding of the evidence, as well as the efficiency with which the evidence is identified and synthesized through novel crowdsourcing and machine learning techniques. In the fourth and final installment of the 2017 series on LSRs, Akl and colleagues apply the LSR approach to the concept of a living clinical practice guideline.

As the figure below from the paper demonstrates, while simply updating an entire guideline more frequently (Panel B) reduces the number of out-of-date recommendations (symbolized by red stars) at any given time, it comes with a serious trade-off: namely, the high amount of effort and time required to continuously update the entire guideline. Turning certain recommendations into "living" models helps solve this dilemma between currency and efficiency. Click to enlarge.

Rather than a full update of an entire guideline and all of the recommendations therein, a living guideline uses each recommendation as a separate unit of update. Recommendations that are eligible to make the transition from “traditionally updated” to “living” include those that are a current priority for healthcare decision-making, for which the emergence of new evidence may change clinical practice, and for which new evidence is being generated at a quick rate.

The Living Guideline Starter Pack

Each step of a recommendation’s formation must make the transition to “living,” including:
  • A living systematic review
  • Living summary tables, such as Evidence Profiles and Evidence-to-Decision tables
    • Online collaborative table-generating software such as GRADEpro can be used to keep these up-to-date with the emergence of newly relevant evidence
  • A living guideline panel who can remain “on-call” to contribute to updates of recommendations with relatively short notice when warranted
  • A living pool of peer-reviewers who can review and provide feedback on updates with a quick turnaround time
  • A living publication platform, such as an online version that links back to archived versions and “pushes” new versions to practice tools at the point of care
Additional Resources
Further information and support for the development of LSRs, including updated official guidance, is provided on the Cochrane website.

Akl, E.A., Meerpohl, J.J., Elliott, J., Kahale, L.A., Schünemann, H.J., and the Living Systematic Review Network. Living systematic reviews: 4. Living guideline recommendations. J Clin Epidemiol, 2017; 91: 47-53.

Manuscript available from the publisher's website here. 

Monday, June 15, 2020

It’s Alive! Pt. II: Combining Human and Machine Effort in Living Systematic Reviews

Systematic review development is known to be a labor-intensive endeavor that requires a team of researchers dedicated to the task. The development of a living systematic review (LSR) that is continually updated as newly relevant evidence becomes available presents additional challenges. However, as Thomas and colleagues write in the second installment of the 2017 series on LSRs in the Journal of Clinical Epidemiology, we can make the process quicker, easier, and more efficient by harnessing the power of machine learning and “microtasks.”

Suggestions for improvements in efficiency can be categorized as either automation (incorporation of machine learning/replacement of human effort) or crowdsourcing (distribution of human effort across a broader base of individuals).

A diagram from Thomas et al. (2017) describes the "push" model of evidence identification that can help keep Living Systematic Reviews current without the need for repeated human-led searches. Click to enlarge.

From soup to nuts, opportunities for the incorporation of machine learning into the LSR development process include:


  • Continuous, automatic searches that “push” new potentially relevant studies out to human reviewers
  • Exclusion of ineligible citations through automatic text classification, which can reduce the number of items requiring human screening while retaining over 99% sensitivity (see the sketch after this list)
  • Crowdsourcing of study identification and "microtask" screening efforts such as Cochrane Crowd, which at the time of this blog’s writing had resulted in over 4 million screening decisions from over 17,000 contributors 
  • Automated retrieval of full text versions of included documents
  • Machine-based extraction of relevant data, graphs and tables from included documents
  • Machine-assisted risk of bias assessment
  • Template-based reporting of important items
  • Statistical thresholds that flag when a change of conclusions may be warranted
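To illustrate the text-classification item above, here is a minimal sketch of high-sensitivity screening triage: a classifier is trained on past human include/exclude decisions, and the auto-exclusion cutoff is chosen so that roughly 99% of known relevant records are retained for human review. This is an assumption about how such triage can work, not the pipeline of any particular tool; the toy records, labels, and the 0.99 target are illustrative.

```python
# Minimal sketch of high-sensitivity screening triage with scikit-learn.
# The records, labels, and 0.99 sensitivity target are illustrative
# assumptions, not data or settings from the paper.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data: titles with past human include (1) / exclude (0) labels
titles = [
    "Randomized trial of telehealth for heart failure",
    "Telehealth intervention improves outcomes in diabetes: an RCT",
    "A qualitative study of nurse attitudes toward charting",
    "Editorial: the future of telemedicine",
    "Remote monitoring randomized controlled trial in hypertension",
    "Case report of a rare dermatological condition",
]
labels = np.array([1, 1, 0, 0, 1, 0])

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(titles)
classifier = LogisticRegression().fit(X, labels)

# Choose the lowest score cutoff that retains ~99% of known includes;
# with real data this would be calibrated on a held-out sample.
target_sensitivity = 0.99
include_scores = classifier.predict_proba(X[labels == 1])[:, 1]
cutoff = np.quantile(include_scores, 1 - target_sensitivity)

# New records scoring below the cutoff are excluded without human review
new_records = [
    "Telehealth randomized trial in chronic kidney disease",
    "Letter to the editor on clinic wait times",
]
scores = classifier.predict_proba(vectorizer.transform(new_records))[:, 1]
for record, score in zip(new_records, scores):
    decision = "send to human screening" if score >= cutoff else "auto-exclude"
    print(f"{score:.2f}  {decision}: {record}")
```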
As technology in this field progresses, the traditionally duplicated stages of screening and data extraction may even be taken on by a computer-human pair, combining the ease and efficiency of automation with the “human touch” and high-level discernment that algorithms still lack.

Thomas, J.,  Noel-Storr, A., Marshall, I., Wallace, B., McDonald, S., Mavergames, C... & the Living Systematic Review Network. Living systematic reviews: 2. Combining human and machine effort. J Clin Epidemiol, 2017; 91: 31-37. 

Manuscript available from the publisher's website here.

Wednesday, June 10, 2020

It’s Alive! Pt. I: An Introduction to Living Systematic Reviews

As research output continues to rise, the systematic reviews charged with comprehensively identifying and synthesizing that evidence become out-of-date more quickly. In addition, the formation of a systematic review team can be a lengthy process, and institutional memory of the project is lost when teams are disbanded after publication.

One solution to this problem is the concept of a living systematic review, or LSR. In the first installment of a 2017 series in the Journal of Clinical Epidemiology, Elliott and colleagues introduce the concept of an LSR and provide general guidance on their format and production.

What is a Living Systematic Review (LSR)?
An LSR has a few key components:
  • Based on a regularly updated search run at an explicit, pre-established frequency (at least once every six months) to identify any potentially relevant recent publications (one way to automate such a search is sketched after this list)
  • Utilize standard systematic review methodology (different from a rapid review)
  • Most useful for specific topics:
    • that are of high importance to decision-making,
    • for which the certainty of evidence is low or very low (meaning our certainty of the effect may likely change with the incorporation of new evidence), and
    • for which new evidence is being generated often.
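As one illustration of what a regularly scheduled search could look like, a script can rerun a saved PubMed query via NCBI's public E-utilities API, restricted to records added since the last run, and hand any new hits to the screening workflow. The query string and 30-day window below are assumptions for illustration, not methods prescribed by Elliott et al.

```python
# Minimal sketch of a scheduled LSR search against PubMed via NCBI
# E-utilities; the query string and 30-day window are illustrative.
import datetime
import requests

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
QUERY = '"telehealth"[Title/Abstract] AND "heart failure"[Title/Abstract]'

def new_pmids_since(last_run: datetime.date) -> list:
    """Return PubMed IDs added to Entrez since the last search run."""
    params = {
        "db": "pubmed",
        "term": QUERY,
        "datetype": "edat",  # Entrez date: when the record was added
        "mindate": last_run.strftime("%Y/%m/%d"),
        "maxdate": datetime.date.today().strftime("%Y/%m/%d"),
        "retmax": 10000,
        "retmode": "json",
    }
    response = requests.get(ESEARCH_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()["esearchresult"]["idlist"]

if __name__ == "__main__":
    # Run from a scheduler (e.g., a monthly cron job), then feed the
    # returned IDs into screening.
    last_run = datetime.date.today() - datetime.timedelta(days=30)
    print(new_pmids_since(last_run))
```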


A figure from Elliott et al. (2017) provides an overview of the LSR development process, from protocol to regular searching and screening and incorporation and publication of new evidence.


LSRs from End to End
An LSR can either be started from scratch with the intention of regular screening and updating of evidence – in which case the protocol should specify these planned methods – or based upon an existing up-to-date systematic review, in which case the protocol should be amended to reflect these changes.

Due to their continually evolving nature, the publication of LSRs requires an online platform with linking mechanisms (such as CrossRef) or explicit versioning (such as the Cochrane database) so that the review can be updated as soon as new evidence is incorporated.

When the certainty of evidence reaches a higher level, or if the generation of new evidence substantially slows, an LSR may be discontinued in favor of traditional approaches to updating.

Additional Resources
Further information and support for the development of LSRs, including updated official guidance, is provided on the Cochrane website.

Elliott, J.H., Synnot, A., Turner, T., Simmonds, M., Akl, E.A., McDonald, S... & Thomas, J. Living systematic review: 1. Introduction - the why, what, when, and how. J Clin Epidemiol, 2017; 91: 23-30.

Manuscript available from the publisher's website here.


Friday, June 5, 2020

Research Revisited: 2014’s “Guidelines 2.0: Systematic Development of a Comprehensive Checklist for a Successful Guideline Enterprise”

While several checklists for the development and appraisal of specific guidelines had been developed by 2014, no thorough and systematic resource had yet been published to inform the actual day-to-day operations of a guideline development program. Noticing this need, Schünemann and colleagues pooled their professional experiences and contacts in the field, in addition to conducting a systematic search for self-styled “guidelines for guidelines” and other guideline development handbooks, manuals, and protocols. The reviewers extracted, in duplicate, the key stages and processes of guideline development from each of these documents and compiled them.

The result was the G-I-N/McMaster Guideline Development Checklist: an 18-topic, 146-item soup-to-nuts comprehensive manual spanning each part and process of a guideline development program, from budgeting and planning for a program to the development of actual guidelines to their dissemination, implementation, evaluation, and updating.
An overview of the steps and parties involved in the G-I-N/McMaster guideline development checklist. Click to enlarge.

The checklist also provides hyperlinks to tried-and-true online resources for many of these aspects, such as tips for funding a guideline program, tools for project management, topic selection criteria, and guides for patient and caregiver representatives.

Schünemann HJ, Wiercioch W, Etxeandia I, Falavigna M, Santesso N, Mustafa R, Ventresca M, et al. Guidelines 2.0: Systematic development of a comprehensive checklist for a successful guideline enterprise. CMAJ, 2014; 186(3): E123-E142.

Manuscript available for free here.

Tuesday, June 2, 2020

Research Shorts: Use of GRADE for the Assessment of Evidence about Prognostic Factors

In addition to questions of interventions and diagnostic tests, GRADE can also be used to assess the certainty of evidence when it comes to prognostic factors. In part 28 of the Journal of Clinical Epidemiology’s GRADE series published earlier this year, Foroutan and colleagues provide guidance for applying GRADE to a body of evidence of prognostic factors.

The Purpose of Prognostic Studies

GRADE may be applied to a body of evidence, separated by individual prognostic factors instead of outcomes, for one of two reasons. The first is a non-contextualized setting, such as when the certainty of evidence surrounding prognostic factors is being evaluated for application within research planning and analysis (e.g., determining which factors are best to use when stratifying for randomization). The second is a contextualized setting, when the certainty of evidence surrounding prognostic factors is used to help inform clinical decisions.

Establishing the Certainty of Evidence

Unlike when grading the certainty of evidence about an intervention, when assessing prognostic evidence, the overall certainty for observational studies starts out as HIGH. This is because the patient population in observational studies is likely to be more representative than in RCTs, where eligibility criteria may place artificial restrictions on the characteristics of patients. Certainty may then be rated down based on the five traditional domains:
  • Risk of bias: Tools and instruments such as QUality In Prognosis Studies (QUIPS) and the Prediction model Risk Of Bias ASsessment Tool (PROBAST) may be helpful here. When teasing out the effect of each potential factor, consider utilizing some form of multivariate analysis that accounts for dependence between several different prognostic factors.
  • Inconsistency can be examined via visual inspection of the variability between individual point estimates and the overlap of their confidence intervals; statistics such as I² (defined after this list) are likely to be less helpful, as they can often be inflated when large studies lead to particularly narrow CIs. As always, potential explanations for any observed heterogeneity should be considered a priori.
  • Imprecision will depend on whether the setting is contextualized, in which case it rests on the relationship between the confidence interval and the previously set clinical decision threshold, or non-contextualized, in which case the threshold will most likely be the line of no effect.
  • Indirectness should be based on a comparison of the PICO elements of the clinical question at hand and those addressed in the meta-analyzed studies.
  • Publication bias can be assessed via visual exploration of a funnel plot or the use of appropriately applied statistical tests.
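As general statistical background (not something derived in Foroutan et al.), the standard Higgins–Thompson definition of I² in terms of Cochran's Q makes the inflation problem visible: the inverse-variance weights grow as confidence intervals narrow, so Q – and with it I² – can be large even when the absolute differences between estimates are clinically trivial.

```latex
% I^2 in terms of Cochran's Q across k studies, with inverse-variance
% weights w_i, study estimates \hat{\theta}_i, and pooled estimate \hat{\theta}
Q = \sum_{i=1}^{k} w_i \left(\hat{\theta}_i - \hat{\theta}\right)^2,
\qquad
I^2 = \max\!\left(0,\; \frac{Q - (k - 1)}{Q}\right) \times 100\%
```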
Foroutan F, Guyatt G, Zuk V, Vandvik PO, Alba AC, Mustafa R, Vernooij R, et al. GRADE guidelines 28: Use of GRADE for the assessment of evidence about prognostic factors: Rating certainty in identification of groups of patients with different absolute risks. J Clin Epidemiol, 2020; 121: 62-70.

Manuscript available from the publisher's website here.