Showing posts with label Diagnostic test accuracy. Show all posts

Wednesday, August 3, 2022

Spring 2022 Scholars Discuss Developments in Diagnostic and Environmental Health Evidence

The USGN's 16th GRADE Guideline Development Workshop, held in Chicago, was the first in-person workshop since March 2020. In classic USGN style, participants enjoyed vibrant conversation, hours of learning, and delicious yogurt parfaits and strong coffee during morning breaks.

Two participants joined the fun and learning as part of the Evidence Foundation scholarship program, presenting to fellow attendees about their current projects related to evidence synthesis and guideline development. 

Spring 2022 Evidence Foundation scholars Kapeena Sivakumaran and Ibrahim El Mikati, center, pose for a photo between sessions in Chicago with the U.S. GRADE Network faculty (from left to right: Reem Mustafa, Philipp Dahm, Shahnaz Sultan, Yngve Falck-Ytter, Rebecca Morgan, and Hassan Murad).

Ibrahim El Mikati, a post-doctoral research fellow in the Outcomes and Implementation Research Unit at the University of Kansas Medical Center, discussed his project to help develop guidance for judging imprecision in diagnostic evidence. The approach will use confidence interval thresholds and will also introduce the concept of an optimal information size for assessing imprecision in the context of diagnostic guidelines.

"One thing that the GRADE workshop has helped me appreciate is transparency," said Ibrahim. "Having a transparent explanation of judgments provides users with trustworthy guidelines."

Kapeena Sivakumaran is currently leading two systematic reviews for Health Canada related to the impact of noise exposure and sleep disturbance on health outcomes. Challenges of these projects include a focus on short-term outcomes in the relevant literature as well as the need to incorporate multiple evidence streams, such as mechanistic data that can be interpreted in conjunction with observational evidence. 

“The workshop provided me with valuable insight into guideline development and using the GRADE approach to assess the evidence," said Kapeena. "One new thing I learned from the workshop was how automation and [artificial intelligence] can be integrated into the process of living systematic reviews to support guideline development.”

Note: applications for scholarships to attend our upcoming systematic review and guideline development workshops, held virtually, close August 12th and September 30th, 2022, respectively. See application details here.










Thursday, November 11, 2021

Fall Scholars Dazzle at the 15th GRADE Guideline Development Workshop

During the 15th GRADE Guideline Development Workshop, held virtually last month, the Evidence Foundation had the pleasure of welcoming three new scholars, who received the opportunity to attend the workshop free of charge. As part of the scholarship, each recipient presented to the workshop attendees about their current or proposed project related to evidence-based medicine and reducing bias in healthcare.


Razan Mansour, MD, a postdoctoral research fellow at the University of Kansas Medical Center, spoke about the challenges of navigating variability among published systematic reviews when developing clinical recommendations. Variability may emerge from differences between reviews in study inclusion criteria, risk of bias assessment, and the way data are presented. Most sources of variability, Dr. Mansour said, are difficult to explain, but a modified version of AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews 2) can help identify common "red flags" and prioritize the best systematic reviews on which to base resulting clinical recommendations. Information from multiple higher-quality systematic reviews may be used, and data from individual studies may need to be extracted to paint a complete picture of the evidence.

Says Dr. Mansour, "Learning about how to rate the certainty of evidence about diagnosis was particularly helpful to me, as this is usually challenging. The subgroup discussions were perfect for one-to-one learning and applying your skills as you go through GDT."

Next, Reena Ragala, MOTR/L, spoke about her project as a Senior Evidence-Based Practice Analyst at the Medical University of South Carolina Value Institute. The Institute was established in 2012 to support MUSC's efforts to build the infrastructure to advance science and discovery through evidence-based practice. Starting in the spring/summer of 2022, Ragala will help lead guideline development "boot camps" through the Institute to train providers within rural health networks in the application of evidence-based guideline methodology. Through this process, Ragala hopes to empower rather than intimidate front-line staff and clinicians new to clinical guideline development.

The fall 2021 Evidence Foundation scholars pose for a virtual group photo with U.S. GRADE Network faculty between workshop sessions. From top-left: Philipp Dahm, Reem Mustafa, Osama Altayar, Yngve Falck-Ytter, Rebecca Morgan, Carolina Soledad, Reena Ragala, Perica Davitkov, Madelin Siedler, Razan Mansour, and Shahnaz Sultan.

"When using GRADE for diagnostic tests or guideline development, judgements or recommendations are based on patient outcomes," said Ragala. "GRADE doesn’t just look at the strength/quality of the evidence, but incorporates the feasibility, accuracy, bias, and benefit/harm to the patient to ultimately make recommendations."

Carolina Soledad, MD, MStat, PhD, spoke about the unique opportunities, challenges, and solutions within an aspect of guideline development that has gained great relevance in recent years: reaching consensus within the context of virtual meetings. Dr. Soledad, a junior methodologist within the European Society of Anaesthesiology and Intensive Care (ESAIC) Guidelines Committee, noted that while in-person meetings have several strengths, such as shared context and access to nonverbal cues like body language and facial expressions, virtual meetings offer unique benefits of their own, such as cost savings and a reduced need to focus on logistics. They have also become a necessity in the age of a global pandemic. To help address the unique questions surrounding virtual meetings, such as their optimal length and number of participants, Dr. Soledad developed a 33-item survey of anesthesiologists and intensivists involved in guideline development. The findings will help formulate best practices for improving the communication, engagement, and effectiveness of future virtual meetings.

According to Dr. Soledad, "This workshop gave me a peek into the GRADEpro GDT software, and it turned out to be easier to use than I'd thought!"

If you are interested in learning more about GRADE and attending the workshop as a scholarship recipient, applications for our upcoming workshop in Chicago, Illinois, are now open. The deadline to apply is March 31, 2022. Details can be found here. 






Wednesday, November 25, 2020

Diagnostic Test Accuracy Meta-Analyses Are Often Missing Information Required for Reproducibility

Reproducibility of results is considered a key tenet of the scientific process. When the results of a study are reproduced by others using the same protocol, there is less chance that the original results were due to human or random error. Testing the reproducibility of evidence syntheses (e.g., meta-analyses) is just as important as it is for individual trials.

In a paper published earlier this month, Stegeman and Leeflang set out to test the reproducibility of meta-analyses of diagnostic test accuracy. The authors identified 51 eligible meta-analyses published in January 2018. Nineteen provided sufficient information in the text to reproduce the 2x2 tables of the included studies; the remaining 32 reported only summary estimates. For 17 of these 32, the authors were able to locate primary data and attempt reproduction. Overall, only 28% of the 51 meta-analyses could be reproduced; none of the 17 papers that did not provide 2x2 tables were reproducible.
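To make concrete what reproducing a study's results from its 2x2 table involves, here is a minimal Python sketch. The counts are hypothetical, chosen only for illustration; the point is that sensitivity and specificity follow directly from the four cells, which is why omitting the tables blocks reproduction.

```python
# Sensitivity and specificity recomputed from a study's 2x2 table.
# The counts below are hypothetical; reproducing a published DTA
# meta-analysis requires these four numbers for every included study.
tp, fp, fn, tn = 90, 20, 10, 180  # true/false positives, false/true negatives

sensitivity = tp / (tp + fn)  # proportion of diseased patients who test positive
specificity = tn / (tn + fp)  # proportion of non-diseased patients who test negative

print(f"sensitivity = {sensitivity:.2f}")  # 0.90
print(f"specificity = {specificity:.2f}")  # 0.90
```

When a review reports only pooled estimates, these per-study inputs cannot be recovered, and the meta-analysis cannot be independently rerun.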


Only 14 (27%) of the 51 articles provided full search terms. In nearly half (25) of the included reviews, at least one of the full texts of included references could not be located; in 12, at least one title or abstract could not be located. Overall, of the 51 included reviews, only one was deemed fully reproducible by providing a full protocol, 2x2 tables, and the same summary estimates as the authors.

The authors conclude with a call for increased prospective registration of protocols and improved reporting of search terms and methods. The PRISMA extension for diagnostic test accuracy (PRISMA-DTA) is a helpful tool for any aspiring author of a diagnostic test accuracy meta-analysis looking to improve the reporting and reproducibility of results.

Stegeman I. and Leeflang M.M.G. (2020). Meta-analyses of diagnostic test accuracy could not be reproduced. J Clin Epidemiol 127:161-166.

Manuscript available at the publisher's website here

Friday, May 22, 2020

Research Shorts: Assessing the Certainty of Diagnostic Evidence, Pt. II: Inconsistency, Imprecision, and Publication Bias

Earlier this week, we discussed a recent publication in the GRADE series in the Journal of Clinical Epidemiology that provides guidance for assessing risk of bias and indirectness across a body of evidence of diagnostic test accuracy. In this post, we'll follow up with continued guidance (published in Part II) covering the remaining GRADE domains.

Inconsistency

Unexplained inconsistency should be evaluated separately for the findings on test sensitivity and test specificity. When a meta-analysis is available, both visual and quantitative markers of inconsistency can be used much as they are in a meta-analysis of intervention studies. If differences between studies related to any of the PICO elements are suspected as an explanation for observed heterogeneity, exploration via subgroup analyses may be appropriate.
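As an illustration of the quantitative side of that assessment, the sketch below computes Cochran's Q and the I² statistic for a set of hypothetical logit-transformed sensitivities and within-study variances (all values are assumptions for illustration); in practice the same calculation would be repeated separately for specificity.

```python
# Cochran's Q and I^2 across studies' logit(sensitivity) estimates.
# All inputs are hypothetical illustrative values.
logit_sens = [2.2, 1.9, 2.5, 1.4, 2.0]   # logit-transformed sensitivities
variances  = [0.04, 0.06, 0.05, 0.08, 0.03]  # within-study variances

weights = [1 / v for v in variances]      # inverse-variance weights
pooled = sum(w * y for w, y in zip(weights, logit_sens)) / sum(weights)
q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, logit_sens))
df = len(logit_sens) - 1
i2 = max(0.0, (q - df) / q) * 100         # % of variability beyond chance

print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```

A large I² here would prompt exactly the kind of subgroup exploration the paragraph above describes, alongside visual inspection of forest or summary ROC plots.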

Imprecision

Again, imprecision of a test’s sensitivity and specificity should be evaluated separately. As with assessments of interventional evidence, evaluation of imprecision across a body of test accuracy studies entails the consideration of the width of the confidence interval as well as the number of events (specifically, the number of patients with the disease and the number of positive tests for sensitivity, and the number of patients without the disease and the number of negative tests for specificity).

In contextualized settings, imprecision is likely present when one end of the confidence interval would lead to use of the testing strategy while the other end would not. It may be helpful to set an a priori threshold that the confidence interval should not cross in order for the test to have sufficient value.
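A minimal sketch of such a threshold check, using the Wilson score interval (one common choice for a proportion's confidence interval; both the counts and the 0.80 threshold are hypothetical assumptions):

```python
# A priori threshold check for imprecision: does the 95% CI for sensitivity
# cross a pre-specified decision threshold? Counts and the 0.80 threshold
# are hypothetical.
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

tp, fn = 85, 15                  # diseased patients: test positives / negatives
lo, hi = wilson_ci(tp, tp + fn)  # CI for sensitivity
threshold = 0.80                 # minimum acceptable sensitivity, set a priori

print(f"sensitivity 95% CI: ({lo:.3f}, {hi:.3f})")
if lo < threshold < hi:
    print("CI crosses the threshold: consider rating down for imprecision")
```

Here the interval spans the threshold, so decisions at the two ends of the interval would differ, which is the signal for rating down.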

Publication Bias

The use of traditional funnel plot assessments (e.g., Egger's or Begg's test) on a body of test accuracy studies is more likely to result in undue suspicion of publication bias than when applied to a body of therapeutic studies. While more sophisticated statistical assessments are available (e.g., Deeks' test, trim and fill), systematic review and health technology assessment (HTA) authors may choose to base a judgment of publication bias on knowledge of the existence of unpublished studies. Publication bias may also be suspected when the available studies are published by for-profit entities or report precise estimates of high test accuracy despite small sample sizes.

Upgrading the Certainty of Evidence ("Rating Up")

As with an assessment of interventional evidence, there may be reasons to upgrade the certainty of evidence in the face of highly convincing links between the use of a test and the likelihood and/or magnitude of an observed outcome. The diagnostic test accuracy equivalent of a dose-response gradient, the Receiver Operating Characteristic (ROC) curve, may be used to assess this potential upgrader.
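As a toy illustration of how an ROC curve is traced, the sketch below sweeps thresholds over hypothetical continuous test results and records the resulting (false positive rate, true positive rate) pairs; all marker values are invented for illustration.

```python
# Sketch of an ROC curve from a continuous test result: sweep thresholds
# and record (FPR, TPR) pairs. Data are hypothetical illustrative values.
scores_diseased = [3.1, 2.8, 2.5, 2.2, 1.9]  # marker values, disease present
scores_healthy  = [1.2, 1.5, 1.8, 2.0, 2.4]  # marker values, disease absent

thresholds = sorted(set(scores_diseased + scores_healthy), reverse=True)
roc = []
for t in thresholds:
    tpr = sum(s >= t for s in scores_diseased) / len(scores_diseased)
    fpr = sum(s >= t for s in scores_healthy) / len(scores_healthy)
    roc.append((fpr, tpr))

for fpr, tpr in roc:
    print(f"FPR={fpr:.1f}  TPR={tpr:.1f}")
```

A curve bowing toward the top-left corner (high TPR at low FPR) reflects the kind of strong, consistent discrimination that might support rating up.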

Schünemann H, Mustafa RA, Brozek J, Steingart KR, Leeflang M, Murad MH, Bossuyt P, et al. GRADE guidelines: 21 part 2. Test accuracy: inconsistency, imprecision, publication bias, and other domains for rating the certainty of evidence and presenting it in evidence profiles and summary of findings tables. J Clin Epidemiol 2020 Feb 10. pii: S0895-4356(19)30674-2. doi: 10.1016/j.jclinepi.2019.12.021. [Epub ahead of print].

Manuscript available here on the publisher's site.

Tuesday, May 19, 2020

Research Shorts: Assessing the Certainty of Diagnostic Evidence, Pt. I: Risk of Bias and Indirectness

Systematic reviews or health technology assessments (HTAs) that examine the body of evidence on diagnostic procedures can - and should - transparently assess and report the overall certainty of evidence as part of their findings. In the two-part, 21st installment of the GRADE guidance series published in the Journal of Clinical Epidemiology, Schünemann and colleagues provide methods for approaching the first two major domains of the GRADE approach: risk of bias and indirectness.

While there are certainly differences between methods for assessing the certainty of evidence of diagnostic tests as opposed to interventions, the fundamental parts of GRADE remain unchanged:

Make Clinical Questions Clear via PICOs

It is paramount to clearly define the purpose or role of a diagnostic test and to see the test in light of its potential downstream consequences for making subsequent treatment decisions. As with a review of an intervention, a review of a diagnostic test should be built upon questions that define the Population, Intervention (the “index test” being assessed), Comparator (the “reference” test representing the current standard of care), and Outcomes (PICOs).

Prioritize Patient-Important Outcomes

Outcomes should be relevant to the population at hand. As such, the ideal study design for generating evidence on outcomes related to test accuracy is a randomized controlled trial with a test-retest format that directly investigates the downstream effects of a testing strategy on outcomes in that population, as seen in Figure 1A below.

However, this is often not available. In this case, test accuracy would be used as a surrogate outcome, and test accuracy studies such as those in Figure 1B can be linked to additional evidence that examines the effect of downstream consequences of test results on patient-important outcomes. (More on that in a March 2020 blog post, here.)

Assessing Risk of Bias in Test Accuracy Studies

There are several important factors to consider when assessing a body of test accuracy studies for risk of bias. Potential issues with regard to risk of bias include:

- Populations that differ from those intended to receive the test (e.g., in terms of disease risk)
- Failure to compare the test in question to an independent reference/standard test in all enrolled patients (e.g., by using only a composite test)
- Lack of blinding when ascertaining test results

The QUADAS-2 tool can be used to guide assessment of bias in these studies.

Use PICO to Guide Assessment of Indirectness

Lastly, as when evaluating intervention studies, indirectness can be assessed by determining whether the Population, Index test, Comparator/reference test, and Outcomes match those in the clinical question.

Schünemann H, Mustafa RA, Brozek J, Steingart KR, Leeflang M, Murad MH, Bossuyt P, et al. GRADE guidelines: 21 part 1. Study design, risk of bias, and indirectness in rating the certainty across a body of evidence for test accuracy. J Clin Epidemiol 2020 Feb 12. pii: S0895-4356(19)30673-0. doi: 10.1016/j.jclinepi.2019.12.020. [Epub ahead of print].

Manuscript available here on publisher’s site.

Wednesday, May 6, 2020

Research Short: Defining Ranges for Certainty of Evidence Ratings of Diagnostic Accuracy

Recently, we reviewed a paper describing the methods by which the evidence of downstream consequences of screening can be linked to evidence of test accuracy via formal and informal modeling. The resulting judgment of the certainty of this evidence will communicate our certainty that a test’s true accuracy lies within a given range. A new paper published earlier this year provides guidance on evaluating the certainty of evidence for diagnostic accuracy.

Ranges for determining the certainty of evidence of test accuracy may be either fully or partially contextualized (the range takes into account some or all of the possible effects of a testing strategy and is based on a value judgment of the relative importance of outcomes) or non-contextualized (the range takes into account only the accuracy of the test, without considering the relative implications of false positives or negatives).

Non-contextualized judgments assume that outside of differences in accuracy, everything else about two test strategies will have the same impact on outcomes; thus, certainty of evidence is judged based solely on the accuracy data. Contextualized judgments, on the other hand, also take into account the downstream consequences of a test’s accuracy – particularly the potential effects of false positives or negatives. Typically, non-contextualized or partially contextualized ratings are used in systematic reviews or health technology assessments (HTAs), whereas fully contextualized ratings should be used in the formation of guideline recommendations.
Sources of ranges for test accuracy, with varying levels of contextualization, include:

- Non-contextualized (systematic review or HTA)
  - Confidence interval: certainty that the true sensitivity or specificity lies within the confidence interval(s) of the tests
    - Does not take precision into account
  - Direction of effect: certainty that there is a true difference between the sensitivity and specificity of two test strategies
    - Requires a determination of what would make a meaningful difference in accuracy
- Partially contextualized (systematic review or HTA)
  - Specified magnitude: determines whether a difference in accuracy between tests is trivial, small, moderate, or large
    - The acceptable magnitude of difference will be based at least partially on the importance of the downstream consequences of false positives and negatives

Example of a partially contextualized diagram of downstream consequences of screening for cervical dysplasia using a screen-and-treat strategy.

- Fully contextualized (guideline recommendations)
  - Rates the certainty of a test's sensitivity and specificity based on whether the overall balance between benefits and harms would differ from one end of the range to the other
    - Ranges are determined by first considering all important and critical downstream consequences of testing

Hultcrantz M, Mustafa RA, Leeflang MMG, Lavergne V, Estrada-Orozco K, Ansari MT, Izcovich A, et al. Defining ranges for certainty ratings of diagnostic accuracy: A GRADE concept paper. J Clin Epidemiol 2020;117:138-148.

Manuscript available here on publisher's site.

Monday, February 3, 2020

Research Shorts: From test accuracy to patient-important outcomes and recommendations

Contributed by Madelin Siedler, 2019/2020 U.S. GRADE Network Research Fellow

The potential risks and benefits of a screening or diagnostic testing strategy extend beyond the immediate impact and accuracy of the test itself. The result of testing will determine the available next steps and options for follow-up and management, and therefore will affect various patient-important outcomes in addition to potential resource utilization and equity considerations. These downstream consequences, and the certainty of evidence in these consequences, need to be considered when formulating recommendations surrounding testing. In a July 2019 paper published as part 22 of the Journal of Clinical Epidemiology’s GRADE guidelines series, Schünemann and colleagues provide suggestions for assessing certainty of evidence and determining recommendations for diagnostic tests and strategies.

While a collection of randomized controlled trial evidence examining the downstream consequences of various testing strategies is ideal in this scenario, such data are sparse. In lieu of this, guideline authors should develop a framework that includes each possible testing and follow-up treatment scenario, starting with the test in question and ending with patient-important outcomes.


H.J. Schünemann et al. / Journal of Clinical Epidemiology 111 (2019) 69-82

As seen in this USPSTF sample framework, evidence begins with accuracy studies and ends with patient-important end-points.

This will allow the panel to visually link all relevant existing data together and develop clinical questions that are answerable with the evidence at hand. Data on the accuracy of a given test will help inform the expected number of false negatives and positives, which would then lead to potentially important downstream consequences - such as anxiety or a missed diagnosis - in addition to the effects of treating a diagnosed condition. The estimates of these beneficial and harmful potential outcomes should ideally come from a systematic review of evidence which can then be assessed for certainty. 
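The arithmetic linking test accuracy to these downstream consequences can be sketched briefly: given assumed values for prevalence, sensitivity, and specificity (all three hypothetical here), the expected numbers of true and false positives and negatives per 1,000 patients tested follow directly.

```python
# Expected test results per 1,000 patients given prevalence, sensitivity,
# and specificity. All three input values are hypothetical.
n = 1000
prevalence, sensitivity, specificity = 0.10, 0.90, 0.95

diseased = n * prevalence    # expected patients with the condition
healthy = n - diseased       # expected patients without it

tp = diseased * sensitivity  # correctly diagnosed
fn = diseased - tp           # missed diagnoses
tn = healthy * specificity   # correctly reassured
fp = healthy - tn            # false alarms

print(f"per 1000 tested: TP={tp:.0f}, FN={fn:.0f}, TN={tn:.0f}, FP={fp:.0f}")
```

The false negatives and false positives are the entry points to the downstream branches of the framework, such as missed diagnoses or anxiety and unnecessary follow-up.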

H.J. Schünemann et al. / Journal of Clinical Epidemiology 111 (2019) 69-82

The authors suggest providing one overall rating of the quality of evidence that takes into account the certainty of the diagnostic, prognostic, and management data that are available. Guideline panels should determine which outcomes of these bodies of evidence are critical and ascribe an overall rating based on the lowest level of certainty of the critical outcomes. 


Schünemann HJ, Mustafa RA, Brozek J, Santesso N, Bossuyt PM, Steingart KR, Leeflang M, Lange S, Trenti T, Langendam M, Scholten R. GRADE guidelines: 22. The GRADE approach for tests and strategies—from test accuracy to patient-important outcomes and recommendations. Journal of clinical epidemiology. 2019 Jul 1;111:69-82.

Manuscript available here on publisher's site.

Tuesday, May 14, 2019

Spring 2019 - Scholarship Recipients

Contributed by Madelin Siedler, 2018/2019 U.S. GRADE Network Research Fellow

Recently, we held the Tenth GRADE Guideline Development Workshop in Denver, Colorado. This workshop was one of the largest groups to date, with 51 participants traveling to the Mile-High City from as far away as Poland and Korea. During the workshop, participants focused on learning and applying the GRADE approach for diagnostic test accuracy. 

Two participants attended as recipients of the scholarship program funded by the U.S. GRADE Network and Evidence Foundation. This scholarship covers the cost of registration for workshop attendees who are newer to GRADE and have never attended a formal GRADE workshop. Scholars Janice Tufte and Dr. Irbaz bin Riaz presented on their innovative ideas for improving the development, implementation, or dissemination of guidelines with the aim of reducing bias in healthcare recommendations.

Scholarship recipients: Dr. Irbaz bin Riaz (L) and Ms. Janice Tufte (R), 
with scholarship coordinator, Dr. Shahnaz Sultan

Tufte, an independent consultant who leads patient-public partnership initiatives, presented on the unique opportunities of using patient partners during the development of GRADE guidelines. Patient partners are representatives of the patient population whom the guideline aims to serve. As part of a guideline panel, they offer fresh perspectives, ground the guideline development process with lived experience, and help the panel to identify and address differences in priorities among stakeholders.

Throughout her presentation, Tufte provided ways to improve how patient partners are involved in the guideline process, such as creating one-pagers and glossaries that cover the basics of GRADE methodology and inquiring beforehand about specific accommodations that might be needed in order to enhance the patient’s participation in the panel. “It was an honor to attend the GRADE Workshop in Denver as a Patient Partner Scholar,” said Tufte. “I felt like I was treated like a colleague where we were all learning together how to use GRADE tools to share best evidence within our individual systems and guidelines work.”

Dr. bin Riaz, an oncologist at Mayo Clinic, presented on a framework for developing living systematic reviews and guidelines to inform clinical decision-making, especially in topic areas undergoing rapid change. It can take several years for a systematic review and resulting clinical recommendations to be developed, Dr. bin Riaz explained. In the meantime, new drug approvals or indications, changes in drug labeling, or new information about potential risks and benefits of a treatment option can arise. As opposed to traditional, static documents, living systematic reviews and guidelines are continually updated as new evidence or important decision-making information comes to light. The ultimate goal of such an approach is to facilitate a more timely translation of medical knowledge into clinical practice, allowing patients and their providers to come to decisions informed by the totality of current evidence.

If interested in applying for a scholarship to future GRADE workshops, more details can be found here: https://evidencefoundation.org/scholarships.html. Please note the deadline for applications to our next workshop in Orlando, Florida will be July 1, 2019.





Tuesday, November 14, 2017

It's been a DTA kind of year

We started 2016 in Minneapolis by hosting our regular Guideline Development Workshop focused on making recommendations about diagnostic tests and test strategies. In 2016, the GRADE Working Group released guidance on decision making about diagnostic tests. That paper can be found here. Our Spring workshop featured three days of guideline development materials tailored to tests and test strategies, including a pre-course systematic review workshop on how to meta-analyze this material. Dr. Holger Schünemann provided guest lectures and support during the small group exercises.



Earlier this week, in collaboration with the Dutch GRADE Network and Barcelona GRADE Center, Dr. Reem Mustafa kicked off a similar workshop on diagnostic test accuracy (DTA).



Many resources have been released detailing the advancements in DTA decision making. A few are listed below: