Cancer Clinical Trials Don’t Benefit Patients

Hearkening back to my former life as the chair of an Institutional Review Board: you do not promise or imply a potential for benefit to clinical trial participants.

Why? Because clinical trials aren’t designed to benefit participants. Participants may be randomized to the “standard of care” arm. The trial drug may offer no improvement in efficacy over the “standard of care”. Worse, the trial drug may, in fact, have greater toxicity than the current options. Finally, there are the frequent – and frequently invasive – trial procedures: blood draws, repeat imaging, and repeat tumor sampling.

The perception remains that clinical trials produce better outcomes for some trial participants – but the whole of the literature does not support this conclusion. This systematic review and meta-analysis from JAMA clearly shows the data are insufficient to support a net benefit from cancer clinical trial participation. Small signals of benefit are most likely the result of trial effects and publication bias.

The unquestioned benefit? To pharma – and, distantly, potentially to future patients.

While this study does not exclude a benefit to participants from cancer clinical trial participation, any such benefit remains unsubstantiated.

https://pubmed.ncbi.nlm.nih.gov/38767595

When EHR Interventions Succeed … and Fail

This is a bit of a fascinating article with a great deal to unpack – and rightly published in a prominent journal.

The brief summary – this is a “pragmatic”, open-label, cluster-randomized trial in which a set of interventions designed to increase guideline-concordant care was rolled out via electronic health record tools. These interventions were further supported by “facilitators”, persons assigned to each practice in the intervention cohort to support uptake of the EHR tools. In this specific study, the underlying disease state was the triad of chronic kidney disease, hypertension, and type 2 diabetes. Each of these disease states has well-defined pathways for “optimal” therapy and escalation.

The most notable feature of this trial is the simple, negative topline result – rollout of this intervention had no reliably measurable effect on patient-oriented outcomes relating to disease progression or acute clinical deterioration. Delving below the surface provides a number of insights worthy of comment:

  • The authors could easily have made this a positive trial by choosing change in guideline-concordant care as the primary outcome, as many other trials have done. This is a lovely example of how surrogates for patient-oriented outcomes must always be critically appraised for the strength of their association.
  • The entire concept of this trial is likely passively traumatizing to many clinicians – being bludgeoned by electronic health record reminders and administrative nannying to increase compliance with some sort of “quality” standard. Despite all these investments, alerts, and nagging, patients did no better. As above, since many of these trials simply measure changes in behavior as their endpoints, results like these – where patients are no better off – likely leave many clinicians feeling sour.
  • The care “bundle” and its lack of effect is notable, although it ought to be noted the patient-oriented outcomes here for these chronic, life-long diseases are quite short-term. The external validity of findings demonstrated in clinical trials frequently falls short when generalized to the “real world”. The scope of the investment here and its lack of patient-oriented improvement is a reminder of how challenging it is in medicine to generate evidence of sufficient strength to reliably inform practice.

Not an Emergency Medicine article, per se, but certainly describes the sorts of pressures on clinical practice pervasive across specialties.

“Pragmatic Trial of Hospitalization Rate in Chronic Kidney Disease”
https://www.nejm.org/doi/full/10.1056/NEJMoa2311708

Quick Hit: Elders Risk Assessment

A few words regarding an article highlighted in one of my daily e-mails – a report regarding the Elders Risk Assessment tool (ERA) from the Mayo Clinic.

The key to the highlight is the assertion this score can be easily calculated and presented in-context to clinicians during primary care visits, allowing patients with higher scores to be easily identified for preventive interventions. Citing an AUC of 0.84, the authors are rather chuffed about the overall performance. In fact, they close their discussion with this rosy outlook:

The adoption of a proactive approach in primary care, along with the implementation of a predictive clinical score, could play a pivotal role in preventing critical illnesses, benefiting patients and optimizing healthcare resource allocation.

Completely missing from their limitations is the recognition that prognostic scores are not prescriptive. The ERA is based on age, recent hospitalizations, and chronic illness. The extent to which any of these issues can be addressed “proactively” in the current primary care environment, with a positive impact on patient-oriented outcomes, remains to be demonstrated.

To claim a scoring system is going to better the world, it is necessary to compare decisions made with formal prompting by the score to decisions made without – several steps removed from performing a retrospective evaluation to generate an AUC. It ought also be appreciated that some decisions based on high ERA scores will increase resource utilization without a corresponding beneficial effect on health, while lower scores may likewise inappropriately bias clinical judgement.
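For concreteness, the retrospective exercise behind such an AUC is mechanically trivial – a hypothetical sketch (entirely fabricated numbers, assuming scikit-learn is available) of how the discrimination statistic gets generated, and why it says nothing about what clinicians ought to do differently:

```python
# Hypothetical sketch: generating a retrospective AUC for a prognostic score.
# All numbers are invented for illustration.
from sklearn.metrics import roc_auc_score

# One ERA-style score per patient, plus whether critical illness occurred.
scores   = [14, 3, 9, 0, 7, 12, 2, 13, 16, 1]
outcomes = [ 1, 0, 1, 0, 0,  1, 0,  0,  1, 0]

print(f"AUC: {roc_auc_score(outcomes, scores):.2f}")
# Discrimination only -- this says nothing about whether acting on the
# score changes any patient's outcome.
```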

This article has only passing applicability to emergency medicine, but the same issues regarding the disutility of “prognosis” apply widely.

“Individualized prediction of critical illness in older adults: Validation of an elders risk assessment model”
https://agsjournals.onlinelibrary.wiley.com/doi/abs/10.1111/jgs.18861

Update to Start 2024

A brief post collating a few bits of my various work published across the interwebs ….

The Annals of Emergency Medicine Podcast continues to summarise the meatiest articles from each month, featuring a cycle of new co-hosts, as well:

Naturally, there are continuing Journal Club features, covering the following articles:

I should also point out a couple additional new publications with two very different and amazing teams:

Lastly, in ACEPNow, we have:

Enjoy!

Everyone’s Got ChatGPT Fever!

And, most importantly, if you put the symptoms related to your fever into ChatGPT, it will generate a reasonable differential diagnosis.

“So?”

This brief report in Annals describes a retrospective experiment in which 30 written case summaries lifted from the electronic documentation system were fed to either clinician teams or ChatGPT. The clinician teams (either an internal medicine or emergency medicine resident, plus a supervising specialist) and ChatGPT were asked to generate a “top 5” differential diagnosis, and then settle upon one “most likely” diagnosis. Each case was tested both on the recorded narrative alone and with laboratory results added.

The long and short of this brief report is the lists of diagnoses generated contained the correct final diagnosis with similar frequency – about 80-90% of the time. The correct leading diagnosis was chosen from these lists about 60% of the time by each. Overlap between clinicians and ChatGPT in their lists of diagnoses was, likewise, about 50-60%.
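For those curious how such figures are tallied, here is a minimal sketch – with entirely invented case data – of the “top 5” hit rate, leading-diagnosis accuracy, and list-overlap metrics reported:

```python
# Hypothetical sketch of the metrics in this report; case data are invented.

def in_top5(ddx: list[str], final_dx: str) -> bool:
    # Did the correct final diagnosis appear anywhere in the ranked top 5?
    return final_dx in ddx[:5]

def leading_dx_correct(ddx: list[str], final_dx: str) -> bool:
    # Was the single "most likely" diagnosis the correct one?
    return bool(ddx) and ddx[0] == final_dx

def overlap(ddx_a: list[str], ddx_b: list[str]) -> float:
    # Fraction of shared diagnoses between two differential lists.
    return len(set(ddx_a) & set(ddx_b)) / max(len(ddx_a), len(ddx_b))

clinician = ["pulmonary embolism", "pneumonia", "ACS", "pericarditis", "GERD"]
chatgpt   = ["pneumonia", "pulmonary embolism", "pleuritis", "ACS", "asthma"]
final     = "pulmonary embolism"

print(in_top5(clinician, final), in_top5(chatgpt, final))  # True True
print(leading_dx_correct(clinician, final))                # True
print(f"overlap: {overlap(clinician, chatgpt):.0%}")       # 60%
```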

The common reaction: wow! ChatGPT is every bit as good as a team of clinicians. We ought to use ChatGPT to fill in gaps where clinician resources are scarce, or to generally augment clinicians contemporaneously.

This may indeed be a valid reaction, and, looking at the healthcare funding environment, it is clear billions of dollars are being thrown at the optimistic interpretation of these types of studies. However, what is lacking from these studies is any sort of comparison. Prior to ChatGPT, clinicians did not operate in the information-resource vacuum frequently imposed by these contrived situations. When faced with clinical ambiguity, clinicians (and patients) have used general search engines, in addition to medical knowledge-specific resources (e.g., UpToDate), as augments. These ChatGPT studies, much like many decision-support studies, are generally quite light on testing clinical utility and implementation in real-world contexts.

Medical applications of large language models are certainly interesting, but it is always valuable to remember LLMs are not “intelligent” – they are simply pattern-matching and generation tools. They may, or may not, provide reliable improvement over current information search strategies available to clinicians.

ChatGPT and Generating a Differential Diagnosis Early in an Emergency Department Presentation

Don’t Use Lytics in Mild Stroke, Part 3

Well, PRISMS demonstrated unfavorable results.

MARISS tried to ascertain predictors of poor outcome in mild stroke, and intravenous thrombolysis was not associated with an effect on the primary outcome.

Now, again, we examine thrombolysis in “mild” stroke, in this case, NIHSS ≤3 – and fail.

Like MARISS, this is a retrospective dredge of patients selected by their treating clinicians to receive either intravenous thrombolysis or, in this case, dual antiplatelet therapy with clopidogrel and aspirin. The population included for analysis comes from the Austrian Stroke Unit Registry from 2018 to 2019, an original cohort of 53,899 patients. Of these, 29,252 were NIHSS ≤3, but exclusions left out nearly 25,000 – primarily those whose strokes were the result of atrial fibrillation, or whose treating clinicians chose platelet monotherapy instead of dual antiplatelet therapy.

The remaining ~4,000 were analyzed both as unadjusted cohorts and as propensity-score-matched cohorts comprising roughly 20% of the original. In the unadjusted cohorts, efficacy and safety outcomes were universally worse in those selected for thrombolysis – but these were, of course, generally more severe stroke syndromes. After propensity score matching, these differences generally disappeared – except for a preponderance of sICH in the thrombolysis cohort.
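As a rough illustration – not the authors’ actual methods – a minimal propensity-matching sketch over invented covariates, assuming numpy and scikit-learn, might look like the following; real analyses add calipers, balance diagnostics, and far richer covariate sets:

```python
# Minimal propensity-score matching sketch with invented data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))           # covariates, e.g., age, NIHSS, BP
treated = rng.integers(0, 2, size=n)  # 1 = thrombolysis, 0 = DAPT

# 1. Model each patient's probability of receiving thrombolysis.
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# 2. Greedy 1:1 nearest-neighbor match on the propensity score,
#    without replacement.
treated_idx = np.flatnonzero(treated == 1)
control_idx = list(np.flatnonzero(treated == 0))
pairs = []
for i in treated_idx:
    if not control_idx:
        break
    j = min(control_idx, key=lambda k: abs(ps[k] - ps[i]))
    pairs.append((i, j))
    control_idx.remove(j)

print(f"matched pairs: {len(pairs)}")
```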

The authors here conclude there’s no evidence of superiority for thrombolysis in mild stroke, and their results fit broadly with those from other cohorts. These data are observational and unreliable, but it remains a very reasonable stance to withhold thrombolysis for mild strokes pending trials conclusively demonstrating which, if any, mild strokes improve with thrombolysis.

IV Thrombolysis vs Early Dual Antiplatelet Therapy in Patients With Mild Noncardioembolic Ischemic Stroke

Which Sepsis Alert is the Biggest Loser?

It’s a trick question – in the end, all of us have already lost.

This is a short retrospective report evaluating, primarily, the Epic Sepsis Prediction Model, and the mode in which it is deployed. The Epic SPM generates a “prediction of sepsis score” (PSS), calculated at 15-minute intervals, providing a continuous risk score for the development of sepsis. Of course, in modern medicine, this is usually reduced to a trigger threshold at which point an alert is fired. Alerts, alerts, alerts – what are they good for?

In this study, the Epic SPM was evaluated at several different PSS thresholds ranging from ≥5 to ≥10 – and compared, as well, with SIRS, qSOFA, and SOFA. There were two goals for the evaluation: accuracy and timeliness. All prediction tools exhibited the same age-old tradeoff between sensitivity and specificity, with a PSS of ≥5 being 95% sensitive, but merely 53% specific. Likewise, a more specific cut-off sacrificed sensitivity. SIRS, qSOFA, and SOFA suffered from the same limitations.
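The tradeoff itself is easy to demonstrate – a toy sketch with invented score distributions, sweeping the same ≥5 to ≥10 threshold range:

```python
# Toy demonstration of the sensitivity/specificity tradeoff across alert
# thresholds. Score distributions are invented, not the Epic SPM's.
import numpy as np

rng = np.random.default_rng(1)
septic     = rng.normal(8, 2, 200)  # hypothetical scores in true sepsis
non_septic = rng.normal(4, 2, 800)  # hypothetical scores without sepsis

for threshold in range(5, 11):      # the >=5 to >=10 range evaluated
    sens = np.mean(septic >= threshold)
    spec = np.mean(non_septic < threshold)
    print(f">= {threshold}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```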

The “time to detection” was a bit more interesting, but conclusions are limited by the methods used to determine it. The PSS is calculated at 15-minute intervals, while their calculations of SIRS, qSOFA, and SOFA all happened at hourly intervals. Then, “time zero” for their calculations was actually determined by the time of clinician action – the time at which a clinician suspected sepsis and ordered either antimicrobials or blood cultures. With respect to timeliness, only a minority of patients met threshold scores at “time zero” – except with SIRS, where nearly half were at threshold.

So, it’s hard to conclude much from these data – other than, as previously alluded to, we are all losers. These alerts are clearly useless, yet they, and the Surviving Sepsis bundle gestapo, have trained clinicians to leap at the earliest opportunity to (over)diagnose sepsis and administer broad-spectrum antibiotics. Multiple specialty societies have asked for the SEP-1 measures to be rolled back due to these obvious harms, let alone the administrative costs, and eliminating that “quality” measure would go a long way to putting these useless alerts to bed.

“Sepsis Prediction Model for Determining Sepsis vs SIRS, qSOFA, and SOFA”

End Nail Dogma

In a world of doors, truck beds, furniture, and other finger-crushing nuisances, emergency department visits for injuries involving the distal digits are common. Injuries range from tuft fractures, to degloving injuries, to all manner of nail and nailbed derangement.

Perusing any textbook or online resource typically yields advice for some manner of repair, including, but not limited to, replacing an avulsed nail back into the proximal nail fold and securing it in place. If the avulsed nail is not available, recommendations include placing a bit of foil into the proximal nail fold. The general idea is that failure to do so will irretrievably scar the germinal matrix, resulting in some disfigured and mutant nail growth.

The NINJA trial tests whether this dogma is valid – and, rather unsurprisingly, finds it is not.

In this trial, children with finger nail and nailbed injuries requiring surgical repair were randomized, at the conclusion of the injury repair, either to replacement of the nail (or foil) into the nail fold, or to discarding the nail and simply applying a non-adherent dressing. The “co-primary” outcomes were cosmetic appearance of the nail (using the Oxford Fingernail Appearance Score) and surgical-site infection at 1-week follow-up.

The majority of the 451 children involved were aged younger than 6, and most injuries were crush injuries resulting in avulsion of the nail plate. The primary outcomes were no different between groups – 5 and 2 surgical-site infections in the “nail replacement” and “nail discarded” groups, respectively, and a median OFNAS score of 5 (the highest score) in each group. Lest the trial be accused of merely failing to demonstrate a difference favoring the “nail replacement” group, it was actually the “nail discarded” group that had a non-significantly more favorable distribution of cosmetic scores.
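For the statistically curious, those infection counts are nowhere near significance – a quick check with Fisher’s exact test, assuming (for illustration only; the trial’s exact arm sizes are simplified here) roughly equal arms of the 451 children:

```python
# Quick check of the infection comparison: 5 vs 2 surgical-site infections.
# Arm sizes below are an assumed roughly even split, not the trial's exact ones.
from scipy.stats import fisher_exact

table = [[5, 226 - 5],   # nail replaced: infections, no infection
         [2, 225 - 2]]   # nail discarded
odds_ratio, p = fisher_exact(table)
print(f"OR {odds_ratio:.1f}, p = {p:.2f}")  # far from statistical significance
```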

In suggesting these results are unsurprising, it’s rather just a perspective that many clinical encounters in the emergency department are “over-medicalized”, receiving unnecessary tests or treatments simply due to the spectrum bias associated with acute care. Most healthy human substrate is capable of healing from minor injury in satisfactory fashion; hopefully, these results further inform the care of children with finger nail injuries, and may reasonably be generalized to other nails and to healthy adults.

Effectiveness of nail bed repair in children with or without replacing the fingernail: NINJA multicentre randomized clinical trial

The Opiates in Back Pain Conundrum

We do love to give out opiates in the emergency department. Kidney stone? Opiates. Broken arm? Opiates. Gunshot wound? Opiates. Sore throat? Dexamethasone. And opiates.

So of course we’re here with opiates for your back pain.

In this modern day, we are far, far more judicious than in times of yore, back when pharma had lobbied for pain to become the “fifth vital sign”. But, nonetheless, patients who are struggling to manage despite non-opiate analgesia frequently end up with some sort of small supply to try to resolve an acutely painful condition.

The OPAL trial, published in The Lancet, is yet another in a series of trials demonstrating the disutility of virtually anything for back pain – following prior work undermining the efficacy of skeletal muscle relaxants, and even of acetaminophen added to ibuprofen. In this trial, patients with “acute” low back pain were prescribed an oxycodone-based opiate or matching placebo, and their functional recovery was assessed in follow-up. Unfortunately, no advantage was seen for patients randomized to oxycodone, while there were small, but likely real, risks of opiate misuse at later intervals.

However, does this trial apply to the emergency department?

  • Patients were eligible if they had low back pain for up to 3 months. This is not exactly “acute” – especially since early versions of the protocol excluded patients whose back pain had been ongoing for less than 2 weeks.
  • Modified-release oxycodone-naloxone was the opiate of choice in this Australian trial. The naloxone itself does not exert much influence on the analgesic effect, but the preparation differs from those commonly used in the emergency department.
  • The follow-up interval was at six weeks, a good patient-oriented timeframe for long-term clinical resolution. However, emergency department treatment tends to deploy opiate analgesia with the goal of short-term mobilization and return to activity, so 48- or 72-hour relief or functioning may be more relevant.

The most notable problem with this trial is not, in fact, the trial itself. Rather, the issue remains the paucity of true short-term data regarding any added benefit for the minimally effective quantity of opiates usually dispensed from the emergency department. Spring into action, team!

“Opioid analgesia for acute low back pain and neck pain (the OPAL trial): a randomised placebo-controlled trial”

The Cost of “Quality”

In case you missed it, this beautiful little article regarding the paradoxical “cost” of “quality” is worth re-highlighting.

In theory, high-quality care is its own reward. Timely actions and interventions, thoughtful and thorough evaluations, and appropriate guideline adherence when applicable are all goals with reasonable face validity for healthcare delivery. Competing incentives, however, coupled with time pressures, erode some of the natural inclination towards ideal care. Thus arise “quality” metrics and goals, created with the best of intentions to nudge clinicians and health systems towards better care.

Unfortunately, the siren song of “quality” has begotten a locust horde of metrics from all manner of organizations. Health care expenditures in the U.S. have grown from 9% of GDP to 20% of GDP, and administrative costs are estimated to comprise up to 30% of total national health care spending. To add context to these larger estimates, this little article simply looks within the authors’ own institution to evaluate the potential contribution of “quality” measures to those larger sums.

The authors identified, by surveying personnel across their institution, 162 quality metrics reported to 7 measuring organizations, totalling 271 reports (as some required reporting to multiple organizations). The bulk (70%) were publicly reported “quality” measures, while another 27% were related to pay-for-performance programs.

Overall, across surveyed personnel, the authors determined approximately 108,000 person-hours were consumed annually on these reports. Based on the annual salaries of the individuals involved and their time commitment, the total annual cost to the institution was estimated at over US$5 million. The most expensive metrics were those requiring individual chart abstraction, while metrics requiring merely electronic data capture consumed a fraction of the cost.

Multiplied by the 4,000+ hospitals in the U.S., we are plausibly talking about tens of billions of dollars of added administrative overhead. Interestingly enough, and relevant to emergency medicine, one of the worst offenders as far as cost is SEP-1 – the CMS sepsis core measure. Not only is this measure onerous and costly to administer on the institutional side, it also results in substantial unmeasured additional work for clinical staff – and I suspect many of these “quality” measures have their costs similarly underestimated.
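The back-of-the-envelope arithmetic, with the per-hospital cost and hospital count treated as rough assumptions rather than measured values:

```python
# Back-of-the-envelope extrapolation from the article's single-institution
# figures; per-hospital cost and hospital count are rough assumptions.
person_hours   = 108_000     # annual hours reported at one institution
annual_cost    = 5_000_000   # reported institutional cost, USD
n_us_hospitals = 4_000       # approximate U.S. hospital count cited above

print(f"~${annual_cost / person_hours:.0f} per person-hour")
print(f"~${annual_cost * n_us_hospitals / 1e9:.0f} billion nationally")
```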

Administrative costs aside, it is as important to consider whether “quality” metrics actually reflect higher-quality care, or whether the changes in care driven by metrics improve value. What is certain, however, is that their proliferation has been nightmarish.

“The Volume and Cost of Quality Metric Reporting”
https://jamanetwork.com/journals/jama/article-abstract/2805705