(We’ve Moved)

Keep up with interesting things I read on the other site:

… and more! Frequently!

On the Other Site …

I’ve taken the opportunity of this reboot to try to make smaller, more digestible chunks highlighting what I’m reading – and to post more often.

So, check out:

It’s a Substack, but I’m not trying to milk anyone for money – don’t worry about that!

New Year, New Me?

It’s been a long haul for “Emergency Medicine Literature of Note” – and, as AI becomes a greater portion of my professional life, the time available for long-form Emergency Medicine posts shrinks correspondingly.

So, in an experiment, I’ve created – and migrated to a more-modern platform – the “Medicine Minute” Substack.

The hope is, with short-format thoughts, there can be more-frequent updates – and the longer “rants” can still have a life either here or in ACEPNow.

Speaking of which, don’t forget:

Annals of Emergency Medicine Podcast

Annals of Emergency Medicine Journal Club

Modeling the Mottled Child: Evaluating a Pediatric Septic Shock Predictive Modeling Screening Tool: February 2025 Annals of Emergency Medicine Journal Club

Let a Million Monkeys With Typewriters Do Your Quality Measure Reporting: January 2025 Annals of Emergency Medicine Journal Club

Is Broader Better? Piperacillin/Tazobactam, Cefepime, and the Risk of Harm: December 2024 Annals of Emergency Medicine Journal Club

ACEPNow

Are Antibiotics for Appendicitis Dead?

The last decade or so featured a rather notable increase in palatability for the conservative management of appendicitis. Why undergo surgery for a condition antibiotics can cure? You wouldn’t take out your bladder for a urinary tract infection, would you?

This latest randomized trial adds to the evidence surrounding the “antibiotics first” strategy for appendicitis by expanding it to children. The failure rate at one year for the “antibiotics first” strategy in adults has been established at roughly 30%, as confirmed in a recent individual patient data meta-analysis. At long-term follow-up, the failure rate approaches 50%.

In this trial of nearly 1,000 children across Canada, the USA, Finland, Sweden, and Singapore, virtually the same failure rate was seen, at 34%. Approximately half of the failures occurred at the index hospitalization, whereas the remainder occurred over the one year of follow-up. Conversely, the “negative appendectomy rate”, the measure of failure for those in the surgery arm, was 7%. Adverse events were low and similar across each group.

It is fairly clear the “antibiotics first” strategy, when it works, is superior. These children spent less time in the hospital, were back to normal activity sooner, and required less analgesia. I would suspect, overall, it is also less expensive – whether those costs are borne by individual families or by health systems in total. However, the observed failure rate – and, extrapolating as with adults, the higher long-term failure rate – remains a vexing issue. The authors probably summed it up most accurately themselves:

“… we suspect that this difference will continue to be interpreted from opposite viewpoints. Those most interested in avoiding an operation will see these data as providing hope, whereas those most interested in avoiding initial treatment failure or recurrence will see the failure rate as unacceptably high.”

Importantly, though, even if these data refuse to give us a solid answer, they do finally give us robust data in children to assist in those shared decision-making conversations.

“Appendicectomy versus antibiotics for acute uncomplicated appendicitis in children: an open-label, international, multicentre, randomised, non-inferiority trial”
https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(24)02420-6/fulltext

Variation Exists! Outcomes Exist!

This little article has made the rounds, primarily among those who critique it for its many flaws. However, the underlying themes can still be valid, even if an article has limitations.

This is a “there is variation in emergency physician admitting practices” article. Literally every practicing physician working in a hospital environment knows there is a broad spectrum of skill, approach to acute illness, and level of risk-tolerance. These attributes manifest in different ways; in emergency physicians, one is the differing likelihood two clinicians might have to admit the same patient to the hospital.

In this fashion, this descriptive study is basically fine. Over time, with a few exceptions, clinicians all basically see similar distributions of patients. Thus, it is very reasonable for this study to estimate there is a 90th percentile admission rate for “chest pain” of around 56%, and a 10th percentile admission rate around 32%. The underlying principle has face validity, even if the precise numbers do not.
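To make the descriptive exercise concrete, here is a minimal sketch of how such a percentile spread is derived, assuming a simple visit-level table for a single complaint; the column names and data are entirely hypothetical, not drawn from the study.

```python
import pandas as pd

# Hypothetical visit-level data for one complaint ("chest pain"):
# one row per ED visit, with the treating clinician and the disposition.
visits = pd.DataFrame({
    "clinician": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "admitted":  [1,   0,   1,   1,   1,   0,   0,   0,   1],
})

# Per-clinician admission rate.
rates = visits.groupby("clinician")["admitted"].mean()

# The spread reported in the study: 10th vs. 90th percentile clinician.
p10, p90 = rates.quantile([0.10, 0.90])
print(f"10th percentile admission rate: {p10:.0%}")
print(f"90th percentile admission rate: {p90:.0%}")
```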

The second part of the analysis involves the downstream outcomes after these patients are seen and/or admitted following their emergency department visit. The first point involves whether the subsequent inpatient stay was less than 24 hours, and the second point involves downstream short- and long-term mortality. The authors also tried to evaluate the frequency and outcomes of laboratory and radiology tests ordered by emergency physicians.

Without getting too granular into the data presented, the gross pattern is that clinicians with higher admission rates were also associated with higher likelihoods of <24 hour inpatient stays. This association was most prominent, unsurprisingly, in the cohort of patients with “chest pain”. Patterns were slightly less prominent, but still present, between higher rates of radiology and laboratory testing and subsequent admission.

The kicker from this study, and the mildly controversial portion, is where these authors tie this all back to the mortality data: no association between admission rate and mortality. The general implication vilifies those clinicians with higher rates of admission, as their behaviors generate only short (read: unnecessary) admissions of no value (no mortality difference).

Everything here is almost assuredly imprecise and unable to be generalized outside the VA system involved. There are going to be issues with confounding, mis-coded data, and variation across sites. That said, the underlying principle here is probably true – some clinicians over-test, over-consult, and over-admit to no patient-oriented benefit.

However, what is to be done? Changing clinician behavior is fraught, and it is unclear whether reducing admission rates in the highest-admitting cohort would safely target only the unnecessary admissions. Worse still, attempting to change behaviors in the U.S. involves more than patient-level considerations; it also implicates health-system and tort-culture issues. The best path forward probably has little to do with targeting individual clinicians, or even broad complaints like “chest pain”, but with identifying the specific uncertainties upon which decisions are made. Then, evidence or tools may be generated to address the specific clinical questions giving rise to the variation.

“Variation in Emergency Department Physician Admitting Practices and Subsequent Mortality”
https://jamanetwork.com/journals/jamainternalmedicine/article-abstract/2828189

Mobile Stroke Unit Propaganda Writ Large

This is yet another one of those “Get With The Guidelines” stroke analyses, a retrospective dredge with massive imbalances between groups – followed by statistical adjustments capable of turning out whichever result suits an author list with a full, dense printed page of pharma and stroke technology conflicts of interest.

In that respect, the study is unremarkable. Patients with potential stroke who were transported by Mobile Stroke Units were more likely to be functionally independent at baseline and more likely to be transported to a comprehensive stroke center. Thus, patients transported by Mobile Stroke Unit were more likely to be ambulatory and functionally independent at hospital discharge. Everything between the intake and the output is just a diversion.

Where it becomes further disagreeable is the accompanying editorial, written by two individuals who run Mobile Stroke Unit programs, arguing federal reimbursement ought to cover their pet projects. After a brief brush with the limitations of these data, they assert:

“it convincingly demonstrates through a large, representative, multicenter study that in real-world clinical practice, MSUs are associated with improved short-term patient outcomes”
… quite the over-glamourization of a secondary analysis of quality improvement registry data.

“the magnitude of benefit conferred by MSUs is comparable to that of other widely accepted acute stroke interventions, such as IVT in a 3-hour to 4.5-hour window and specialized stroke units”
… after multiple statistical adjustments of a grossly imbalanced cohort.

“this study demonstrates that MSUs not only benefit patients with AIS eligible for IVT, but also patients with AIS who are ineligible for IVT and patients with other forms of stroke”
… so, even if the MSU – whose mission in life is to provide tip-of-the-spear IVT – doesn’t provide acute treatment, it still confers benefit due to its soothing glow?

“This may be explained by faster imaging and blood pressure control in patients with intracerebral hemorrhage.”
… admission blood pressure for patients with SAH in this cohort was identical between MSU and EMS.

“this study rebuts concerns that by reaching and treating patients with suspected stroke earlier in their clinical course, MSUs could lead to unnecessary IVT treatments and higher rates of hemorrhagic complications. In fact, this study demonstrated the opposite: MSU care was associated with lower rates of stroke mimics”
… yes, as is the typical approach to coding these data, early administration of IVT virtually dictates a patient be coded as a stroke. Once a patient has received IVT, only strong evidence to the contrary permits consideration of alternative causes of transient neurologic dysfunction – a happy accident also precluding any sICH occurring in “stroke mimics”, because there are none. To wit: only 24 of 4,218 (0.56%) MSU responses were “stroke mimics”, whereas 2,114 of 104,466 (2.0%) EMS responses were. When all you have is a hammer, everything you see looks like a stroke.

“Furthermore, for the broader population presenting with suspected stroke regardless of final diagnosis, the data suggest the potential for a lower risk of death.”
Again, this is magical thinking. As above, observing benefits outside the scope of an MSU’s capabilities ought to prompt reconsideration of the statistical adjustments rather than plaudits.

These data are simply unsuited to support this sort of unabashed enthusiasm for MSUs. Rather than bolstering the editorialists’ argument for funding and reimbursement structures for these tools, their biases shine through and diminish it. Regrettably, as per usual, guidelines and policy will be made by those sponsored to make the most persuasive contortion of the data, rather than the most accurate.

“Mobile Stroke Unit Management in Patients With Acute Ischemic Stroke Eligible for Intravenous Thrombolysis”
https://jamanetwork.com/journals/jamaneurology/fullarticle/2824954

“Mobile Stroke Units—Time for Legislation and Remuneration”
https://jamanetwork.com/journals/jamaneurology/fullarticle/2824955

The AI Will Literally See You Now

This AI study is a fun experiment claiming to replicate the clinical gestalt generated by a physician’s initial synthesis of visual information. The ability to rapidly assess the stability and acuity of a patient is part of every experienced clinician’s refined skills – and used as a pre-test anchor for application of further diagnostic and management reasoning.

So, can AI do the same thing?

Well, “yes” and “of course not”.

In this demonstration project, these authors set up a mobile phone video camera at the foot of patients’ beds in the emergency department. Patients were instructed to perform a series of simple tasks (touch your nose, answer questions, etc.) while being recorded. Then, AI models were trained off images from these videos to predict the likelihood of admission.

The authors performed four comparisons: AI video alone, AI video + triage information (vital signs, chief complaint, age), triage information alone, and the Emergency Severity Index (ESI). In this fun demonstration, all four models were basically terrible at predicting admission (AUROCs ~0.6-0.7). But the models incorporating video held their own, clearly outperforming ESI, and video + triage information was incrementally better than triage information alone.

There is very clearly nothing here suggesting this model is remotely clinically useful, or that it somehow parallels the cognitive processes of an experienced clinician. It is solely an academic exercise, though describing it as such ought not minimize the novelty of incorporating image analysis with other clinical information. As has been previously seen with other image analysis, AI models frequently trigger off image features unrelated to the clinical aspects of a case. The k-fold cross-validation used on their limited sample of 723 patients likely overfits the predictive model to the training data, artificially inflating performance. Then, “admission to hospital”, while operationally interesting, is a poor surrogate for immediate clinical needs and overall acuity. Finally, the authors also note several ethical and privacy challenges around video capture in clinical settings.
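For those unfamiliar with the evaluation approach, here is a minimal sketch of how a k-fold cross-validated AUROC is typically computed, assuming scikit-learn; the synthetic data and logistic model are stand-ins of my own, not the authors’ pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Stand-in data: ~723 patients with a handful of triage-style features
# and a binary admission label. Purely synthetic.
X = rng.normal(size=(723, 8))
y = rng.integers(0, 2, size=723)

# k-fold cross-validated AUROC, as commonly reported in such studies.
# If feature selection or model tuning also touches these folds, the
# reported performance can be optimistically inflated.
aucs = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                       cv=5, scoring="roc_auc")
print(f"AUROC per fold: {np.round(aucs, 2)}, mean {aucs.mean():.2f}")
```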

Regardless, a clever contribution to the AI clinical prediction literature.

“Hospitalization prediction from the emergency department using computer vision AI with short patient video clips”
https://www.nature.com/articles/s41746-024-01375-3

Getting Triggered By Errors in the Emergency Department

The emergency department is a place of risk and errors. Those who work in the ED are acutely aware of this, and it imposes tremendous cognitive pressure on staff every shift.

Every ED clinician knows the most benign-appearing triage complaint may obfuscate lurking catastrophe. The vision changes that are actually an acute aortic dissection. A sore shoulder that is necrotizing fasciitis. The list goes on. If some are to be believed, hundreds of thousands are being killed each year by diagnostic errors in the ED. The reality is much lower, but still nontrivial.

The net effect: the ED is a focus for patient safety research. In modern parlance, “diagnostic errors” become “missed opportunities for diagnosis” (MODs), and well-meaning researchers are devising further methods to shine bright lights upon our inadequacies.

This most recent publication looks at “e-Triggers” – effectively, combinations of patient features and patient outcomes meant to retrospectively identify cohorts in which substantial numbers of patients can be found to have MODs. For example, in this paper, the authors use an “e-Trigger” modelled around posterior circulation stroke – in which the data warehouse is queried for elderly patients presenting with dizziness, with at least two cerebrovascular risk factors, who, after initial discharge from the ED, suffered a stroke within 30 days.
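As a rough illustration of the concept, here is a minimal sketch of what such an e-Trigger query might look like over a visit-level extract; the column names, values, and age threshold are illustrative assumptions of mine, not the authors’ actual implementation.

```python
import pandas as pd

# Hypothetical visit-level extract; columns and values are illustrative.
visits = pd.DataFrame({
    "age":              [78, 81, 45, 72],
    "chief_complaint":  ["dizziness", "dizziness", "dizziness", "chest pain"],
    "cvd_risk_factors": [3, 1, 2, 2],
    "disposition":      ["discharged", "discharged", "admitted", "discharged"],
    "days_to_stroke":   [12, 25, 2, None],  # days from ED visit to stroke dx
})

# e-Trigger for possible missed posterior circulation stroke: elderly,
# dizziness complaint, >=2 cerebrovascular risk factors, discharged from
# the ED, then a stroke diagnosis within 30 days.
flagged = visits[
    (visits["age"] >= 65)
    & (visits["chief_complaint"] == "dizziness")
    & (visits["cvd_risk_factors"] >= 2)
    & (visits["disposition"] == "discharged")
    & (visits["days_to_stroke"].between(0, 30))
]
print(f"{len(flagged)} visit(s) flagged for structured chart review")
```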

When the authors dredged 8M records from the Veterans Affairs system for this, they identified 203 such instances, and manually reviewed 100 of these using a structured framework to characterize any diagnostic error present. For this “stroke” example, 47 of the 100 patients reviewed were identified to have had MODs. Per the review of records, the most common missed opportunities stemmed from inadequate physical examination and insufficient ordering of diagnostic tests. Most of the patients reviewed suffered moderate or severe harm as a result of these MODs.

There is good news and bad news from the “e-Trigger” method shown here. The good news is primarily of interest to patient safety researchers: this is probably a reasonable method for enriching populations for review, to further describe the types of error occurring in specific clinical scenarios. This could lead to identification of generalizable knowledge gaps, cognitive biases, or system factors. It is also, probably, too unwieldy and labor-intensive for routine punitive use targeting individual clinicians.

The bad news is primarily patient-centered. The fundamental nature of the e-Trigger structure requires a pairing of a cohort at risk and a subsequent unfortunate outcome. Thus, the harm has already reached the patient. It seems plausible suitably high-risk cohorts could be determined relatively contemporaneously, but the challenge would be finding a mechanism to detect a MOD with sufficient specificity to be deployable in clinical workflow. However, with the ability to potentially replace some previously human review steps with AI, this idea may be imminently achievable – watch this space!

“Implementation of Electronic Triggers to Identify Diagnostic Errors in Emergency Departments”
https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2827341

WOMAN-2: What Does the Robot Say?

Following on the success of Toy Story 2, Inside Out 2, and Avatar 2, we have WOMAN-2, yet another trial featuring emergency medicine’s third-favorite medication: tranexamic acid. However, where those sequels succeeded, WOMAN-2 is more like Miss Congeniality 2 – the one we’re not going to talk about again.

But, don’t take it from me – take it from the ChatGPT Agent I created!

The WOMAN-2 trial serves as a benchmark for evaluating interventions targeting postpartum hemorrhage (PPH) in women with anemia, but it underscores the challenges of translating promising pharmacological hypotheses into clinical practice. The study meticulously implemented randomization and blinding protocols, ensuring that biases from enrolment and allocation processes were minimized. Baseline characteristics were well-matched, providing a solid foundation for evaluating the intervention’s effect.

However, the trial’s design reveals critical limitations that may have compromised its ability to detect meaningful clinical benefits. The reliance on clinical judgment rather than standardized measurement tools for diagnosing PPH introduces variability, potentially diluting the observed treatment effect. Furthermore, the timing of tranexamic acid administration—after cord clamping—raises questions about its pharmacological plausibility in rapidly evolving hemorrhagic scenarios. Subgroup analyses suggest the intervention’s effect may differ in high-risk populations, such as those with antepartum hemorrhage, but the evidence is inconclusive.

The null findings for both primary and secondary outcomes should prompt reflection on trial methodology and clinical priorities. The absence of adverse events is reassuring, yet the lack of demonstrable efficacy highlights the complexity of addressing PPH in anemic populations. A focus on earlier interventions and systemic prevention of anemia may hold greater promise. Additionally, future trials should prioritize precision in outcome measurement and incorporate emerging technologies to minimize subjective assessment variability.

While the study’s funding sources were disclosed transparently, the authors’ broader involvement in tranexamic acid research warrants attention to ensure interpretations remain unbiased. Overall, the WOMAN-2 trial reinforces the urgency of optimizing anemia management and refining therapeutic strategies for PPH, underscoring the need for innovation in both clinical practice and research methodologies.

Well, there you have it – unbiased evaluation by the robot, featuring that bland robotic voice common to all its very average, very “correct” output. Interestingly, it can be trained and/or instructed to copy your writing “style”, and the output is grossly similar – but with an added layer of tryhard treacle slathered upon it.

In my brief experimentations with the Agent, it seems clear the feasible augmentation does not include writing – at least, not enjoyable writing. It is superficially very competent, however, at answering enumerated questions from a template – study population, primary outcomes, specific sources of bias. For example, this agent actually executes the RoB 2 questionnaire on an RCT before using that output as the foundation for its summary paragraphs. Probably good enough for an “at a glance” summarization, but not nearly sufficient to put the research into context.
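For the curious, here is a minimal sketch of how such a two-step appraise-then-summarize pipeline might be wired, assuming the OpenAI Python client; the function, prompts, and model name are placeholders of my own, not the actual Agent configuration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def appraise_then_summarize(manuscript: str, model: str = "gpt-4o") -> str:
    # Step 1: walk the trial through a structured risk-of-bias template
    # (RoB 2-style domains) before any summarization happens.
    appraisal = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": (
                "Answer each RoB 2 domain for this randomized trial: "
                "randomization, deviations from intended interventions, "
                "missing outcome data, outcome measurement, selective "
                "reporting.")},
            {"role": "user", "content": manuscript},
        ],
    ).choices[0].message.content

    # Step 2: ground the narrative summary in the structured appraisal.
    summary = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": (
                "Write a three-paragraph critical summary of the trial, "
                "grounded in this appraisal:\n" + appraisal)},
            {"role": "user", "content": manuscript},
        ],
    ).choices[0].message.content
    return summary
```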

Agent aside, we’re here because WOMAN-2 is the sequel, obviously, to WOMAN – a “positive” trial that was also “negative”. WOMAN was positive for its endpoint of post-partum hemorrhage and death due to bleeding, but negative for the patient-oriented outcome of overall mortality. Here in WOMAN-2, the small effect size previously seen in WOMAN has entirely vanished, leading to further questions. Where TXA seems to be most effective are instances in which it is given early – and subsequent trials “I’M Woman” and “WOMAN-3” will address these possibilities. The other possibility is that, as with gastrointestinal bleeding, certain clinical scenarios feature specific fibrinolytic activation pathways where the mild effect of TXA simply can’t move the needle.

So, nothing here changes what most of us do in the modern world – and those who have Bayesian ideas regarding the efficacy of TXA are likely going to keep using it in sub-Saharan Africa. If you are going to keep using TXA routinely, use it early and in the highest-risk populations – as the likelihood of a clinically meaningful benefit will otherwise disappear like a whisper in the wind.

“The effect of tranexamic acid on postpartum bleeding in women with moderate and severe anaemia (WOMAN-2): an international, randomised, double-blind, placebo-controlled trial”
https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(24)01749-5/fulltext

What are Children’s Lives Worth (to Save)?

This article regarding the cost of upgrading emergency departments to be “ready” for sick children has been bouncing around in the background since its publication, with some initial lay press coverage.

The general concept here is obviously laudable and the culmination of at least a decade of hard work from these authors and the team involved – with the ultimate goal of ensuring each emergency department in the country is capable of caring for critically unwell children. This most recent publication builds upon their prior work to, effectively, estimate the overall cost (~$200M) of improving “pediatric readiness”. Using that total cost, they then translate this into humanizing terms: the cost per child it might require in different states, and the number of pediatric lives saved annually.

As can be readily gleaned from this sort of thought experiment, these estimates rely upon a nested set of foundational assumptions, all of which are touched upon by prior work from this group. There are surveys of subsets of emergency departments regarding “readiness”, which involve questions such as the presence of pediatric-sized airway devices and staff dedicated to the upkeep of various pediatric supports. These data, combined with salary estimates, yield the institutional costs of readiness. Another body of work provides the odds ratios for increased poor outcomes at departments whose “readiness” is in the lowest percentiles, and this is extrapolated to determine the lives saved.
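To see how quickly imprecision compounds, here is a minimal sketch of the chain of estimates; aside from the ~$200M total from the paper, every number below is a placeholder of my own, purely for illustration.

```python
# Illustrative chain of estimates mirroring the paper's approach.
# Only the ~$200M total comes from the paper; all else is placeholder.
total_readiness_cost = 200e6    # national cost estimate (paper)
us_children          = 70e6     # placeholder national pediatric population
deaths_at_risk       = 1_000    # placeholder annual deaths among ED-treated children
p_low_readiness      = 0.25     # placeholder share of visits to low-readiness EDs
or_low_readiness     = 1.6      # placeholder odds ratio for death at those EDs

cost_per_child = total_readiness_cost / us_children

# Levin's attributable-fraction formula, treating the odds ratio as a
# risk ratio: one of the extrapolation steps where imprecision compounds.
paf = p_low_readiness * (or_low_readiness - 1) / (
    1 + p_low_readiness * (or_low_readiness - 1))
lives_saved = deaths_at_risk * paf

print(f"~${cost_per_child:.2f} per child; ~{lives_saved:.0f} lives saved per year")
```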

Each of these pieces of work, in isolation, is reasonable, but together they represent a bit of a house of cards. The likelihood of imprecision is magnified as the estimates are combined. For example, how direct is the correlation between equipment-based “readiness” and pediatric survival if the ED in question is a critical access hospital with a low annual census? Is the cost of true clinical readiness just a part-time nursing FTE, or should it realistically include the costs of skill upkeep for nurses and physicians through education or simulation?

I suspect, overall, these data understate the costs and overstate the return on investment. That said, this is still critical work even just to describe the landscape and take a stab at the scope of funding required. Likely, the best next step would be to target specific profiles of institutions, and specific types of investment, where such investment is likely to have the highest yield – as a first step on the journey towards universal readiness.

“State and National Estimates of the Cost of Emergency Department Pediatric Readiness and Lives Saved”
https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825748