The Elephant in the PECARN/CHALICE/CATCH Room

A few months ago, I wrote about the main publication from this study group, a Lancet article detailing a robust head-to-head comparison of the major pediatric head injury decision instruments. Reading between the lines, as I mentioned then, the important unaddressed result seemed to be how well physician judgment performed – only 8.3% of the entire cohort underwent CT.

This, then, is the follow-up publication in Annals of Emergency Medicine focusing on the superiority of physician judgment. Just to recap, this study enrolled 18,913 patients assessed as having a mild head injury. Of these, 160 had a clinically important traumatic brain injury (ciTBI) and 24 underwent neurosurgery. The diagnostic performance of these decision instruments is better detailed in the other article but, briefly, for ciTBI:

  • PECARN – ~99% sensitive, 52% to 59.1% specific
  • CHALICE – 92.5% sensitive, 78.6% specific
  • CATCH – 92.5% sensitive, 70.4% specific

These rules, given their specificity, would commit patients to CT rates of 20-30% in the case of CHALICE and CATCH, and to a combined CT-or-observation rate of ~40% for PECARN. But how did physician judgment perform?

  • Physicians – 98.8% sensitive, 92.4% specific

Which is to say, physicians missed two injuries – each detected a week later in follow-up for persistent headaches – but performed CTs in only 8.3% of the population. As I highlighted in this past month’s ACEPNow, clinical decision instruments are frequently placed on a pedestal based on their performance characteristics in a vacuum, and rarely compared with clinician judgment – which is often as good or better. It’s fair to say these head injury decision instruments, depending on the prevalence of injury and the background rate of advanced imaging, may actually be of little value.
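For the arithmetically inclined, here is a minimal sketch of where those imaging rates come from – applying the approximate sensitivities and specificities quoted above at the observed ciTBI prevalence (160/18,913), and assuming a PECARN specificity midpoint of ~55% purely for illustration:

```python
# Illustrative only: approximate figures from the post, applied at the observed
# ciTBI prevalence to estimate the proportion of children flagged for imaging
# (or, for PECARN, imaging-or-observation).
prevalence = 160 / 18913

rules = {
    "PECARN (obs or CT)": (0.99, 0.55),   # specificity midpoint of 52-59.1%
    "CHALICE":            (0.925, 0.786),
    "CATCH":              (0.925, 0.704),
    "Physician judgment": (0.988, 0.924),
}

for name, (sens, spec) in rules.items():
    flagged = sens * prevalence + (1 - spec) * (1 - prevalence)
    print(f"{name:>20}: ~{flagged:.1%} of the cohort flagged")
```

The physician-judgment row lands essentially at the reported 8.3% imaging rate, which is the whole point: at this prevalence, specificity is what drives resource utilization.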

“Accuracy of Clinician Practice Compared With Three Head Injury Decision Rules in Children: A Prospective Cohort Study”
http://www.annemergmed.com/article/S0196-0644(18)30028-3/fulltext

Using PERC & Sending Home Pulmonary Emboli For Fun and Profit

The Pulmonary Embolism Rule-Out Criteria have been both lauded and maligned, depending on which day the literature is perused. There are case reports of large emboli in patients who are PERC-negative, as well as reports of PE prevalence in PERC-negative patients as high as 5% – in contrast to the <1.8% miss rate held up as the point of equipoise in its derivation. So, the goal here is to be the prospective trial to end all trials, and to most accurately describe the impact of PERC on practice and outcomes.
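For reference, the rule itself reduces to a simple conjunction of eight criteria – a minimal sketch of the standard published PERC items, not this trial’s specific protocol:

```python
# A minimal sketch of the standard published PERC criteria (not this trial's
# specific protocol): a patient already judged low-risk is PERC-negative
# only if all eight criteria are met.
def perc_negative(age, heart_rate, spo2_room_air_pct, hemoptysis,
                  estrogen_use, prior_vte, unilateral_leg_swelling,
                  recent_surgery_or_trauma):
    return (
        age < 50
        and heart_rate < 100
        and spo2_room_air_pct >= 95
        and not hemoptysis
        and not estrogen_use
        and not prior_vte
        and not unilateral_leg_swelling
        and not recent_surgery_or_trauma   # requiring hospitalization, prior 4 weeks
    )

# Example: a 42-year-old with normal vitals and no risk factors is PERC-negative.
print(perc_negative(42, 88, 98, False, False, False, False, False))   # True
```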

This is a cluster-randomized trial across 14 Emergency Departments in France. Centers were randomized to either a PERC-based work-up strategy for PE, or a “conventional” strategy in which virtually every patient considered for PE was tested using D-dimer. Interestingly, these 14 centers also crossed over to the alternative algorithm approximately halfway through the study period, so every ED was exposed to both interventions – some starting with PERC, and others with the conventional approach.

Overall, they recruited 1,916 patients across the two enrollment periods, and these authors focused on the 1,749 who received per-protocol testing and were not lost to follow-up. The primary outcome was any new diagnosis of venous thromboembolism at 3-month follow-up – their measure of, essentially, clinically important VTE missed upon exiting the algorithm. The headline result, in the per-protocol population, was that 1 patient was diagnosed with VTE in follow-up in the PERC group compared with none in the control cohort. This met their criteria for non-inferiority and, at face value, the PERC-based strategy is clearly reasonable. There were 48 patients lost to follow-up, but given the overall prevalence of PE in this population, it is unlikely these lost patients would have affected the overall results.

There are a few interesting bits to work through from the characteristics of the study cohort. The vast majority of patients considered for the diagnosis of PE were “low risk” by either the Wells or simplified Revised Geneva score. However, 91% of those in the PERC cohorts were “low risk”, as compared to 78% in the control cohort – which, considering the structure of this trial, seems unlikely to have occurred by chance alone. In the PERC cohort, about half were not PERC-negative, and these patients – plus a few protocol violations – moved forward with D-dimer testing. In the conventional cohort, 99% were tested with D-dimer in accordance with their algorithm.

There are then more odd descriptive results. D-dimer testing (threshold ≥0.5 µg/mL) was positive in 343 of the PERC cohort and 471 of the controls. However, physicians moved forward with CTPA in only 38% of the PERC cohort and 46% of the conventional cohort. It is left entirely unaddressed why patients entered a PE rule-out pathway and then never received definitive imaging after a D-dimer above threshold. For what it’s worth, the smaller number of patients undergoing evaluation for PE in the PERC cohort led to fewer diagnoses of PE, fewer downstream hospital admissions and anticoagulant prescriptions, and a shorter ED length of stay. The absolute numbers are small, but patients in the control cohort undergoing CTPA were more likely to be diagnosed with subsegmental PEs (5 vs. 1), which generally makes sense.

So, finally, what is the takeaway here? Should you use a PERC-based strategy? As usual, the answer is: it depends. First, it is almost certainly the case that a PERC-based algorithm is safe to use. If your current approach is to carpet bomb everyone with D-dimer and act upon it, then yes, you may see dramatic improvements in ED processes and resource utilization. However, as we see here, the prevalence of PE is so low that strict adherence even to a PERC-based algorithm is still too clinically conservative. Many patients with elevated D-dimers never underwent CTPA in this study – and, with three-month follow-up, they obviously did fine. Frankly, given the shifting gestalt relating to the work-up of PE, the best stopping point is probably not PERC, but simply ending the work-up for most patients who are not intermediate- or high-risk.

“Effect of the Pulmonary Embolism Rule-Out Criteria on Subsequent Thromboembolic Events Among Low-Risk Emergency Department Patients: The PROPER Randomized Clinical Trial”
https://jamanetwork.com/journals/jama/fullarticle/2672630

EDACS vs. HEART – But Why?

The world has been obsessed over the past few years with the novelty of clinical decision rules for the early discharge of patients with chest pain. After several years of battering the repurposed Thrombolysis in Myocardial Infarction (TIMI) score, the History, Electrocardiogram, Age, Risk factors and Troponin (HEART) score became ascendant – but there are several other candidates out there.

One of these is the Emergency Department Assessment of Chest pain Score (EDACS), which is less well-known but has reasonable face validity. It does a good job identifying a “low-risk” cohort, but is more complicated than HEART. There is also a simplified version of EDACS that eliminates some of the complicated subtractive elements of the score. This study pits these various scores head-to-head, each in the context of conventional troponin testing.

This is a retrospective review of 118,822 patients presenting to Kaiser Northern California Emergency Departments, narrowing the cohort to those whose initial Emergency Department evaluation was negative for acute coronary syndrome. The 60-day rate of MACE (a composite of myocardial infarction, cardiogenic shock, cardiac arrest, and all-cause mortality) in this cohort was 1.9%, most of which was acute MI. Interestingly, these authors chose to present only the negative predictive value of each strategy, which means – considering such low prevalence – the ultimate rate of MACE in the low-risk cohorts defined by each decision instrument was virtually identical. Negative predictive values of all three scores depended primarily on the troponin cut-off used, and were ~99.2% for ≤0.04 ng/mL and ~99.5% for ≤0.02 ng/mL. The largest low-risk cohort was defined by the original EDACS rule, exceeding the HEART score classification by an absolute quantity of about 10% of the total cohort, regardless of the troponin cut-off used.
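A minimal sketch of why prevalence, rather than the score, dominates here – the sensitivity/specificity pairs below are hypothetical, chosen only to show that at 1.9% prevalence almost any plausible rule produces a similar negative predictive value:

```python
# Illustrative arithmetic: at ~1.9% prevalence of 60-day MACE, negative
# predictive value is dominated by prevalence, not by the choice of score.
# The sensitivity/specificity values below are hypothetical.
def npv(sens, spec, prevalence):
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_neg / (true_neg + false_neg)

prevalence = 0.019
for sens, spec in [(0.90, 0.50), (0.95, 0.40), (0.99, 0.30)]:
    print(f"sens {sens:.0%}, spec {spec:.0%} -> NPV {npv(sens, spec, prevalence):.2%}")
```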

The editorial accompanying the article lauds these data as supporting the use of these tools for early discharge from the Emergency Department. However, this is an outdated viewpoint, particularly considering the data showing early non-invasive evaluations are of uncertain value. In reality, virtually all patients who have been ruled out for ACS in the ED can be discharged home, regardless of their risk of MACE. The value of these scores lies less in determining who can be discharged than in helping triage patients for closer primary care or specialist follow-up. Individualized plans can then be developed for optimal medical management, or for assessment of the adequacy of the coronary circulation, to prevent whatever MACE can feasibly be prevented.

“Performance of Coronary Risk Scores Among Patients With Chest Pain in the Emergency Department”
http://www.onlinejacc.org/content/71/6/606

“Evaluating Chest Pain in the Emergency Department: Searching for the Optimal Gatekeeper.”
http://www.onlinejacc.org/content/71/6/617

Would You Use A Syncope SDM Instrument?

Much has been made, off and on, about the chest pain shared decision-making tool rolled out over the past couple years. It turns out, when properly informed of their low risk for subsequent cardiac events, most patients look at you sideways and wonder why anyone was offering them admission in the first place.  Whether that was its intended purpose, or a happy little accident, is a subject of controversy.

Their next target: syncope.

The content of this article is not very profound, other than to show the first step in the process of developing such an SDM instrument. These authors detail their involvement of emergency physicians, cardiologists, and patient stakeholders to inform their iterative design process. In the end, their tool looks a lot like their chest pain instrument.

Generally speaking, because the approach to low-risk syncope has some of the same issues as low-risk chest pain, I have essentially the same fundamental problems. Much like for chest pain, inpatient evaluations for syncope are generally unrevealing. We probably ought not be admitting most of these patients. Therefore, this SDM instrument is again addressing the problem of low-value resource utilization by shifting the burden of the decision onto the patient, and trying to convince them to make what we already know to be the correct one (go home). That’s not how the Force works.

Then, just like the chest pain tool, this one fails to convey the benefit of hospitalization for comparison. In their pictogram, two out of 100 patients suffer an adverse event after fainting. Is admission to the hospital protective against those adverse events – even if a diagnosis is made? The patient needs to receive some simplified visualization of their expected benefit from staying in the hospital, not simply the base rate of deterioration.

I love shared decision-making. I use it constantly in my practice in situations where the next step in evaluation or treatment has no clearly superior path. Again, I don’t think the disposition decision for low-risk syncope reflects that same uncertainty.

“Development of a Patient Decision Aid for Syncope in the Emergency Department: the SynDA tool”

https://www.ncbi.nlm.nih.gov/pubmed/29288554

It’s SAH Silly Season Again!

A blustery, relentless wind is blowing the last brittle leaves from the trees here in late November – which must mean it’s time to descend, yet again, into decision-instrument madness. Today’s candidate/culprit:

The Ottawa Subarachnoid Hemorrhage Rule: once derived, now validated in this most recent publication, a prospective, observational, multi-center follow-up to the original. The rule calls for investigation in the presence of any of its six components – age ≥40 years, neck pain or stiffness, witnessed loss of consciousness, onset during exertion, “thunderclap” (instantly peaking) headache, or limited neck flexion on examination – and the cohort eligible for inclusion was neurologically intact “adult patients with nontraumatic headache that had reached maximal intensity within 1 hour of onset”. Over four years, six Canadian hospitals, and a combined annual census of 365,000, these authors identified 1,743 eligible patients with headache, 1,153 of whom consented to study inclusion and follow-up. Of these, 67 patients were ultimately diagnosed with SAH, and the rule picked up all of them, for a sensitivity of 100% (95% CI 94.6% to 100%) – and a specificity of 13.6% (95% CI 13.1% to 15.8%).

Unfortunately, take the accompanying infographic and burn it because, frankly, their route to 100% sensitivity is, essentially: everyone needs evaluation. This can be reasonable when the disease is life-threatening, as here, but the specificity is so poor, in a population with such a low prevalence, that the rate of evaluation becomes absurd.

If their rule had been followed in this cohort, the rate of investigation would have been 84.3% – that is, 972 patients evaluated in order to pick up the 67 positives. In the context of usual practice in this cohort, the investigation rate was 89.0%. That means, over the course of 4 years in these six hospitals, use of this decision instrument would have spared roughly one patient per hospital every six months an investigation for SAH. However, the hospitals included in this validation were the same ones that assisted in the derivation, meaning their practice was likely already shaped by the rule. I expect, in most settings, this decision instrument will increase the rate of investigation – and do so without substantially improving sensitivity.
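A quick back-of-the-envelope check of that claim, using the approximate figures above (illustrative only):

```python
# Back-of-the-envelope check of the figures above (approximate, illustrative).
cohort = 1153            # consented study patients
rule_rate = 0.843        # investigation rate had the rule been followed
usual_rate = 0.890       # observed investigation rate under usual practice
hospitals, years = 6, 4

spared = cohort * (usual_rate - rule_rate)
print(f"~{spared:.0f} investigations avoided over {years} years")            # ~54
print(f"~{spared / (hospitals * years * 2):.1f} per hospital per 6 months")  # ~1
```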

Furthermore, their outcome definition also includes patients with a diagnosis of non-aneurysmal SAH who did not undergo intervention, a cohort in whom the diagnosis is of uncertain clinical significance. If only those with aneurysms and morbidity- or mortality-preventing interventions were counted, the prevalence of disease would be even lower – and we would be looking at even fewer true positives for all this resource expenditure.

The other issue with a rule in which ~85% of patients undergo investigation for headache is the indication creep that may occur when physicians apply the rule outside the inclusion criteria of this study. The prevalence of SAH here is very high compared with the typical ED population presenting with headache. If less strict inclusion criteria are used, the net effect will likely be to increase low-value investigations in the overall population. Dissemination of this decision instrument, and its downstream application to other severe headaches in the ED, will likely further degrade the overall appropriateness of care.

Finally, just as a matter of principle, the information graphic is inappropriate because it implies a mandated course of medical practice. No decision instrument should ever promote itself as a replacement for clinical judgment.

“Validation of the Ottawa Subarachnoid Hemorrhage Rule in patients with acute headache”

http://www.cmaj.ca/content/189/45/E1379.abstract

It’s Sepsis-Harassment!

The computer knows all in modern medicine. The electronic health record is the new Big Brother, all-seeing, never un-seeing. And it sees “sepsis” – a lot.

This is a report on the downstream effects of an electronic sepsis alert system at an academic medical center. Their alert system was based loosely on systemic inflammatory response syndrome (SIRS) criteria for the initial warning to nursing staff, followed by additional alerts triggered by hypotension or an elevated lactate. These alerts prompted use of sepsis order sets or activation of internal “sepsis alert” protocols. The outcomes of interest in their analysis were length of stay and in-hospital mortality.

At first glance, the alert appears to be a success – length of stay dropped from 10.1 days to 8.6, and in-hospital mortality from 8.5% to 7.0%. It would have been quite simple to stop there and trumpet these results as favoring the alerts, but the additional analyses performed by these authors demonstrate otherwise. Both length of stay and mortality were already trending downward independently of the intervention, and in the adjusted analyses, none of the improvements could be conclusively tied to the sepsis alerts – with some of the apparent benefit likely relating to the capture of less-severe cases of sepsis prompted by the alert itself.

What is not debatable, however, is the burden on clinicians and staff. During their ~2.5-year study period, the sepsis alerts were triggered 97,216 times – 14,207 of them in the 2,144 patients subsequently receiving a final diagnosis of sepsis. The SIRS-based alerts comprised most (83,385) of these, but captured only 73% of those with an ultimate diagnosis of sepsis, while carrying only a 13% true positive rate. The authors’ conclusion gets it right:

Our results suggest that more sophisticated approaches to early identification of sepsis patients are needed to consistently improve patient outcomes.
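For a sense of the scale of the alert burden described above, a rough sketch – treating the study period as 2.5 years is an approximation:

```python
# Rough sense of scale for the alert burden described above (illustrative).
total_alerts = 97216
study_years = 2.5                       # approximate study period
sepsis_patients = 2144
alerts_in_sepsis_patients = 14207

print(f"~{total_alerts / (study_years * 365):.0f} alerts per day")        # ~107
print(f"~{alerts_in_sepsis_patients / total_alerts:.1%} of alerts fired "
      f"in patients ultimately diagnosed with sepsis")                    # ~14.6%
```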

“Impact of an emergency department electronic sepsis surveillance system on patient mortality and length of stay”
https://academic.oup.com/jamia/article-abstract/doi/10.1093/jamia/ocx072/4096536/Impact-of-an-emergency-department-electronic

Predicting Poor Outcomes After Syncope

Syncope is a classic good news/bad news presenting complaint. It can be highly distressing to patients and family members, but rarely does it relate to an acutely serious underlying cause. That’s the good news. The bad news, however, is that for those with the worst prognosis, most of the poor prognostic features are unmodifiable.

This is a prospective, observational study of patients presenting with syncope to Emergency Departments in Canada, with the stated goal of developing a risk model for poor outcomes after syncope. The composite outcome of interest was death, arrhythmia, or interventions to treat arrhythmias within 30 days of ED disposition. Follow-up was performed by structured telephone interview, networked hospital record review, and Coroner’s Office record search.

To achieve a lower bound of the 95% confidence interval for sensitivity of 96.4%, these authors targeted a sample size of 5,000 patients, and ultimately enrolled 5,010 with complete outcome assessments. The mean age was 53.4 years, the cohort had a low burden of comorbid medical conditions, and only 9.5% were admitted to the hospital. Within 30 days, 22 patients had died – 15 from unknown causes, the remainder from the pool of 91 patients diagnosed with a “serious arrhythmia”: sinus node dysfunction, atrial fibrillation, AV block, ventricular arrhythmia, supraventricular tachycardia, or a requirement for pacemaker insertion.

These authors ride the standard merry-go-round of statistical analysis, bootstrapping, and logistic regression to derive a prediction rule – the Canadian Syncope Arrhythmia Risk Score – an eight-element additive and subtractive scoring system stratifying patients into one of eleven expected risk categories. They report the test characteristics of their proposed clinically useful threshold, a score greater than 0, to be a sensitivity of 97.1% and a specificity of 53.4% – corresponding to a positive predictive value of only 4.4%, given the low incidence of the composite outcome.
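A minimal back-calculation of that positive predictive value – assuming a composite outcome incidence of roughly 2.2%, which is approximately what the reported counts imply:

```python
# Illustrative back-calculation: with ~2.2% incidence of the composite outcome,
# 97.1% sensitivity and 53.4% specificity yield a PPV in the neighborhood of
# the reported 4.4%.
sens, spec = 0.971, 0.534
prevalence = 0.022   # assumed approximate incidence of the composite outcome

true_pos = sens * prevalence
false_pos = (1 - spec) * (1 - prevalence)
print(f"PPV ~ {true_pos / (true_pos + false_pos):.1%}")
```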

This is yet another product of obviously excellent work from the risk model machines in Canada, but, again, of uncertain clinical value. The elements of the risk model are frankly those that are quite obvious: elevated troponin and conduction delays on EKG, along with an absence of classic vasovagal features. These are patients whose cardiac function is obviously impaired, but short a time machine to go back and fix those hearts before they became sick, it’s a bit difficult to see the path forward. These authors feel their prediction rule aids in safe discharge of patients with syncope, although these patients are already infrequently admitted to the hospital in Canada. The various members of their composite outcome are not equally serious, preventable, or treatable, limiting the potential management options for even those falling into their high-risk group.

As with any decision instrument, its value remains uncertain until it is demonstrated that clinical decisions supplemented by this rule lead to better patient-oriented outcomes and/or resource utilization than current management of this cohort.

“Predicting Short-Term Risk of Arrhythmia among Patients with Syncope: The Canadian Syncope Arrhythmia Risk Score”

https://www.ncbi.nlm.nih.gov/pubmed/28791782

Let’s Get Together and Ignore PERC

The “Pulmonary Embolism Rule-Out Criteria” do not, as the name implies, “rule out” PE. They do, however, generally carve out a cohort for whom objective testing may be obviated, with the implication that the costs and harms of false positives and anticoagulation outweigh the morbidity of missed PE. The rule is fairly well popularized and incorporated into guidelines for PE – and, well, at the least, physicians in an academic center, on the cutting edge of medical knowledge and education, should be applying it appropriately.

Or not.

This is a prospective study enrolling undifferentiated Emergency Department patients with chest pain and shortness of breath. Research staff approached patients with these general chief complaints and collected the variables needed for PERC and Wells, along with other baseline clinical and historical data. They collected data on 3,204 patients, 17.5% of whom were PERC-negative. Of these, 25.5% underwent some testing for pulmonary embolism – inclusive of D-dimer, CTPA, or V/Q scanning. In the end, two PERC-negative patients (0.4%) were ultimately diagnosed with a PE. The authors also present comparative data for the PERC-positive population, with the expected higher frequency of testing and diagnosis associated with the absence of low-risk features.
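Translating those proportions into rough absolute numbers (illustrative arithmetic only):

```python
# Approximate absolute numbers implied by the proportions above (illustrative).
enrolled = 3204
perc_negative = round(enrolled * 0.175)        # ~561 patients
tested = round(perc_negative * 0.255)          # ~143 received D-dimer, CTPA, or V/Q
pe_diagnosed = 2

print(f"PERC-negative: ~{perc_negative}, tested anyway: ~{tested}, "
      f"PE found: {pe_diagnosed} ({pe_diagnosed / perc_negative:.1%})")
```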

PERC is, of course, an imperfect tool – an unavoidable consequence of any decision instrument narrowing a complex clinical decision down to a handful of variables. But, at the least, nearly all PERC-negative patients ought to fall into the bucket of “why were you really considering PE in the first place?”, with few exceptions. For nearly a quarter of them to start down the rabbit hole of testing for PE is low-value and harmful medical practice at a population level, regardless of the potential magnitude of individual benefit for the true positives ultimately identified.

Or, more concisely: this is nuts.

“Pulmonary Embolism Testing among Emergency Department Patients who are Pulmonary Embolism Rule-out Criteria Negative”

http://onlinelibrary.wiley.com/doi/10.1111/acem.13270/full

Is The Road to Hell Paved With D-Dimers?

Ah, D-dimers, the exposed crosslink fragments resulting from the cleaving of fibrin mesh by plasmin. They predict everything – and nothing – with poor positive likelihood ratios for scads of pathologic diagnoses, and limited negative likelihood ratios for others. Little wonder, then, that routine D-dimer assays were part of the PESIT trial that took the diagnosis of syncope off the rails. Now, does the YEARS study threaten to make a similar kludge out of the diagnosis of pulmonary embolism?

On the surface, this looks like a promising study. We are certainly inefficient at the diagnosis of PE. Yield for CTPA in the U.S. is typically below 10%, and some of those diagnoses are likely insubstantial enough to be false positives. This study implements a standardized protocol for the evaluation of possible PE, termed the YEARS algorithm. All patients with possible PE are tested using D-dimer. Patients are also risk-stratified for pretest likelihood of PE by three elements: clinical signs of deep vein thrombosis, hemoptysis, or “pulmonary embolism the most likely diagnosis”. Patients with none of those high-risk elements use a D-dimer cut-off of 1000 ng/mL to determine whether they proceed to CTPA; if a patient has one or more high-risk features, the traditional cut-off of 500 ng/mL is used. Of note, this study was initiated before age-adjusted D-dimer became commonplace.
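In code, the triage logic described above amounts to something like this minimal sketch – not the authors’ implementation, just the decision flow as stated:

```python
# A minimal sketch of the YEARS triage logic as described above: zero YEARS
# items -> D-dimer threshold 1000 ng/mL; one or more items -> 500 ng/mL.
def needs_ctpa(dvt_signs: bool, hemoptysis: bool,
               pe_most_likely: bool, d_dimer_ng_ml: float) -> bool:
    years_items = sum([dvt_signs, hemoptysis, pe_most_likely])
    threshold = 1000 if years_items == 0 else 500
    return d_dimer_ng_ml >= threshold

# Example: no YEARS items and a D-dimer of 800 ng/mL -> PE considered excluded.
print(needs_ctpa(False, False, False, 800))   # False
```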

Without going into interminable detail regarding their results, their strategy works. Patients ruled out solely by the D-dimer component of this algorithm had similar 3-month event rates to those ruled out following a negative CTPA. Their strategy, per their discussion, reduces the proportion undergoing CTPA by an absolute 14% compared with a Wells-based strategy (52% per-protocol vs. 66% based on Wells) – although less so against Wells plus age-adjusted D-dimer. The final yield for PE per-protocol with YEARS was 29%, which is at the top end of the range for European cohorts and far superior, of course, to most U.S. practice.

There are a few traps here. Interestingly, physicians were not blinded to the D-dimer result when they assigned the YEARS risk-stratification items. Considering the subjectivity of the “most likely” component, foreknowledge of the result and the testing assignment that follows from it could easily influence the clinician’s risk classification. The “most likely” component also has a great deal of inter-physician and general cultural variation that may affect the performance of the rule. The prevalence of PE among all patients considered for the diagnosis was 14% – a little lower than the average of European populations evaluated for PE, but easily twice as high as in those evaluated for possible PE in the U.S. It would be quite difficult to generalize any precise effect size from this study to such disparate settings. Finally, considering the continuous likelihood ratios of the D-dimer assay, we know the +LR for a result of 1000 ± ~500 ng/mL is probably around 1. This suggests a cut-off of 1000 ng/mL may hinge a fair bit of management on a test result carrying essentially zero informational value.

This algorithm ultimately seems as though it might have grown out of a need to solve a problem of the authors’ own creation – too many potentially actionable D-dimer results being produced by an indiscriminate triage-ordering practice. I remain a little wary of the effect of poisoning clinical judgment with the D-dimer result, and expect it confounds the overall generalizability of this study. As robust as this trial was, I would still recommend waiting for additional prospective validation prior to adoption.

“Simplified diagnostic management of suspected pulmonary embolism (the YEARS study): a prospective, multicentre, cohort study”
http://thelancet.com/journals/lancet/article/PIIS0140-6736(17)30885-1/fulltext

PECARN, CATCH, CHALICE … or None of the Above?

The decision instrument used to determine the need for neuroimaging in minor head trauma is essentially a question of location. If you’re in the U.S., the guidelines feature PECARN. In Canada, CATCH. In the U.K., CHALICE. But there’s a whole big world out there – what ought the rest of it use?

This is a prospective observational study from two countries out in that big remainder of the world – Australia and New Zealand. Over approximately 3.5 years, these authors enrolled patients with non-trivial mild head injuries (GCS 13-15) and tabulated various rule criteria and outcomes. Each rule has slightly different entry criteria and purpose, but over the course of the study, 20,317 patients were gathered for their comparative analysis.

And the winner … is Australian and New Zealand general practice. Of the 20,317 patients included, only 2,106 (10%) underwent CT. It is hard to read between the lines and determine how many of the injuries in this analysis were missed on the initial presentation, but if rate of neuroimaging is the simplest criterion for winning, there’s no competition. Applying CHALICE to the analysis cohort would have increased the CT rate to approximately 22%, and CATCH would raise it to 30.2%. Application of PECARN would place 46% of the cohort into either CT or observation – an uncertain range, but certainly higher than 10%.

Regardless, in their stated comparison, the true winner depends on the value-weighting of sensitivity against resource utilization. PECARN approached 100% sensitivity, missing only 1 patient with clinically important traumatic brain injury among the ~10,000 to whom it applied. Contrariwise, CATCH and CHALICE missed 13 and 12 out of ~13,000 and ~14,000, respectively. Most of these missed injuries did not require neurosurgical intervention, but a couple of those missed by CHALICE and CATCH did. However, as noted above, PECARN is probably substantially less specific than both CATCH and CHALICE, which has a relatively profound effect on utilization for such a low-frequency outcome.
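To put those utilization figures in absolute terms, a rough sketch (illustrative; each rule actually applies to a slightly different subset of the cohort):

```python
# Approximate absolute imaging counts implied by the rates quoted above
# (illustrative; denominators differ slightly between rules).
cohort = 20317
observed_ct = 2106                      # ~10% under actual ANZ practice
rule_rates = {"CHALICE": 0.22, "CATCH": 0.302, "PECARN (CT or observation)": 0.46}

print(f"Observed practice: {observed_ct:,} CTs (~{observed_ct / cohort:.0%})")
for rule, rate in rule_rates.items():
    print(f"{rule}: ~{cohort * rate:,.0f} children")
```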

Ultimately, however, any of these decision instruments is usable – as a supplement to your clinical reasoning. Each of these rules simplifies a complex decision into one less so, with all its inherent weaknesses. Fewer than 1% of children with mild head injury need neurosurgical intervention and these are certainly rarely missed by any typical practice. In settings with high CT utilization rates, any one of these instruments will likely prove beneficial. In Australia and New Zealand – as well as many other places around the world – potentially not so much.  This is probably a fine example of the need to compare decision instruments to clinician gestalt.

“Accuracy of PECARN, CATCH, and CHALICE head injury decision rules in children: a prospective cohort study”

http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(17)30555-X/abstract