Bayesian Statistics: We’re Dumb as Rocks

A guest post by Justin Mazzillo, a community doc in New Hampshire.

Physicians are often required to interpret medical literature to make critical decisions on patient care. Given that this interpretation often occurs in a hectic and hurried environment, a strong foundation in evidence-based medicine is paramount. Unfortunately, this study from JAMA showed that physicians at all levels of training have anything but that.

This group surveyed a mix of medical students, interns, residents, fellows, attending physicians and one retired physician. They were asked to answer the following question:

“If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person’s symptoms or signs?”

Unfortunately, three-quarters of the subjects got the answer wrong. The results were consistent across all levels of training. The most commonly given answer was almost as far from correct as possible.

I’ve withheld the answer for those who want to try the question themselves, and I know all the dedicated EMLoN readers will fare much better.

“Medicine’s Uncomfortable Relationship With Math: Calculating Positive Predictive Value”
http://archinte.jamanetwork.com/article.aspx?articleid=1861033

[I gave this same test to our residency program – and the results were almost identical.  A few sample “answers” below. -Ryan]
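For readers checking their work after attempting the question above, the arithmetic is a one-line application of Bayes’ theorem. A minimal sketch, assuming (as the question implicitly does) a perfectly sensitive test:

```python
# Positive predictive value for the test described above:
# prevalence 1/1000, false positive rate 5%, sensitivity assumed 100%
# (the question never states a sensitivity -- that is an assumption here).
prevalence = 1 / 1000
sensitivity = 1.0
false_positive_rate = 0.05

true_positives = prevalence * sensitivity
false_positives = (1 - prevalence) * false_positive_rate

ppv = true_positives / (true_positives + false_positives)
print(f"PPV = {ppv:.1%}")  # roughly 2%
```

Run it only after you’ve committed to your own answer.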


It’s a Patient Hand-Off Miracle

Transitions of care – more frequent now in medicine than ever before – are fertile opportunities for error and miscommunication.  Most institutions have developed at least informal protocols to exchange patient information during hand-off.  But, certainly, everyone has some anecdotal tale of missed information leading to a near-miss or actual patient harm.

This study tells the story of I-PASS, a handoff bundle implemented and measured as an error prevention strategy by a pre- and post-intervention study design.  Across 9 pediatric residency training programs, residents were observed for six months for time spent in hand-offs, time spent in patient care, and a variety of classifications of preventable and non-preventable errors.  Then, the I-PASS bundle was introduced – a structured sign-out mnemonic, a 2-hour workshop on communication skills, a 1-hour role-playing and simulation intervention, a faculty development program, direct-observation tools, and a culture-change campaign with a logo, posters, and other promotional activities.

Following the intervention, residents were again observed for six months.  And, in general, preventable medical errors decreased by a small absolute amount, along with a larger absolute decrease in near-misses.  Two of the 9 hospitals had increases in medical errors after the intervention, and the bulk of the effect size was a result of improvements at two hospitals whose baseline error rate was double that of the other 7 facilities.

The authors, then, are very excited about their I-PASS bundle.  But, as they note at the end of their discussion: “Although bundling appears to have been effective in this instance, it prevents us from determining which elements of the intervention were most essential.”  And, on face validity, this is obvious – the structured sign-out sheet was only one of many quality improvement interventions occurring simultaneously.  A decisive change in culture will trump the minor components of implementation anytime.

The final takeaway: if your institutional audit reveals handoff-related errors are pervasive and troublesome, and if reductions in such errors are prioritized and supported with the correct resources, you will probably see a reduction.  The I-PASS tool itself is not important, but the principles demonstrated here probably are.

“Changes in Medical Errors after Implementation of a Handoff Program”
http://www.ncbi.nlm.nih.gov/pubmed/25372088

Scientific Writing is a Tragicomedy! Destroy!

Modern scientific writing – both in the exercises of writing and reading – is obtuse and uninviting.  Rather than clearly communicate an unbiased reflection of the conduct and findings of a particular study, the medical literature most commonly succeeds in doing the opposite.  After all, how else would I find enough to complain about on this blog?

This editorial elucidates so many joyfully preposterous notions it cannot help but be loved.  It is best described as a no-holds-barred cagematch versus all the inane pageantry of scientific writing.  Just a few of the gems, paraphrased:

  • Don’t let the authors write the abstract; they’ll just misrepresent the study!
  • Delete the introduction; uninsightful filler.
  • No one cares about the brand and manufacturer of the statistical package used.
  • Unequal composite end-points and subgroup analyses should be banished.
  • The discussion section serves only the authors’ purposes: advancing dubious claims through selective reporting and biased interpretation of their results.

Some elements of this brief report are, indeed, novel.  Others are simply accepted best practices long since forgotten.  Regardless, it is a refreshing reminder of how brutally poorly the current medical literature serves effective knowledge translation.

“Ill communication: What’s wrong with the medical literature and how to fix it.”
http://www.ncbi.nlm.nih.gov/pubmed/25145940

When Is An Alarm Not An Alarm?

What is the sound of one hand clapping?  If a tree falls in a forest, does it make a sound?  If a healthcare alarm is in no fashion alarming, what judgement ought we make of its existence?

The authors of this study, from UCSF, compose a beautiful, concise introduction to their study, which I will simply reproduce, rather than unimpressively paraphrase:

“Physiologic monitors are plagued with alarms that create a cacophony of sounds and visual alerts causing ‘alarm fatigue’ which creates an unsafe patient environment because a life-threatening event may be missed in this milieu of sensory overload.”

We all, intuitively, know this to be true.  Even the musical mating call of the ventilator, the “life support” of the critically ill, barely raises us from our chairs until such sounds become insistent and sustained.  But, these authors quantified such sounds – and look upon such numbers, ye Mighty, and despair:

2,558,760 alarms on 461 adults over a 31-day study period.

Most alarms – 1,154,201 of them – were due to monitor detection of “arrhythmias”, with the remainder split between vital sign parameters and other technical alarms.  These authors note, in efforts to combat alarm fatigue, audible alerts were already restricted to those considered clinically important – which reduced the overall burden to a mere 381,050 audible alarms, or only 187 audible alarms per bed per day.

Of course, this is the ICU – many of these audible alarms may, in fact, have represented true positives.  And, many did – nearly 60% of the ventricular fibrillation alarms were true positives.  However, next up was asystole at 33% true positives, and it just goes downhill from there – with a mere 3.3% of the 1,299 reviewed ventricular bradycardia alarms classified as true positives.

Dramatic redesign of healthcare alarms is clearly necessary so as not to detract from high-quality care.  Physicians are obviously tuning out vast oceans of alerts, alarms, and reminders – and some of them might even be important.

“Insights into the Problem of Alarm Fatigue with Physiologic Monitor Devices: A Comprehensive Observational Study of Consecutive Intensive Care Unit Patients”
http://www.ncbi.nlm.nih.gov/pubmed/25338067

ARISE, and Cast Off the Shackles of EGDT

The sound you hear is a sigh of relief from Emergency Physicians and intensivists regarding the outcomes of the Australasian Resuscitation in Sepsis Evaluation (ARISE).

As ProCESS suggested, and as many have suspected all along, it seemed the critical intervention from Early Goal-Directed Therapy was the early part – and less the ScvO2 monitoring and active management of physiologic parameters using dobutamine and blood transfusion.  Now we have a second study, in addition to ProCESS, supporting the same general conclusions.

ARISE enrolled patients with confirmed or suspected sepsis and either hypotension refractory to a 1L crystalloid fluid challenge or a lactate level of 4.0 mmol/L or more.  31 centers randomized 1,600 patients to undergo either EGDT or “usual care”, which entailed routine local clinical practice, except that measurement of ScvO2 was forbidden.  EGDT, however, was provided by specially coordinated teams to ensure all patients received the intervention.  The primary outcome was death from any cause within 90 days, powered to detect an absolute risk reduction of 7.6%.

Baseline characteristics between the two groups were quite similar, few patients dropped out of either arm, and, finally, there was no difference in the primary outcome – 18.6% vs. 18.8% (does it matter which is which?).  Indeed, of all the outcomes measured, only two differed in statistically significant fashion: the EGDT cohort departed the Emergency Department 30 minutes more quickly, and the EGDT cohort received greater vasopressor support – attributable entirely to the use of dobutamine in 15.4% of patients vs. 2.6% in the usual-care arm.
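As a back-of-envelope check on just how unremarkable that 18.6% vs. 18.8% difference is, a simple two-proportion z-test will do.  The ~800-patients-per-arm split below is an assumption for illustration, not the exact published allocation:

```python
import math

# Two-proportion z-test for 90-day mortality, 18.6% vs. 18.8%.
# Arm sizes of ~800 each are assumed for illustration; the exact
# ARISE allocation differs slightly.
n1 = n2 = 800
p1, p2 = 0.186, 0.188

# Pooled proportion and standard error under the null hypothesis.
pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se

# Two-sided p-value via the normal CDF.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"z = {z:.2f}, p = {p_value:.2f}")  # z near 0.1, p above 0.9
```

In other words, the trial was powered for a 7.6% absolute difference and observed one of 0.2% – about as null as a null result gets.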

As expected, resource utilization unique to EGDT was different – more and different types of central venous catheters, more arterial catheters, and more frequent use of blood products.  And, as we’re seeing, all of this is unnecessary.  As with ProCESS, “usual care” has become EGDT, excepting these elements.  Both groups received substantial, early crystalloid resuscitation and early appropriate antibiotic coverage, and departed the Emergency Department for a critical care setting quite quickly.

EGDT receives credit for making us aware the impact early identification and intervention can have on mortality.  However, it is time to leave EGDT behind and identify new resuscitation targets and sensible strategies for achieving them.

“Goal-Directed Resuscitation for Patients with Early Septic Shock”
http://www.nejm.org/doi/full/10.1056/NEJMoa1404380

Preposterous Inpatient Antibiotic Overuse

It is one matter entirely to give antibiotics for self-limited bacterial or viral conditions.  It is another matter to regularly, simultaneously prescribe multiple, redundant antibiotics with overlapping coverage, excepting a few particular situations.

And, we are clearly using overlapping coverage far more than just a few particular exceptions.

This review of proprietary data from inpatient antibiotic use in 505 U.S. hospitals between 2008 and 2011 looked at three types of antibiotic overlap – that for MRSA, for anaerobic bacteria, and for ß-lactam therapy.  These authors noted 32,507 cases in which patients received at least two consecutive days of redundant, simultaneous antibiotics.  The largest offender, by far:  82,018 days in which patients received both intravenous metronidazole and intravenous piperacillin-tazobactam.  The majority of the remainder were also anaerobic overlap, and the authors also cited over a thousand cases each of dual-MRSA therapy or dual-ß-lactam therapy.

Now, there are certain tissues in which vancomycin has poor penetration, and vancomycin-intermediate strains are increasing – so it’s unreasonable to say all dual-MRSA therapy is inappropriate.  The same applies to dual-ß-lactam therapy, as double coverage for Pseudomonas and other MDR pathogens frequently requires at least initial redundant therapy.  However, I think these data still reasonably reflect an abundance of opportunity to curtail inappropriate antibiotic use.

The authors, mostly employed by a health services consulting company, also try to do a cost-analysis to quantify the scope of the redundant use.  Unfortunately, in each case, they assume the more expensive antibiotic is the redundant one – which inflates and exaggerates their estimates.  Presumably, this comes out of the need to subsequently promote their company’s services, and these numbers are best ignored.

But, we can, at least, do much better than our present state of affairs.

“Economic Impact of Redundant Antimicrobial Therapy in US Hospitals”
http://www.ncbi.nlm.nih.gov/pubmed/25203175

Alas, Poor (Literally) Detroit

High-quality care can mean many things.  In the ideal sense, it conveys cooperation and coordination between the many facets of healthcare delivery – great physician care, the best possible translation of medical evidence to an individual patient, outstanding nursing, electronic systems that double-check and triple-check for safety, and an army of staff to support patients in all settings to ensure best possible health.

Or, high-quality care can just be a number.  A surrogate number, obliquely related to the ideals of quality care.  And the penalty for bad care can be monetary: a decrease in the level of reimbursement to a hospital.  Reimbursement that, paradoxically, could be used to increase the quality of care.

And, so we see this paradox playing out in Detroit, as this letter in the NEJM points out.  Detroit, suffering publicly through bankruptcy and infrastructure collapse, simply does not have the resources to support public health.  Thus, its citizens, with poor access to preventative and primary care, and already burdened by the misery of the City, are forced to seek care – repeatedly – in its metropolitan and inner-city hospitals.  And, because of such repeated visits, these hospitals are subject to the Medicare Readmission penalty – and thus, fewer resources with which to care for those lacking adequate support to care for themselves outside the institutional setting.  And around we go again.

Are we truly measuring and encouraging quality?  Or are we punishing those systems that simply cannot afford to be further stressed?

“Medicare Readmission Penalties in Detroit”
http://www.ncbi.nlm.nih.gov/pubmed/25207786

Are a Third of Research Conclusions Wrong?

As I covered last year, half of what you’ve been taught in medicine is wrong – we just don’t know which half.

And, it turns out, sometimes even the same authors, taking a second look at the same data as before, can come up with new – and wildly different – conclusions.

This is a review of 37 randomized controlled trials published after 1966, paired with 37 “re-analyses” of the same data.  These trials span the entire medical domain, from mycophenolate therapy after cardiac transplantation to homeopathy for fibrosis.  Of these 37 re-analyses, 32 involved authors from the original research group.  These re-analyses differed by changing statistical techniques, outcome definitions, or other study interpretation methods.

Following re-analysis, 13 (35%) changed the original conclusions – suggesting either more, fewer, or even entirely different patients should be treated.  The implication regarding the reliability of our evidentiary basis for medical practice is obviously profound – if even the original authors, working from the original data, can arrive at conflicting conclusions.

In his editorial, Harlan Krumholz argues the solution is clear: open the data.  Independent verification of findings – whether it erases bias or uncovers undesired mathematical pathology – is critical to ensuring the most complete understanding of the evidence base.  If our highest duty is to our patients, we must break down the barriers created by self-interest and institutional policies in order to promote data sharing – and serve patients by improving the clarity and transparency of medical practice.

“Reanalyses of Randomized Clinical Trial Data”
http://www.ncbi.nlm.nih.gov/pubmed/25203082

Appropriate Resource Utilization Can Be Taught!

At least, that is the implication of this paper – and even though it’s probably not the most reliable demonstration of such an effect, its observations are likely valid.

This is a very convoluted study design aiming to comment upon whether residents trained in “conservative” practice environments differed from residents trained in “aggressive” practice environments.  However, “conservative” and “aggressive” were defined by utilizing a Medicare database to calculate the “End-of-Life Visit Index”.  Residents trained in a region where elderly patients received greater frequency of inpatient and outpatient care at the end-of-life were judged to have trained in an “aggressive” environment.

Then, to measure whether residents themselves had tendencies towards “conservative” or “aggressive” management, the authors reviewed American Board of Internal Medicine board certification examination questions.  Questions regarding management strategy were divided into “conservative” or “aggressive” strategies, based on the correct answers.  Finally, examinees were measured on how many correct and incorrect answers were provided on these questions featuring the two management strategies.  Correlating these test answers with the end-of-life environment presumes to measure an association between training and practice.

After all these calisthenics – yes, residents training in the lower-intensity environments were more likely to perform better on the “conservative” management questions.  Thus, the authors make the expected extrapolation: trainees apparently learn to mimic inappropriately aggressive care.

This is probably true.  Whether this study – with its limitations and surrogates – adequately supports such conclusions is another matter entirely.

“The Association Between Residency Training and Internists’ Ability to Practice Conservatively”
http://www.ncbi.nlm.nih.gov/pubmed/25179515

Get to the Choppa! Or … Maybe Not?

Helicopter transport is entrenched in our systematic management of trauma.  It is glamorized on television, and retrospective National Trauma Data Bank studies seem to suggest survival improvement – and those with head injury seem to benefit most.

But these NTDB studies encompass heterogeneous populations and are challenged in creating truly equivalent control groups.  This study, on the other hand, is a single-center experience, allowing greater consistency across divided cohorts.  In a novel approach, these authors collected all HEMS trauma transfer requests to their facility across their 30-county catchment area – and specifically looked at occasions when weather precluded HEMS.  This created two cohorts of patients eligible for HEMS, with a subset transported by ALS due to chance events.  The paramedic crews manning the HEMS and ALS transfers were employed by the same company, and therefore had roughly equivalent training.

This created a cohort of 2,190 HEMS transports and 223 ground transports.  Across ISS, GCS, initial transfusion requirements, and vital signs, the two groups had generally minor differences.  However, there was some potentially important variability in initial operative intervention upon arrival at the Level 1 trauma center – 27.4% of HEMS patients underwent craniotomy, compared with 15.4% of ALS transfers.  Based on multivariable logistic regression, type of transport did not enter into a best-fit model of survival – and, thus, there was no difference (9.0% vs. 8.1% mortality) between HEMS and ALS transport of trauma patients, despite the additional hour added from call time to arrival at the Level 1 trauma center.

Unfortunately, there are potentially critical flaws in their methods for patient selection.  They report 3,901 patients had a request for trauma transfer – but the number of patients transferred by HEMS or ALS only sums to 2,398.  An additional 49 were transported by BLS.  Then, another 208 died while awaiting transfer.  How many of these 208 died during weather delays awaiting ALS?  Are those deaths, in some fashion, related to the paucity of craniotomies performed on ALS transports?  And, what of the other 965 patients?

I tend to agree with their conclusion – HEMS is expensive and far over-utilized for patients who receive no particular benefit from the time savings.  However, I’m not sure this analysis includes all the data needed to be reliable evidence.

“When birds can’t fly: An analysis of interfacility ground transport using advanced life support when helicopter emergency medical service is unavailable”
http://www.ncbi.nlm.nih.gov/pubmed/25058262