All the Pregnant Men in the EHR

Electronic health records, data warehouses, and data “lakes” are treasured resources in this modern era of model training. Various applications of precision medicine, “digital twins”, and other predictive mimicries depend on having the cleanest, most accurate data feasible.

One of these data sets is “All of Us”, maintained by the National Institutes of Health. Given its wide use, the authors of the paper linked below ask a very reasonable question: how accurate is the information contained within? Because it is not possible to individually verify the vast scope of clinical information recorded for each person in the data set, the authors choose what ought to be a fairly reliable surrogate: with what frequency do male and female persons included in the data set have sex-discordant diagnoses?

The authors term their measure the “incongruence rate”: the share of sex-specific diagnoses that are incongruent with the biological sex recorded. The authors iteratively refined their list and sample set, ultimately settling on 167 sex-specific conditions where there ought to be very little ambiguity, the vast majority of them related to pregnancy and disorders of female genitalia.

Rather amazingly, their overall finding was an “incongruence rate” of 0.86%, meaning nearly 1 in 100 of these sex-specific diagnoses were found on a person of the incorrect biological sex. For example, out of 4,200 patients coded with a finding of testicular hypofunction, 44 (1.05%) were female. Or, out of 2,101 coded for a finding of prolapse of female genital organs, 21 (1.00%) were male. The authors also performed further analyses exploring whether cis- or transgender misidentification was affecting these findings, and note that the incongruence rate actually rose to 0.96%.
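For readers who want to check the arithmetic, the per-condition percentages quoted above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `incongruence_rate` and the dictionary layout are my own illustrative choices, not anything taken from the paper, and only the two pairs of counts quoted in this post are used.

```python
def incongruence_rate(discordant: int, total: int) -> float:
    """Return the percentage of patients coded with a sex-specific
    diagnosis whose recorded sex does not match that diagnosis.

    Hypothetical helper for illustration; not the authors' code.
    """
    if total <= 0:
        raise ValueError("total must be a positive count")
    return 100.0 * discordant / total


# (discordant patients, all patients coded) as quoted in the post
examples = {
    "testicular hypofunction": (44, 4200),
    "prolapse of female genital organs": (21, 2101),
}

for condition, (discordant, total) in examples.items():
    rate = incongruence_rate(discordant, total)
    print(f"{condition}: {rate:.2f}%")
# prints:
# testicular hypofunction: 1.05%
# prolapse of female genital organs: 1.00%
```

The overall 0.86% figure cannot be rebuilt from these two examples alone, since it pools all 167 conditions.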

Specifics regarding limitations or flaws in this approach aside, the key insight is that of widespread inaccuracies within electronic health data – and systematic approaches to diagnostic incongruence may be useful methods for data cleansing.

“Navigating electronic health record accuracy by examination of sex incongruent conditions”
https://pubmed.ncbi.nlm.nih.gov/39254529/