Using Patient-Similarity to Predict Pulmonary Embolism

Topological data analysis (TDA) is one of the many “big data” buzzphrases being thrown about; it has roots in non-parametric statistical analysis and is promoted by the Palo Alto startup Ayasdi. I’ve experimented with it a little, mostly to show the underlying clustering and heterogeneity of the PECARN TBI data set. My ultimate hypothesis, based on these findings, is that patient similarity is a more useful predictor of individual patient risk than the partition analysis used in the original PECARN model. This technique is similar to the “attribute matching” demonstrated by Jeff Kline in Annals, but with much greater granularity and sophistication.

So, I should be excited to see this paper, which uses TDA output to train a neural network classifier for suspected pulmonary embolism. Using 152 patients, 101 of whom were diagnosed with PE, the authors develop a topological network with clustered distributions of diseased and non-diseased individuals, and compare the output from this network to the Wells and Revised Geneva Scores.

The AUC for the neural network was 0.8911, for Wells 0.74, and for Revised Geneva 0.55. This sounds fabulous, until it’s noted that the neural network is derived and tested on the same tiny sample. There’s no validation set and, given such a small sample, the likelihood of overfitting is substantial. I expect performance will degrade considerably when applied to other data sets.
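
To make the overfitting concern concrete, here is a minimal sketch in Python. It is not the authors' method and uses no real patient data: the features are pure random noise, the classifier is a simple nearest-neighbour rule (a crude stand-in for any flexible similarity-based model), and the only things borrowed from the paper are the sample size and the number of PE cases. All names (`auc`, `nearest_neighbour_label`, the constants) are my own, chosen for illustration.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def nearest_neighbour_label(x, X, y, skip=None):
    """Label of the closest row in X to x; optionally leave one row out."""
    best_d, best_label = None, None
    for i, (row, label) in enumerate(zip(X, y)):
        if i == skip:
            continue
        d = sum((a - b) ** 2 for a, b in zip(x, row))
        if best_d is None or d < best_d:
            best_d, best_label = d, label
    return best_label


# Assumption: sample size and case count mirror the paper; features are noise.
rng = random.Random(0)
N_PATIENTS, N_WITH_PE, N_FEATURES = 152, 101, 10
y = [1] * N_WITH_PE + [0] * (N_PATIENTS - N_WITH_PE)
X = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in y]

# "Apparent" performance: score every patient with a model that has seen them.
apparent_scores = [nearest_neighbour_label(x, X, y) for x in X]

# Leave-one-out: score every patient with that patient removed from the model.
loo_scores = [nearest_neighbour_label(x, X, y, skip=i) for i, x in enumerate(X)]

apparent_auc = auc(apparent_scores, y)
loo_auc = auc(loo_scores, y)
print(f"apparent AUC:      {apparent_auc:.2f}")
print(f"leave-one-out AUC: {loo_auc:.2f}")
```

By construction the apparent AUC is a perfect 1.0, because each patient's nearest neighbour is the patient themselves, even though the features carry no information at all. The leave-one-out figure should land near chance. This is a deliberately extreme case, but it shows why an AUC reported on the derivation sample alone says little about performance elsewhere.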

However, even simply as a scientific curiosity, I hope to see further testing and refinement, which may yet prove to be of greater value.

“Using Topological Data Analysis for diagnosis pulmonary embolism”
http://arxiv.org/abs/1409.5020
http://www.ayasdi.com/_downloads/A_Data_Driven_Clinical_Predictive_Rule_CPR_for_Pulmonary_Embolism.pdf