Explainable Machine Learning for Predicting Sepsis Outcome

Abstract:

The term sepsis is used for an inadequate host response to infection which, if not diagnosed and treated early, can result in life-threatening organ dysfunction. No specific anti-sepsis treatment exists; instead, its management relies on infection control techniques, which are more effective if the infection is detected early. Machine learning provides a range of approaches to analyse large patient datasets, potentially finding patterns and trends between features that may not be clear to clinicians.

We completed an experimental analysis of a random forest, a gradient boosted classifier, a k-nearest neighbours classifier and a neural network to test their performance in classifying patient outcome. We hypothesised that the neural network would have the highest performance, as tree-based methods are prone to overfitting. In contrast to our hypothesis, we found that the tree-based methods performed best, predicting patient mortality with an average precision of 0.79 and an AUC ROC of 0.82. We partitioned our dataset into two subsets, D1 and D2, and found a significant performance increase when using D2, suggesting it contained the majority of the important features. We analysed global feature importance and identified features comparable with findings in the literature, alongside some different features such as "seen by complex care team" and chronic obstructive pulmonary disease. The novel INVASE method showed promising feature importances for the neural network model; however, it converged such that there was no difference in feature importances per instance, which could have been a limitation of the small dataset size.
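For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how average precision and AUC ROC can be computed from true labels and predicted scores. It is not the code used in the thesis: the function names `roc_auc` and `average_precision` and the toy labels and scores are hypothetical, chosen only for illustration, and the average precision shown assumes no tied scores.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs in which the positive
    instance receives the higher score; ties count as half a pair."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values measured at the rank of each positive
    instance, after sorting by descending score (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Hypothetical toy data (1 = patient died, 0 = patient survived);
# these are NOT values from the thesis dataset.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print("AUC ROC:", roc_auc(y_true, scores))
print("Average precision:", average_precision(y_true, scores))
```

Both metrics depend only on how the model ranks patients, not on a fixed decision threshold, which is one reason they are commonly reported together for imbalanced outcomes such as mortality.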

Download Fergus' Thesis