Using deep learning to make better treatment decisions by eliminating biasing signals in lung cancer CT-scans
Clinical prediction models are generally developed on real-world observational data, as opposed to data from randomized clinical trials. The obvious reason for this is that the development of diagnostic and prognostic models requires many patients, especially when deep learning is used. Most often, these prediction models are intended for causal use: making the best possible treatment decision for an individual patient. At first glance, this may seem impossible. How can we expect prediction models to remain accurate once we start making different clinical decisions for patients? To answer this question, we need to combine prediction with the language of causal inference, as formalized in Pearl's structural causal models approach.
Deep learning for medical imaging
Deep learning has gained a lot of attention in recent years in the medical literature as a possible method to provide accurate clinical outcome predictions. The term deep learning describes a family of machine learning methods that are loosely based on the functioning of the brain, also called artificial neural networks (ANNs). In ANNs, raw input signals are fed into a network of ‘neurons’ that each perform relatively simple calculations, but taken together can learn complex relationships between seemingly different datatypes by learning from examples. An example is the relationship between the chest CT-scan of a lung cancer patient and their probability of survival under different possible treatment regimens. Deep learning models are attractive for medical applications as they can be applied to raw data directly, e.g. to CT-scans, without having to pre-specify features, and can thus discover new patterns in the data that are predictive of, in our example, overall survival. A downside is that it is hard to see how an artificial neural network arrives at a prediction, as the internally used ‘deep features’ are not directly human-interpretable.
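To make the idea of ‘neurons performing simple calculations’ concrete, here is a minimal sketch of a small network's forward pass in NumPy. The weights are random rather than learned, and the 8×8 ‘scan’ is a toy stand-in for a real CT volume; everything here is illustrative, not the model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy "CT scan": a flattened 8x8 patch of voxel intensities.
image = rng.normal(size=64)

# Two hidden layers of 'neurons', each just a weighted sum plus a nonlinearity.
W1, b1 = rng.normal(scale=0.1, size=(32, 64)), np.zeros(32)
W2, b2 = rng.normal(scale=0.1, size=(16, 32)), np.zeros(16)
W3, b3 = rng.normal(scale=0.1, size=(1, 16)), np.zeros(1)

h1 = relu(W1 @ image + b1)           # first layer of learned 'deep features'
h2 = relu(W2 @ h1 + b2)              # deeper, more abstract features
p_survival = sigmoid(W3 @ h2 + b3)   # predicted survival probability

print(float(p_survival[0]))  # a value between 0 and 1
```

Training adjusts the weight matrices so that the output matches observed outcomes; the intermediate activations `h1` and `h2` are the ‘deep features’ that are hard to interpret directly.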
Combining deep learning and causal treatment decisions
Combining deep-learning-based prediction models and medical treatment decisions is even harder. The language of causal inference concerns both variables such as age, sex, treatment, and survival, and the causal relationships between these variables. For example, age is a potential cause of the chosen treatment, but not vice versa. It is not trivial to see how the non-human-interpretable ‘deep features’ can be used in conjunction with the structural causal model approach.
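The distinction between observing a treatment and intervening on it can be illustrated with a tiny simulated structural causal model. All variable names and coefficients below are invented for illustration and have no connection to the paper's actual simulation: age causally influences both treatment choice and survival, and the do-operator (forcing everyone onto a treatment) cuts the age → treatment arrow.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Structural causal model: age -> treatment, age -> survival, treatment -> survival.
age = rng.uniform(40, 80, size=n)

# Older patients are less likely to receive the aggressive treatment.
treatment = (rng.uniform(size=n) < np.clip(1.2 - age / 100, 0, 1)).astype(float)

# Survival depends causally on both age and treatment.
def survival_prob(age, treatment):
    return np.clip(0.9 - 0.008 * (age - 40) + 0.1 * treatment, 0, 1)

survived = rng.uniform(size=n) < survival_prob(age, treatment)

# Intervention do(treatment = 1): everyone is treated, regardless of age.
survival_do1 = survival_prob(age, np.ones(n))

print(survived.mean())      # observed survival under usual practice
print(survival_do1.mean())  # expected survival if all were treated
```

The observed survival rate mixes the effect of treatment with the fact that younger (lower-risk) patients are treated more often; the interventional quantity is what a treatment decision actually needs.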
In our paper we do two things:
- We connect chest CT scans of patients with suspected or known lung cancer and deep features with the language of causal inference by viewing images as the result (or ‘effect’) of fundamental biological processes. The latter are naturally suited to the language of causality.
- We present a possible scenario where ignoring causal inference when developing a deep learning model that predicts overall survival for lung cancer patients leads to a model that is fundamentally incapable of advising on the best treatment. We then show how causal inference can be combined with deep learning to arrive at an ‘optimal’ prediction model that is capable of giving treatment advice.
Eliminating biasing signals
In our study, we investigate the case where information in the medical images must be ignored to arrive at a causally valid survival prediction and treatment recommendation model with deep learning for simulated lung cancer patients. To this end, we set up a simulation experiment using real-world medical images (CT-scans of lung nodules) and simulated clinical data. On the technical side, our method exploits the perhaps surprising similarity between ‘simple’ linear regression and the last layer of a deep neural network. We employ a separate neural network training phase in which we enforce that the information we need to ignore is contained in a single last-layer activation. Subsequently, we perform linear regression on the other last-layer activations, excluding the ‘forbidden’ information.
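The final regression step can be sketched as follows. This is a minimal NumPy illustration, not the paper's code: we assume the earlier training phase has already concentrated the biasing signal in the first last-layer activation, and we fabricate such activations directly rather than training a network.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5_000, 8

# Stand-in for last-layer activations of a trained network. By construction
# (mimicking the auxiliary training phase), column 0 carries the 'forbidden'
# biasing signal and the remaining columns carry legitimate prognostic features.
forbidden = rng.normal(size=n)           # signal we must ignore
legit = rng.normal(size=(n, k - 1))      # allowed deep features
H = np.column_stack([forbidden, legit])

# Simulated outcome: depends on the legitimate features AND the biasing signal.
true_w = rng.normal(size=k - 1)
outcome = legit @ true_w + 0.5 * forbidden + 0.1 * rng.normal(size=n)

# Naive final layer: linear regression on ALL activations picks up the bias.
w_all, *_ = np.linalg.lstsq(H, outcome, rcond=None)

# Causally constrained final layer: regress only on the allowed activations.
w_restricted, *_ = np.linalg.lstsq(H[:, 1:], outcome, rcond=None)

print(w_all[0])          # nonzero weight on the forbidden activation
print(w_restricted[:3])  # weights on allowed activations only
```

Because the last layer of a network is just a linear model on the penultimate activations, dropping the column that holds the forbidden signal yields predictions that provably cannot use it.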
This way, we can combine the modeling power of deep learning with insights from causal inference to provide unbiased predictions in our experiments with simulated survival data. Our work demonstrates that deep learning and causal inference are not mutually exclusive, and shows one way the two can be combined to make better treatment decisions. In addition to our example of individual treatment recommendations, other situations where ignoring biasing information can be important include domain invariance (e.g. do models work the same in different hospitals / on different scanner types?) and fairness (can we ensure that a model does not predict differently for e.g. men and women in a way that would be deemed unfair?). The results of our experiments seem promising, but many questions remain. For example, how well will this work for more complex data-generating mechanisms? More work on the intersection of deep learning and causal inference is needed to pave the way for the eventual target of artificial intelligence in health care: making better decisions based on individualized predictions.