Can machine-learning algorithms accurately predict high-need, high-cost patients using data from claims and health screening programs?


Rapidly growing healthcare spending has become one of the most significant challenges in many developed countries. High-need, high-cost (HNHC) patients, often defined as the top 5% of spenders in annual healthcare costs, account for about half of total healthcare costs. If these patients remained high-cost year after year, policymakers and insurers could easily identify them and design tailored interventions to lower healthcare spending. However, about half of HNHC patients were not high-cost in the prior year, suggesting that many people abruptly become HNHC patients because of unexpected illnesses and injuries, which makes targeted interventions difficult to develop. Accurately predicting who will become an HNHC patient in the near future is therefore a critically important first step toward addressing healthcare spending growth.

Figure 1 Distribution of annual healthcare costs in the working-age population in Japan, 2016
In 2016, the top 1%, 5%, and 10% of patients accounted for 26.4%, 47.7%, and 60.0% of total annual healthcare costs, respectively.
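To make the cost-concentration figures concrete, here is a minimal sketch of how the share of total costs held by the top spenders could be computed. The function name `top_share` and the cost values are hypothetical and purely illustrative; they are not the study's data or code.

```python
def top_share(costs, fraction):
    """Share of total cost attributable to the top `fraction` of spenders."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(costs, reverse=True)
    # Number of people in the top group (at least one person).
    k = max(1, round(len(ordered) * fraction))
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


# Hypothetical annual costs for 20 people (illustrative values only).
costs = [5000, 1200, 800, 600, 500, 400, 300, 300, 200, 200,
         100, 100, 100, 50, 50, 50, 20, 20, 10, 0]

share_top5 = top_share(costs, 0.05)    # top 1 of 20 people
share_top10 = top_share(costs, 0.10)   # top 2 of 20 people
print(f"top 5%: {share_top5:.1%}, top 10%: {share_top10:.1%}")
```

With a skewed distribution like this toy one, a single person can account for half of all spending, which is the pattern the figure describes at population scale.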

Machine-learning-based prediction models have been shown to outperform conventional prediction models because they can account for the complex interplay among a large number of predictors. However, evidence is limited on whether a machine-learning-based prediction model that combines clinical data from a health screening program with claims data can achieve high prognostic performance in predicting HNHC patients in subsequent years.

Our work demonstrated that a prediction model using both clinical and claims data was slightly more accurate than a model using only claims data (which are more easily accessible than clinical data for many policymakers and insurers). We used a random sample of nationwide data on the working-age population who underwent a health screening program in Japan in 2013-2016 and developed five machine-learning-based prediction models (i.e., logistic regression, Lasso regression, random forest, gradient-boosted decision tree, and deep neural network) for HNHC patients in the subsequent year. Predictors included demographics, blood pressure, laboratory tests (e.g., HbA1c, LDL-C, and AST), and survey responses (e.g., smoking status, medications, and past medical history) from the health screening program.
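The prediction target pairs predictors observed in one year with HNHC status in the following year. The sketch below shows one way such training examples might be assembled, assuming HNHC is defined as the top 5% of next-year costs. The helper names, field names, and values are hypothetical, and this is not the study's actual pipeline.

```python
def hnhc_threshold(costs, top_fraction=0.05):
    """Smallest cost that still falls inside the top `top_fraction` of spenders."""
    ordered = sorted(costs, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))
    return ordered[k - 1]


def build_examples(features_by_person, next_year_cost, top_fraction=0.05):
    """Pair year-t predictors with a year-(t+1) HNHC label (1 = HNHC)."""
    # Only people observed in both years can be used.
    ids = sorted(set(features_by_person) & set(next_year_cost))
    threshold = hnhc_threshold([next_year_cost[i] for i in ids], top_fraction)
    # Note: ties at the threshold are all labelled HNHC in this sketch.
    return [(features_by_person[i], int(next_year_cost[i] >= threshold))
            for i in ids]


# Hypothetical screening records (year t) and costs (year t+1).
features = {
    "p1": {"age": 52, "hba1c": 7.1, "smoker": 1},
    "p2": {"age": 34, "hba1c": 5.4, "smoker": 0},
    "p3": {"age": 47, "hba1c": 6.0, "smoker": 0},
    "p4": {"age": 61, "hba1c": 8.3, "smoker": 1},
}
cost_next_year = {"p1": 900, "p2": 50, "p3": 120, "p4": 4000, "p5": 10}

examples = build_examples(features, cost_next_year, top_fraction=0.25)
```

Keeping the label strictly in the year after the predictors avoids leaking outcome information into the model inputs.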

Our prediction models exhibited good prognostic performance (AUC [area under the curve] of 0.84) and outperformed conventional prediction models relying only on claims data. These findings suggest that incorporating clinical data, which provide complementary information about the participants’ health status, helps improve on prediction models that use only claims data.
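For readers less familiar with the AUC, it can be read as the probability that a randomly chosen future HNHC patient receives a higher predicted risk than a randomly chosen non-HNHC patient. The sketch below computes it directly from that definition; the function name, labels, and scores are made up for illustration and are unrelated to the study's results.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = became HNHC) and predicted risks.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

This pairwise version is quadratic in the number of patients, so it is only meant to show the idea; large datasets would use a rank-based implementation.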

Figure 2 Prediction ability of the prediction models for HNHC patients
A) ROC curves and B) decision curve analysis show the benefit of machine-learning-based prediction models compared with the reference model (i.e., conventional logistic regression).
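Decision curve analysis summarises a model by its net benefit at a given risk threshold, using the standard formula: true positives per person minus false positives per person weighted by the odds of the threshold. The sketch below is an illustration under that standard definition; the function name and data are hypothetical and do not reproduce the figure.

```python
def net_benefit(labels, scores, threshold):
    """Net benefit of flagging everyone whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    tp = sum(1 for y in flagged if y == 1)
    fp = sum(1 for y in flagged if y == 0)
    # False positives are weighted by the odds of the chosen threshold.
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = became HNHC) and predicted risks.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]

nb_model = net_benefit(labels, scores, 0.2)
# "Treat all" baseline: everyone is flagged regardless of predicted risk.
nb_treat_all = net_benefit(labels, [1.0] * len(labels), 0.2)
print(f"model: {nb_model:.2f}, treat all: {nb_treat_all:.2f}")
```

A decision curve plots this quantity across a range of thresholds, so a model is useful where its curve lies above both the "treat all" and "treat none" (net benefit of zero) baselines.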

The prediction models we developed should be useful for policymakers and payers, who could use them to accurately predict future HNHC patients in real time and intervene where necessary to curb rapidly growing healthcare spending. Combining clinical data from health screening programs with machine-learning techniques has great potential to address many challenges in healthcare, including rising healthcare spending, more effectively.


Itsuki Osawa

MD, The University of Tokyo Hospital, Tokyo, Japan
