Rapidly growing healthcare spending has become one of the most significant challenges in many developed countries. High-need, high-cost (HNHC) patients, often defined as the top 5% of spenders by annual healthcare costs, account for about half of total healthcare costs. If these patients remained high-cost year after year, policymakers and insurers could easily identify them and design tailored interventions to lower healthcare spending. However, about half of HNHC patients were not high-cost in the prior year, suggesting that many people abruptly become HNHC patients due to unexpected illnesses and injuries, which makes targeted interventions difficult to develop. Accurately predicting who will become an HNHC patient in the near future is therefore a critically important first step toward addressing healthcare spending growth.
Machine-learning-based prediction models have been shown to outperform conventional prediction models because they can account for the complex interplay among a large number of predictors. However, evidence is limited as to whether a machine-learning-based prediction model that combines clinical data from a health screening program with claims data can achieve high prognostic performance for predicting HNHC patients in subsequent years.
Our work demonstrated that a prediction model using both clinical and claims data slightly improved prediction accuracy compared with a model using only claims data (which are more easily accessible than clinical data for many policymakers and insurers). We used a random sample of nationwide data on the working-age population who underwent a health screening program in Japan in 2013-2016, and developed five machine-learning-based prediction models (logistic regression, Lasso regression, random forest, gradient-boosted decision tree, and deep neural network) for HNHC patients in the subsequent year. Predictors included demographics, blood pressure, laboratory tests (e.g., HbA1c, LDL-C, and AST), and survey responses (e.g., smoking status, medications, and past medical history) from the health screening program.
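To make the prediction target concrete, the following minimal Python sketch labels HNHC patients as the top 5% of spenders in a given year and lists those who were not high-cost in the prior year. It is an illustration under stated assumptions, not the study's actual pipeline: the function name `label_hnhc`, the person IDs, and all cost figures are hypothetical, and the data are synthetic.

```python
def label_hnhc(costs, top_percent=5):
    """Flag the top `top_percent` percent of spenders in one year.

    `costs` maps a (hypothetical) person ID to annual healthcare cost.
    Returns a dict mapping person ID to True (HNHC) or False.
    """
    # Integer ceiling of n * top_percent / 100, with at least one person flagged.
    n_top = max(1, (len(costs) * top_percent + 99) // 100)
    ranked = sorted(costs, key=costs.get, reverse=True)
    top = set(ranked[:n_top])
    return {pid: pid in top for pid in costs}


# Synthetic costs for 40 people; none of these numbers come from the study.
costs_prior = {pid: 1000 + 10 * pid for pid in range(40)}
costs_next = dict(costs_prior)
costs_next[3] = 50000  # person 3 has an unexpected, expensive illness

hnhc_prior = label_hnhc(costs_prior)
hnhc_next = label_hnhc(costs_next)  # the label a model would try to predict

# People who are HNHC in the target year but were not in the prior year.
newly_hnhc = sorted(p for p in hnhc_next if hnhc_next[p] and not hnhc_prior[p])
print(newly_hnhc)  # prints [3]
```

In this toy example, one of the two HNHC patients in the target year was not high-cost the year before. Prior-year spending alone cannot identify such cases, which is why additional predictors are of interest.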
Our prediction models exhibited good prognostic performance (AUC [area under the curve] of 0.84) and outperformed conventional prediction models relying only on claims data. These findings suggest that incorporating clinical data, which provide complementary information about participants’ health status, helps improve on the performance of prediction models that use only claims data.
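For readers less familiar with the metric, the sketch below shows one way an AUC can be computed, assuming it refers to the area under the receiver operating characteristic curve. Under that assumption, the AUC equals the probability that a randomly chosen future HNHC patient receives a higher predicted risk than a randomly chosen non-HNHC person, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical; they are not results from the study.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    `labels` holds 1 for HNHC and 0 otherwise; `scores` holds predicted risks.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data: four non-HNHC people followed by two HNHC patients.
labels = [0, 0, 0, 0, 1, 1]
claims_only_scores = [0.1, 0.4, 0.2, 0.6, 0.5, 0.7]  # hypothetical model A
combined_scores = [0.1, 0.3, 0.2, 0.4, 0.5, 0.7]     # hypothetical model B

print(roc_auc(labels, claims_only_scores))  # prints 0.875
print(roc_auc(labels, combined_scores))     # prints 1.0
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so comparing two models on the same held-out people gives a direct, threshold-free way to judge whether added predictors improve discrimination.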
The prediction models we developed should be useful for policymakers and payers, allowing them to accurately predict future HNHC patients in real time and to intervene where necessary to curb rapidly growing healthcare spending. Combining clinical data from health screening programs with machine-learning techniques could help address many challenges in healthcare, including rising healthcare spending, more effectively.