Can machine-learning algorithms accurately predict high-need, high-cost patients using the data from claims and health screening program?

Nov 14, 2020

Itsuki Osawa, Goto Tadahiro, Yamamoto & Yusuke Tsugawa

Rapidly growing healthcare spending has become one of the most significant challenges in many developed countries. High-need, high-cost (HNHC) patients, often defined as the top 5% of spenders by annual healthcare costs, account for about half of total healthcare costs. If these HNHC patients were consistently high-cost across multiple years, policymakers and insurers could easily identify this population and design tailored interventions to lower healthcare spending. However, about half of HNHC patients were not high-cost in the prior year, suggesting that many people abruptly become HNHC patients because of unexpected illnesses and injuries, which makes targeted interventions difficult to develop. Accurately predicting which individuals will become HNHC patients in the near future is therefore a critically important first step toward addressing healthcare spending growth.
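The top-share definition and the year-to-year turnover described above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the function names (`label_hnhc`, `prior_year_overlap`) are hypothetical, the cost figures are toy values rather than study data, and the toy example uses a 10% cutoff so that it works with just 20 people.

```python
def top_share_threshold(costs, share=0.05):
    """Return the cost cutoff such that roughly the top `share` of people are at or above it."""
    ranked = sorted(costs, reverse=True)
    k = max(1, round(len(ranked) * share))
    return ranked[k - 1]


def label_hnhc(costs_by_person, share=0.05):
    """Map person id -> True if that person's annual cost is in the top `share`."""
    cutoff = top_share_threshold(list(costs_by_person.values()), share)
    return {pid: cost >= cutoff for pid, cost in costs_by_person.items()}


def prior_year_overlap(prev_costs, curr_costs, share=0.05):
    """Fraction of this year's HNHC patients who were also HNHC the year before."""
    prev = label_hnhc(prev_costs, share)
    curr = label_hnhc(curr_costs, share)
    current_hnhc = [pid for pid, flag in curr.items() if flag]
    repeat = sum(1 for pid in current_hnhc if prev.get(pid, False))
    return repeat / len(current_hnhc)


# Toy data (not from the study): 20 people, so the top 10% is 2 people per year.
prev_costs = {i: 100 for i in range(20)}
prev_costs[0], prev_costs[1] = 9000, 8000
curr_costs = {i: 100 for i in range(20)}
curr_costs[0], curr_costs[5] = 9500, 7000  # person 5 is newly high-cost

overlap = prior_year_overlap(prev_costs, curr_costs, share=0.10)
print(overlap)
```

In this toy example, one of the two current high-cost people was also high-cost the year before, which mirrors the "about half" turnover that makes a simple carry-forward rule unreliable.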

**Figure 1 Distribution of annual healthcare costs in the working-age population in Japan, 2016**
In 2016, the top 1%, 5%, and 10% of patients accounted for 26.4%, 47.7%, and 60.0% of total annual healthcare costs, respectively.
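A cost-concentration figure of this kind can be computed by ranking people by annual cost and summing the top slice. The sketch below is a minimal illustration, assuming a plain list of per-person costs; `top_cost_share` is a hypothetical helper and the numbers are toy values, not the Japanese data behind Figure 1.

```python
def top_cost_share(costs, top_fraction):
    """Share of total cost incurred by the top `top_fraction` of people."""
    ranked = sorted(costs, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Toy population of 100 people (illustrative values only).
costs = [2500] + [500] * 4 + [200] * 5 + [50] * 90

for fraction in (0.01, 0.05, 0.10):
    share = top_cost_share(costs, fraction)
    print(f"top {fraction:.0%} of people -> {share:.1%} of total cost")
```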

Machine-learning-based prediction models have been shown to outperform conventional prediction models because they can account for the complex interplay among a large number of predictors. However, evidence is limited as to whether a machine-learning-based prediction model that combines clinical data from a health screening program with claims data can achieve high prognostic performance in predicting HNHC patients in subsequent years.

Our work demonstrated that a prediction model using both clinical and claims data was slightly more accurate than a model using only claims data (which are more easily accessible than clinical data for many policymakers and insurers). In our study, we used a random sample of nationwide data on the working-age population who underwent a health screening program in Japan in 2013-2016, and developed five machine-learning-based prediction models (i.e., logistic regression, Lasso regression, random forest, gradient-boosted decision tree, and deep neural network) for HNHC patients in the subsequent year. Predictors included demographics, blood pressure, laboratory tests (e.g., HbA1c, LDL-C, and AST), and survey responses (e.g., smoking status, medications, and past medical history) from the health screening programs.

Our prediction models exhibited good prognostic performance (AUC [area under the receiver operating characteristic curve] of 0.84) and outperformed conventional prediction models relying only on claims data. These findings suggest that incorporating clinical data, which provide complementary information about the participants’ health status, helps improve on prediction models that use only claims data.
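For readers less familiar with the AUC, it can be read as the probability that a randomly chosen future HNHC patient receives a higher predicted risk than a randomly chosen non-HNHC person. The sketch below computes it directly from that definition. It is an illustration only: `roc_auc` is a hypothetical helper, the labels and scores are made up, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = became HNHC next year, 0 = did not.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.8, 0.4, 0.3, 0.2, 0.1, 0.05]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 to ranking every future HNHC patient above every non-HNHC person.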

**Figure 2 Prediction ability of the prediction models for HNHC patients**
A) ROC curves and B) decision curve analysis showed the benefit of machine-learning-based prediction models compared with the reference model (i.e., conventional logistic regression).
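Decision curve analysis plots a model's net benefit across a range of risk thresholds. The sketch below uses the standard net-benefit formula, true positives per person minus false positives per person weighted by the odds of the threshold, and compares a model with the default "intervene on everyone" strategy. It is an assumption-laden illustration: the function names and data are hypothetical, and it is not the analysis code behind Figure 2.

```python
def net_benefit(labels, scores, threshold):
    """Net benefit of intervening on everyone whose predicted risk is >= threshold."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of intervening on everyone, regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up example: 1 = became HNHC next year, 0 = did not.
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.8, 0.4, 0.3, 0.2, 0.1, 0.05, 0.05, 0.05]

for threshold in (0.1, 0.3, 0.5):
    model_nb = net_benefit(labels, scores, threshold)
    all_nb = net_benefit_treat_all(labels, threshold)
    print(f"threshold={threshold:.1f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all strategy and zero (the net benefit of intervening on no one).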

The prediction models we developed should help policymakers and payers accurately predict future HNHC patients in real time and intervene where necessary to curb rapidly growing healthcare spending. Combining clinical data from health screening programs with machine-learning techniques could help address many challenges in healthcare, including rising healthcare spending, more effectively.

npj Digital Medicine
