Machine Learning Model Accurately Predicts Long-Term HCC Risk in Patients with HBV

Published on: December 21, 2024

The ML model exhibited better predictive performance than previous traditional HCC models in patients with chronic HBV on long-term antiviral therapy.

A novel machine learning model may offer a more accurate tool for forecasting de novo hepatocellular carcinoma (HCC) risk in virologically and biochemically stable patients with chronic hepatitis B virus (HBV) after 5 years of potent antiviral therapy.1

The proposed model, coined “Machine learning Algorithm for Prediction of Liver cancer after 5 years of antiviral therapy” (MAPL-5), addresses the limited accuracy of conventional models for predicting HCC risk in this context by combining logistic regression and random forest based on 36 comprehensive clinical variables.1

According to the World Health Organization, in 2022, an estimated 254 million people were living with chronic hepatitis B infection and an estimated 1.1 million deaths occurred due to hepatitis B, mostly from cirrhosis and HCC. Treatment with potent nucleos(t)ide analogs (NAs), such as entecavir (ETV) or tenofovir (TFV), has significantly reduced the risk of complications and mortality from HBV, but concerns remain regarding long-term HCC risk in these patients.2

“The overall incidence of HCC is estimated to increase with the longer life expectancy of CHB patients who have achieved virological and biochemical stability,” Han Chu Lee, of the Asan Liver Center at the University of Ulsan College of Medicine in South Korea, and colleagues wrote.1 “Therefore, prediction models are needed for CHB patients who did not develop HCC during the first 5 years of treatment because the risk of hepatocarcinogenesis is relatively low in these patients; however, the absolute risk could gradually increase, considering the patients' prolonged lifespan.”

Investigators sought to develop and validate a machine learning model for predicting HCC in patients with chronic HBV after the first 5 years of ETV or TFV therapy. To do so, they conducted a multicenter retrospective cohort study using data from patients at 2 hospitals who started ETV/TFV therapy between January 2009 and December 2015 and continued treatment for more than 5 years.1

Investigators excluded patients who were diagnosed with HCC or other malignant tumors within the first 5 years of ETV/TFV therapy; had decompensated liver cirrhosis at the initiation of therapy; died or underwent liver transplantation within the first 5 years of therapy; did not undergo regular surveillance for HCC development; or had other definite etiologies of chronic liver disease.1

Investigators collected data at baseline, defined as the first prescription date, and at the 5-year mark of ETV/TFV therapy. They used 36 variables, including baseline characteristics and laboratory values, for model development. A total of 5 machine learning algorithms were applied to the training dataset, internally validated using a test dataset, and further externally validated.1

A total of 6470 patients were included in the study, of whom 5908 and 562 were included in the derivation and external validation cohorts, respectively. Investigators noted patients in the derivation cohort were significantly older than those in the external validation cohort (mean age, 50 vs 46 years; P <.001). The derivation cohort also had greater proportions of male patients (66.1% vs 58.4%; P <.001) and patients with liver cirrhosis (60.0% vs 23.5%; P <.001).1

Between years 5 and 15 of therapy, a total of 279 (4.7%) and 25 (4.5%) patients developed HCC in the derivation and external validation cohorts, respectively, over a median follow-up of 8.6 years (95% CI, 8.5–8.7).1

In the training dataset of 4726 patients, AdaBoost (0.729), logistic regression (0.735), and random forest (0.721) exhibited the highest balanced accuracy. Investigators then combined these models for ensemble learning and found that the ensemble of logistic regression and random forest outperformed the non-ensemble baseline model, with the highest balanced accuracy (0.754) and an area under the receiver operating characteristic curve (AUC) of 0.811. This ensemble was therefore selected as the final prediction model.1
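For readers less familiar with these terms, the following is a minimal Python sketch of balanced accuracy and of one common way to combine two classifiers. Balanced accuracy is the mean of sensitivity and specificity, which matters when events are rare: a model that predicts "no HCC" for everyone would be correct for most patients yet score only 0.5. The article does not state how the logistic regression and random forest outputs were combined, so the simple probability averaging in `soft_vote`, the 0.5 threshold, and all toy values below are assumptions for illustration only.

```python
from typing import List, Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = HCC)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def soft_vote(p_lr: Sequence[float], p_rf: Sequence[float]) -> List[float]:
    """Average two models' predicted probabilities (assumed combination rule)."""
    if len(p_lr) != len(p_rf):
        raise ValueError("probability lists must have the same length")
    return [(a + b) / 2 for a, b in zip(p_lr, p_rf)]


# Toy data invented for illustration; these are not study values.
y_true = [1, 0, 0, 0, 1, 0, 0, 0]
p_lr = [0.70, 0.20, 0.60, 0.10, 0.40, 0.30, 0.20, 0.10]  # hypothetical LR output
p_rf = [0.80, 0.10, 0.30, 0.20, 0.70, 0.20, 0.10, 0.30]  # hypothetical RF output

p_ens = soft_vote(p_lr, p_rf)
y_pred = [1 if p >= 0.5 else 0 for p in p_ens]  # assumed 0.5 decision threshold
score = balanced_accuracy(y_true, y_pred)
print(f"ensemble balanced accuracy on toy data: {score:.3f}")
```

In this toy example the averaged probabilities correct one false positive and one false negative that the hypothetical logistic regression output would produce on its own, which is the general motivation for ensembling.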

Investigators conducted ablation studies to evaluate the contribution of 3 categories of variables to the ensemble model: the presence of liver cirrhosis at baseline; absolute changes in laboratory values and Child-Pugh score; and relative changes in laboratory values and Child-Pugh score. The final ensemble model incorporating all 3 categories of the 26 previously selected variables achieved the best performance across the 4 metrics of sensitivity, accuracy, balanced accuracy, and AUC.1
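As an illustration of the change-based variables described above, the sketch below derives absolute and relative changes between baseline and year 5. The article does not give the exact definitions, so this assumes absolute change is the year-5 value minus the baseline value and relative change is that difference divided by the baseline value. The function name, feature names, and patient values are hypothetical.

```python
from typing import Dict


def change_features(baseline: Dict[str, float], year5: Dict[str, float]) -> Dict[str, float]:
    """Build absolute and relative change features for labs present at both time points."""
    features: Dict[str, float] = {}
    for name in sorted(baseline.keys() & year5.keys()):
        b, y = baseline[name], year5[name]
        features[f"{name}_abs_change"] = y - b
        # Relative change is undefined when the baseline value is zero.
        features[f"{name}_rel_change"] = (y - b) / b if b != 0 else float("nan")
    return features


# Hypothetical values for one patient; lab names and numbers are invented.
baseline = {"platelets": 120.0, "albumin": 3.5}
year5 = {"platelets": 150.0, "albumin": 4.2}

feats = change_features(baseline, year5)
for key, value in feats.items():
    print(f"{key}: {value:.3f}")
```

An ablation study in this setting would retrain the model with one such category of features removed and compare the metrics against the full model.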

In an independent test dataset (n = 1182), the MAPL-5 model combining logistic regression and random forest exhibited the best performance for predicting HCC development, with a balanced accuracy of 0.712 and an AUC of 0.784. Investigators noted the similarity between the cross-validation and test dataset results suggested minimal overfitting or underfitting of the model.1
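The reported figures can be checked with simple arithmetic. The short Python sketch below confirms that the training and test datasets together make up the derivation cohort (roughly an 80/20 split) and computes the drop in each metric from the training dataset to the test dataset. It assumes the training-dataset values quoted earlier (0.754 and 0.811) are the cross-validation results the investigators refer to.

```python
# Cohort sizes and metrics as reported in the article.
derivation_n = 5908
train_n = 4726
test_n = 1182

# The training and test datasets should partition the derivation cohort.
assert train_n + test_n == derivation_n
train_fraction = train_n / derivation_n

# Assumption: the training-dataset figures are the cross-validation results.
cross_val = {"balanced_accuracy": 0.754, "auc": 0.811}
held_out = {"balanced_accuracy": 0.712, "auc": 0.784}

# A small positive gap is expected; a large one would point to overfitting.
gaps = {name: round(cross_val[name] - held_out[name], 3) for name in cross_val}

print(f"training share of derivation cohort: {train_fraction:.3f}")
for name, gap in gaps.items():
    print(f"{name}: cross-validation minus test = {gap:.3f}")
```

Under that assumption, the gaps are 0.042 for balanced accuracy and 0.027 for AUC.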

Additionally, external validation performed in an independent cohort (n = 562) confirmed the good performance of the MAPL-5 model, with the highest accuracy metrics in terms of balanced accuracy (0.771) and AUC (0.862).1
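To make the AUC figures concrete: AUC can be read as the probability that a randomly chosen patient who developed HCC received a higher risk score than a randomly chosen patient who did not, with ties counted as half. The Python sketch below computes it directly from that pairwise definition. The labels and scores are toy values invented for illustration, not study data.

```python
from typing import Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = developed HCC) and hypothetical risk scores.
y_true = [1, 0, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.1, 0.5, 0.3]

toy_auc = auc(y_true, scores)
print(f"toy AUC: {toy_auc:.3f}")
```

Here 7 of the 8 case and non-case pairs are ordered correctly, giving 0.875. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.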

Investigators acknowledged multiple limitations to these findings, including the potential lack of generalizability to patients of other ethnicities and with different HBV genotypes; the need for additional analyses with a larger group of patients to determine optimal cut-off values for high vs low risk; and the need for validation in a prospective cohort.1

“The MAPL-5 model can assist practitioners' clinical decision-making, educate patients, and formulate evidence-based policies regarding HCC surveillance for public health organizations,” investigators concluded.1

References

1. Ha Y, Lee S, Lim J, et al. A Machine Learning Model to Predict De Novo Hepatocellular Carcinoma Beyond Year 5 of Antiviral Therapy in Patients With Chronic Hepatitis B. Liver International. https://doi.org/10.1111/liv.16139
2. World Health Organization. Hepatitis B. Newsroom. April 9, 2024. Accessed December 20, 2024. https://www.who.int/news-room/fact-sheets/detail/hepatitis-b