External validation of the Toronto hepatocellular carcinoma risk index in a Swedish population

Background & Aims
The Toronto hepatocellular carcinoma (HCC) risk index (THRI) is a predictive model to determine the risk of HCC in patients with cirrhosis. This study aimed to externally validate the THRI in a Swedish setting to investigate whether it could identify patients not requiring HCC surveillance.
Methods
From 2004-2017, 2,491 patients with cirrhosis at the Karolinska University Hospital were evaluated. Patients were classified into low-, intermediate- and high-risk groups for future HCC according to the THRI. Harrell's C-index, calibration-in-the-large, calibration slope and goodness-of-fit estimates were calculated to assess model discrimination and calibration. Cox proportional hazards regression was used to determine the risk of HCC.
Results
Most patients were male (n = 1,638, 66%). The most common etiologies of cirrhosis were steatohepatitis (n = 1,182, 48%) followed by viral hepatitis (n = 987, 40%). In all, 131 patients (5.3%) were designated as low risk for HCC. Harrell's C-index was 0.69. Calibration-in-the-large (0.11), the calibration slope (1.24, not significantly different from 1, p = 0.66) and goodness-of-fit estimates indicated good model calibration. Compared with the low-risk group, patients in the high-risk group had a 7.1-fold (95% CI 2.9–17.2) higher risk of HCC and patients in the intermediate-risk group a 2.5-fold (95% CI 1.0–6.3) higher risk.
Conclusions
In a Swedish setting, the THRI could differentiate between patients at low and high risk of HCC development. However, because the low-risk group was relatively small (5.3%), the clinical applicability of the THRI could be limited.
Lay summary
The Toronto hepatocellular carcinoma (HCC) risk index (THRI) is a novel prediction model used to stratify patients with cirrhosis based on their future risk of HCC. In this study, the THRI was validated in an external cohort using the TRIPOD guidance. Few patients were identified as low risk, and the THRI had a modest discriminative ability, limiting its clinical applicability.


Cumulative HCC incidence by THRI risk category
Introduction
Cirrhosis, the most advanced form of liver disease, is the primary risk factor for hepatocellular carcinoma (HCC) development. 1 Most cases of HCC are identified in patients with pre-existing cirrhosis. 2 Early identification of HCC has been associated with a more favourable prognosis, including lower mortality. 3,4 International guidelines recommend that all patients with Child-Pugh class A-B cirrhosis and patients on the liver transplant waiting list be offered HCC screening with ultrasound every 6 months. 4,5 These recommendations are supported by observational cohort 3,4 and cost-effectiveness 5-7 studies.
The incidence of HCC varies greatly among different groups of patients, and multiple risk factors, such as the etiology of liver disease, age, sex, type 2 diabetes (T2DM) and smoking, have been widely described. 8-12 However, current HCC screening practice does not take the individual risk of HCC development in patients with cirrhosis into account, 4,5 possibly exposing low-risk patients to unnecessary screening and a risk of overdiagnosis. 13 Several studies have used established risk factors to create risk stratification systems that identify patients at different levels of risk for HCC development. 14,15 Such risk-scoring systems could help decide when to initiate HCC surveillance, thereby saving scarce healthcare resources. 13 The Toronto HCC risk index (THRI) is one such score, proposed to identify low-risk patients who do not require HCC screening. 14 The THRI is based on age, etiology of cirrhosis, sex and platelet count. 14 Each parameter is assigned points based on its hazard ratio (HR) as calculated in the original publication, 14 and the summed score stratifies patients as having a low, intermediate or high risk of HCC development.
External validation of any prediction model is vital before its incorporation into clinical practice. Herein, we applied the THRI to a Swedish cohort to investigate whether it could identify patients with a low incidence of future HCC who may not benefit from HCC surveillance.

Patients
This was a cohort study based on historical data from patients with cirrhosis at the Karolinska University Hospital, a tertiary-level hospital in Stockholm, Sweden, between 2004 and 2017. The study methodology mirrored that of the original THRI study 14 to enhance comparability.

Diagnosis of cirrhosis and HCC
Patients were identified through the local electronic healthcare register using ICD-10 codes for cirrhosis (B180G, B181G, B182G, K70.3 or K74.6). The diagnosis of cirrhosis was then confirmed through medical chart review and was considered valid if any of the following criteria were met: signs of cirrhosis on pathology (i.e. biopsy showing cirrhosis), on radiology (surface nodularity, portal hypertension or transient elastography [FibroScan] >14.5 kPa), or a clinical diagnosis (varices or ascites without any explanation other than cirrhosis) confirmed by a specialist in infectious diseases, hepatology or internal medicine. We excluded patients with a previous diagnosis of HCC or a diagnosis of HCC within 6 months of baseline, as well as patients with insufficient data to calculate the THRI. All remaining patients meeting the criteria were included. A flowchart of inclusion and exclusion is provided in Fig. 1. Baseline was defined as the date of the first recorded visit at which the diagnosis of cirrhosis was established.
Patients were then divided into 4 groups based on the primary etiology of cirrhosis, as in the original THRI paper 14 :
(1) Viral hepatitis: HCV with or without a previous sustained virological response (SVR) at baseline, or HBV.
(2) Steatohepatitis: either non-alcoholic fatty liver disease (NAFLD), defined as a lack of clinically significant alcohol use and either a BMI >30 kg/m² or a BMI >25 kg/m² with co-existing T2DM; or alcohol-related liver disease (ALD), based on significant alcohol use identified by chart review or laboratory confirmation (phosphatidylethanol >0.3 µmol/L) and confirmed by a specialist in infectious diseases, gastroenterology or internal medicine.
(3) Autoimmune liver disorders (autoimmune hepatitis, primary sclerosing cholangitis or primary biliary cholangitis).
(4) "Other" (Wilson's disease, hemochromatosis, porphyria, alpha-1 antitrypsin deficiency, other rare genetic disorders and cryptogenic cirrhosis).
If a patient had more than 1 etiology of cirrhosis, the etiological group was set according to the highest known risk of HCC development in the THRI system. 14 For instance, patients with both steatohepatitis (ALD or NAFLD) and viral hepatitis were assigned to the viral hepatitis group. This was intended to minimise inter-etiological confounding and to ensure consistency with the original THRI publication. 14
During follow-up, HCC was first identified by an ICD-10 code of C22.0 in the medical charts and then verified on chart review. HCC was diagnosed in accordance with current guidelines, 4,5 and the diagnosis was formally made at a multidisciplinary tumour conference. The same electronic healthcare system is used throughout the Stockholm region, with the exception of 1 hospital (Saint Göran's hospital), and the Karolinska University Hospital is responsible for treating patients with HCC in the Stockholm area.
Thus, we probably captured most incident HCCs in the cohort during follow-up.
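The etiology hierarchy described above, in which a patient with several etiologies is assigned to the group with the highest HCC risk, can be expressed as a small lookup. The sketch below is an illustration only, not the authors' code: the text states just that viral hepatitis outranks steatohepatitis, so the positions of the "other" and autoimmune groups in the hypothetical `ETIOLOGY_RANK` table are assumptions, and the group labels are illustrative strings.

```python
# Hypothetical risk ranking used to resolve patients with >1 etiology.
# Only "viral" > "steatohepatitis" is stated in the text; the placement of
# "other" and "autoimmune" is an assumption for illustration.
ETIOLOGY_RANK = {
    "autoimmune": 0,       # assumed lowest
    "steatohepatitis": 1,
    "other": 2,            # assumed
    "viral": 3,            # HCV (with or without SVR) or HBV
}


def assign_etiology_group(etiologies):
    """Return the single etiological group used for THRI scoring.

    If a patient has several etiologies, the one with the highest
    (assumed) HCC risk rank is kept.
    """
    if not etiologies:
        raise ValueError("at least one etiology is required")
    unknown = [e for e in etiologies if e not in ETIOLOGY_RANK]
    if unknown:
        raise ValueError(f"unknown etiologies: {unknown}")
    return max(etiologies, key=ETIOLOGY_RANK.__getitem__)


# A patient with both steatohepatitis and viral hepatitis is assigned to "viral".
print(assign_etiology_group(["steatohepatitis", "viral"]))  # viral
```

Keeping the ranking in one table means that, under this sketch, substituting the actual THRI etiology points would change the assignment rule in a single place.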

Outcomes and follow-up period
Follow-up ended at the date of HCC diagnosis, death, liver transplantation, loss to follow-up due to migration from the Stockholm region or 10 years after baseline, whichever occurred first. Patients treated for HCV were censored at the time of SVR and then re-entered as a "new" patient with an updated THRI score based on their age, etiology and platelet count at that date, if available. Only complete-case analysis was performed, so patients with missing data for any THRI parameter at the date of SVR were censored at that date without re-entry. Age, sex, etiology of cirrhosis and platelet count were recorded at baseline to compute the THRI score. International normalized ratio, creatinine, bilirubin, sodium and dialysis-dependent kidney failure were also recorded to calculate a baseline model for end-stage liver disease (MELD) score. BMI measurements were considered valid if recorded within ±4 weeks of baseline.
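The end-of-follow-up rule above takes the earliest of several candidate dates, capped at 10 years after baseline. A minimal sketch, assuming each patient's event dates are available as `datetime.date` values (or `None` if the event did not occur); the function names are hypothetical, and resolving a same-day tie in favour of HCC is an assumption not stated in the text.

```python
from datetime import date


def add_years(d, years):
    """Shift a date by whole years, mapping 29 February to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def follow_up_end(baseline, hcc=None, death=None, transplant=None,
                  migration=None, svr=None, max_years=10):
    """Return (end_date, reason, hcc_event) for one patient.

    Follow-up stops at the earliest of HCC, death, liver transplantation,
    migration out of the region, SVR (treated HCV) or the 10-year cap.
    HCC is listed first so that a same-day tie counts as an HCC event
    (an assumption for this sketch).
    """
    candidates = [
        (hcc, "hcc"),
        (death, "death"),
        (transplant, "transplant"),
        (migration, "migration"),
        (svr, "svr"),
        (add_years(baseline, max_years), "administrative"),
    ]
    observed = [(d, r) for d, r in candidates if d is not None]
    # min() returns the first of several equal dates, i.e. the earliest-listed reason.
    end, reason = min(observed, key=lambda c: c[0])
    return end, reason, reason == "hcc"


print(follow_up_end(date(2010, 1, 1), hcc=date(2013, 6, 15), death=date(2015, 1, 1)))
```

In this sketch, a record ending with reason `"svr"` would be followed by a new record whose baseline is the SVR date, provided all THRI inputs at that date are complete.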

Statistical analysis
Analyses and external validation steps were performed on patients with complete data on the prognostic factors and outcome variables defined by the development model. Predictors and outcome variables were defined in the same way as, or as close as possible to, the development dataset. 14 For each patient in the validation cohort, the THRI risk score was calculated using the points assigned to each predictor in the development model. Based on these scores, patients were classified as having a low (<120), intermediate (120-240) or high (>240) risk of HCC.
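The scoring and classification step above can be sketched as follows. The thresholds (<120, 120-240, >240) are those stated in the text; the per-predictor point values are not reproduced here, so the example patient's points are hypothetical placeholders rather than the published THRI values, and the function names are illustrative.

```python
THRI_PREDICTORS = ("age", "sex", "etiology", "platelets")


def thri_score(points):
    """Sum per-predictor THRI points; complete cases only."""
    missing = [p for p in THRI_PREDICTORS if points.get(p) is None]
    if missing:
        raise ValueError(f"incomplete THRI data, missing: {missing}")
    return sum(points[p] for p in THRI_PREDICTORS)


def thri_risk_group(score):
    """Map a THRI score to low (<120), intermediate (120-240) or high (>240)."""
    if score < 120:
        return "low"
    if score <= 240:
        return "intermediate"
    return "high"


# Hypothetical placeholder points for one patient (NOT the published values).
example = {"age": 60, "sex": 40, "etiology": 50, "platelets": 20}
score = thri_score(example)
print(score, thri_risk_group(score))  # 170 intermediate
```

Raising an error on any missing predictor mirrors the complete-case approach described in the text; the boundary values 120 and 240 fall in the intermediate group, following the stated ranges.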
HRs for HCC were estimated across THRI risk groups and etiological groups using Cox proportional hazards regression, with the low-risk group and the autoimmune group as references in the respective analyses. Both univariable analyses and multivariable analyses adjusted for baseline MELD score and T2DM were performed.
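To make the HR estimation concrete, the sketch below fits a univariable Cox model for a single binary group indicator (for example, high-risk vs. low-risk) by Newton-Raphson on the partial log-likelihood, using the Breslow approximation for tied event times. It is a pure-Python illustration only; the study used the R survival package, adjustment for MELD and T2DM would need the multivariable extension (not shown), and the function name is hypothetical.

```python
import math


def cox_univariable(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x) for one covariate.

    Returns (hazard_ratio, (ci_low, ci_high)) with a Wald 95% CI.
    Ties use the Breslow approximation: every event at time t shares
    the risk set {j : times[j] >= t}.
    """
    n = len(times)
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score = 0.0
        info = 0.0
        for i in range(n):
            if not events[i]:
                continue
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:          # j is still at risk at t_i
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0
            score += x[i] - mean                  # gradient of partial log-likelihood
            info += s2 / s0 - mean * mean         # observed information
        if info <= 0:
            raise ValueError("zero information; check events and covariate")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    hr = math.exp(beta)
    return hr, (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))


times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8
high_risk = [1, 0, 1, 0, 1, 0, 1, 0]   # toy data: group 1 tends to fail earlier
print(cox_univariable(times, events, high_risk))
```

The double loop is O(n²) per iteration, which is fine for a sketch; with complete separation between groups the estimate diverges, which is one reason dedicated packages are preferred in practice.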
The proportional hazards assumption was checked using scaled Schoenfeld residual plots and corresponding test statistics. 16 A detailed description of the statistical methods used to externally validate the THRI is presented in the supplementary information. A 2-sided p <0.05 was considered statistically significant. Statistical analyses were performed in R statistical software version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria) using the packages survival, rms and mice.

Measures of discrimination and calibration
Two discrimination measures were estimated: Harrell's C-index 17 and Royston and Sauerbrei's R²_D. 18 Calibration was assessed by comparing the observed and predicted numbers of HCC events following Crowson's method, 19 which can be applied to the Cox proportional hazards model. Calibration-in-the-large, calibration slope and goodness-of-fit estimates were also calculated.
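Harrell's C-index is the proportion of usable patient pairs in which the patient with the higher risk score develops HCC first. The sketch below counts a pair as usable when the earlier time is an observed event, and it scores tied risk scores as 0.5. It assumes a higher score means higher risk; tie conventions for equal follow-up times differ between implementations, so results may differ slightly from the R survival package.

```python
def harrell_c(times, events, risk):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when times[i] < times[j] and patient i
    had the event; it is concordant when risk[i] > risk[j].
    Pairs with tied follow-up times are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # censored patients cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5     # tied scores count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


print(harrell_c([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which a C-index of 0.69 is read as modest discrimination.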
Kaplan-Meier curves between risk groups
The incidence of HCC was determined for the THRI risk groups and etiological categories using the Kaplan-Meier method, and differences between groups were compared with the log-rank test.
Visual comparisons between these curves and those in the development dataset were also performed to provide a qualitative assessment of the calibration. When curves from the development and validation datasets were superimposable, qualitative calibration assessment was deemed successful.
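The Kaplan-Meier estimate behind such curves, and the cumulative incidence 1 − S(t) read off them, can be sketched in pure Python as below. This is an illustrative implementation that assumes non-informative censoring; it treats deaths and transplants as ordinary censoring rather than competing risks and does not reproduce the log-rank test. The function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all observations that share the same time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Events and censorings at t leave the risk set after t.
        at_risk -= deaths + censored
    return curve


def cumulative_incidence(curve, t):
    """1 - S(t), using the last step at or before t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return 1.0 - s


curve = kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1])
print(curve, cumulative_incidence(curve, 2.5))
```

With follow-up measured in years, `cumulative_incidence(curve, 5)` and `cumulative_incidence(curve, 10)` computed per THRI group would give 5- and 10-year values under this sketch.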

Annual and cumulative HCC incidences
To assess the discriminative capacity of the THRI, 5- and 10-year cumulative and annual HCC incidences were calculated for the THRI risk groups and etiological groups.
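One common way to express an annual incidence is the number of incident HCCs per 100 person-years of follow-up, computed separately for each group. The sketch below assumes per-patient follow-up in years and a 0/1 HCC indicator; the group labels and function name are illustrative, and the study's exact definition may differ.

```python
from collections import defaultdict


def annual_incidence_by_group(groups, follow_up_years, events):
    """Return {group: HCC cases per 100 person-years}."""
    person_years = defaultdict(float)
    cases = defaultdict(int)
    for g, years, event in zip(groups, follow_up_years, events):
        person_years[g] += years
        cases[g] += int(bool(event))
    return {g: 100.0 * cases[g] / person_years[g]
            for g in person_years if person_years[g] > 0}


print(annual_incidence_by_group(
    ["low", "low", "high", "high"], [5.0, 5.0, 2.0, 2.0], [0, 1, 1, 1]))
# {'low': 10.0, 'high': 50.0}
```

A group's rate expressed this way could then be compared with an annual-incidence threshold such as the 1.5% figure discussed later in this article.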

Ethical considerations
This study was approved by the Regional Ethics Committee in Stockholm (dnr 2016/177231/2 and 2018/450-32). The Committee determined that informed consent was not required for this cohort study.

Participants
In all, 3,224 patients were identified using ICD-10 codes for cirrhosis. After exclusions, the final sample included 2,491 patients (Fig. 1). In total, 371 patients achieved SVR during follow-up. However, because of missing data on ≥1 of the THRI parameters (most commonly platelet count, which per protocol was not sampled at SVR at the department of infectious diseases), we could recalculate the updated THRI value for only 7 of these patients, 2 of whom developed HCC during follow-up. Overall, 131 patients (5.3%) were classified as low risk, 1,109 (44.5%) as intermediate risk and 1,251 (50.2%) as high risk.

Cohort characteristics
Baseline characteristics of the cohort are presented in Table 1. The most common cause of cirrhosis was steatohepatitis (n = 1,182, 47.5%), followed by viral hepatitis (n = 987, 39.6%) and "other" causes (n = 177, 7.1%). Three hundred and four patients (12.2%) developed HCC during the first 10 years after baseline. Patients with viral hepatitis accounted for 55.6% of all HCC cases (n = 169) and those with steatohepatitis for 37.5% (n = 114). Autoimmune liver disorders accounted for the fewest HCC cases (2%, n = 6).

External validation
Measures of discrimination
The model yielded a Harrell's C-index of 0.69, and Royston and Sauerbrei's R²_D measure gave an identical value of 0.69. Both measures indicate modest discrimination.

Measures of calibration
A calibration-in-the-large value of 0.11 was obtained (not significantly different from 0), indicating no evidence of global miscalibration. The calibration slope was 1.24, not significantly different from 1 (p = 0.66). The goodness-of-fit test showed no overall evidence of lack of fit of the prognostic index in the validation data. Collectively, these results indicate good calibration of the model in the external validation data.
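In Crowson's Poisson framework, calibration-in-the-large can be computed as the intercept of an intercept-only Poisson model for the observed events with offset log(expected events); its maximum-likelihood value is log(total observed / total expected), with standard error 1/√(total observed). The sketch below shows that calculation under the assumption that each patient's expected number of events over their own follow-up (the development model's cumulative hazard) is already available; the study's exact implementation is described in its supplementary information, and the calibration slope is not shown.

```python
import math


def calibration_in_the_large(observed, expected):
    """Return (estimate, standard_error) for log(sum(observed) / sum(expected)).

    observed: 0/1 HCC indicators per patient in the validation data.
    expected: per-patient expected events under the development model over
              that patient's follow-up (assumed to be precomputed).
    0 means observed and predicted totals agree; a positive value means
    more events were observed than predicted.
    """
    o = sum(observed)
    e = sum(expected)
    if o == 0 or e <= 0:
        raise ValueError("need at least one event and a positive expected total")
    return math.log(o / e), 1.0 / math.sqrt(o)


print(calibration_in_the_large([1, 1, 1, 0], [0.5, 0.5, 0.5, 0.5]))
```

Under this formulation, an estimate whose 95% interval (estimate ± 1.96 × SE) includes 0 gives no evidence of global miscalibration.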

Assessment of model misspecification
No violation of the proportional hazards assumption was found using scaled Schoenfeld residuals (chi-squared 4.92, p = 0.43) on the validation dataset.

Kaplan-Meier curves between risk groups
The Kaplan-Meier curves for the HCC risk groups (Fig. 2) were well separated, providing informal visual evidence of discrimination, which was supported by a formal comparison using the log-rank test (p <0.01). The risk of HCC was highest in the high-risk category and lowest in the low-risk category.
The Kaplan-Meier curves for the etiological groups (Fig. 3) illustrate the proportion of patients in each etiological group (autoimmune, steatohepatitis, viral, "other") with HCC during follow-up. Patients with viral hepatitis had a higher rate of HCC during follow-up than the other etiological groups. The "other" and steatohepatitis curves intersected at multiple points and were not well differentiated from each other.

Hazard ratios between risk groups
HRs for the development of HCC according to the THRI and the etiological risk groups are presented in Table 2.
For the THRI groups, the rate of HCC increased from the low- to the intermediate- and high-risk groups in a dose-response relationship. These associations remained statistically significant after adjustment for T2DM and MELD in the multivariable analysis (intermediate-risk group: adjusted HR [aHR] 2.5, 95% CI 1.0-6.3; high-risk group: aHR 7.1, 95% CI 2.9-17.2; Table 2).
Compared to patients with cirrhosis due to autoimmune liver disease, the risk of HCC development was highest in the viral group (aHR 5.2, 95% CI 2.3-11.8). A significantly higher risk was […]
Fig. 2. Kaplan-Meier curves for HCC development according to THRI groups. Cumulative incidence is shown and compared for the 3 THRI groups. The x-axis shows years of follow-up and the y-axis the proportion of patients with HCC. Curves were compared using the log-rank test (p <0.0001). HCC, hepatocellular carcinoma; THRI, Toronto HCC risk index.

Annual and cumulative HCC incidence
Patients in the viral group had the highest annual and cumulative HCC incidences at 5 and 10 years. The second-highest cumulative and annual HCC incidences were seen in patients with steatohepatitis. Patients in the "other" group had a similar HCC incidence to patients with steatohepatitis. The lowest HCC incidence was observed in patients with cirrhosis due to autoimmune liver diseases (Table 3).

Discussion
This study aimed to externally validate the THRI and to determine whether it could identify which patients benefit from HCC screening. We identified 2,491 patients with cirrhosis, making this the largest validation study of the THRI. Compared with the low-risk group, patients in the high-risk group had a 7.1-fold higher risk of HCC and those in the intermediate-risk group a 2.5-fold higher risk. The cumulative incidence of HCC in the high-risk group at 10 years of follow-up was 42%, suggesting that particular efforts should be made to ensure optimal HCC surveillance in this subgroup.
Thus, we confirm that the risk of HCC varies across groups of patients with cirrhosis, calling for individualised decisions on HCC screening. However, the low-risk group was comparatively small (only 5.3%), which limits the adoption of the THRI in clinical practice, as does the modest discrimination of the model (C-index = 0.69).
According to the TRIPOD guidance, the THRI was assessed in the current cohort using model discrimination and calibration. 20 Overall, the performance of the THRI in this Swedish cohort was mediocre, largely due to its limited discriminative ability, implying that it could not adequately differentiate between patients with and without HCC development during follow-up. However, the model was well calibrated, i.e. the predicted risk conformed to the observed risk of HCC development in the current cohort, without systematic under- or overprediction.
The THRI performed better in both the internal and external validation cohorts of the original publication (C-index 0.75 and 0.77, respectively) than in the current cohort (C-index = 0.69). 14 This difference is likely due to the overall higher HCC incidence in our study. 14 The THRI was also validated by Zhang et al. in a cohort of 520 Chinese patients in 2019. 21 Like ours, that study reported a higher incidence of HCC than Sharma et al. 14 Zhang et al. suggested that this finding could result from the higher prevalence of HBV, 21 as patients with HBV are generally at the highest risk of HCC development. 21-23 Similar to our findings, Zhang et al. reported that only a small proportion (4.4%) of patients were considered low risk according to the THRI.
The mean age at baseline was higher in our cohort than in Sharma et al.'s cohort (58.9 vs. 53.9 years), which could explain the higher HCC incidence in this study, as age is a major risk factor for HCC development. 24 The Karolinska University Hospital is a referral centre for patients with severe cirrhosis, including evaluations for liver transplantation. This could partially explain the divergence in HCC incidence, especially as the severity of cirrhosis is a known risk factor for HCC development. 25
Introducing a risk index (such as the THRI) into the regular follow-up of patients with cirrhosis could further individualise patient management. Even though few patients were classified as low risk, such individuals could be reassured of their low risk and excluded from surveillance, provided that the THRI is re-evaluated repeatedly. Further, patients in the high-risk group are clearly at elevated risk of HCC, so efforts should be made to ensure optimal surveillance in these patients. This group could also form the basis for future studies on surveillance frequency, such as a randomised controlled trial comparing screening every 3 months with current practice.
While the present results are insufficient to support a change in HCC surveillance in the current population, we recognise that further studies on HCC risk scoring and on the cost-effectiveness of HCC screening are warranted to minimise unnecessary diagnostics with uncertain cost-effectiveness and potential overdiagnosis.
The medical community has recognised the benefit of an HCC risk index. Indeed, there have been several attempts at creating an HCC risk index for patients with cirrhosis or at otherwise increased risk of HCC development. 15,26,27 Because the THRI is a relatively simple risk index that would be easy to implement in clinical practice, we chose to validate this index. Nevertheless, we recognise that future validation of other HCC risk indexes is warranted.
In the current validation, the modest performance of the THRI meant that it could not identify a larger low-risk group. However, both the low-risk group and the autoimmune group had an annual HCC incidence below the previously suggested 1.5% threshold. This threshold was derived from a few cost-effectiveness studies of HCC screening in all patients with cirrhosis, which did not consider different etiologies of cirrhosis or other known risk factors. Although the threshold was not intended for specific subgroups of patients, it suggests that some of these groups might not benefit from HCC screening. This is further supported by recent epidemiological studies showing a relatively low risk of HCC development in some unselected populations. Recently, Jepsen et al. and Hagström et al. independently reported low HCC incidences in patients with ALD-related cirrhosis, suggesting that regular HCC screening in these patients might not be cost-effective. 28,29
The main limitation of this study is that the Karolinska University Hospital accepts referrals for, and coordinates the treatment of, patients with severe and hard-to-treat cirrhosis. As the severity of cirrhosis is commonly accepted as a risk factor for HCC development, selection bias may have led to a higher HCC incidence than that reported by Sharma et al. 14 and a lower proportion of low-risk patients. However, this is a real-life cohort from a large tertiary centre, so our results should be generalisable to similar settings. Due to missing data, we could not recalculate the THRI at the time of SVR in most patients with HCV cirrhosis. Finally, there were relatively few patients with autoimmune liver diseases, and future studies could investigate the THRI in larger populations.
The THRI is a novel risk stratification system that questions whether HCC surveillance is effective in all populations. In this study, we externally validated the THRI in a Swedish setting and found that it could differentiate between low- and high-risk patients. However, it identified only a small group of low-risk patients (5.3%), suggesting that it has only modest clinical applicability.