Grenoble Alpes University, Institute for Advanced Biosciences, Research Center UGA/Inserm U 1209/CNRS 5309, Gastroenterology, Hepatology and GI Oncology Department, Digidune, Grenoble Alpes University Hospital, La Tronche, France
At listing but not last tumor reassessment, Metroticket 2.0 showed better discriminative ability than AFP score for HCC recurrence.
Discriminative power using respective thresholds was similar between models, either at listing or last tumor reassessment.
Gaps and overlaps were observed when stratifying recurrence risk according to proposed thresholds.
Combining both models at listing and at last tumor reassessment in a “within-ALL decision algorithm” could optimize candidate selection.
Background & Aims
Two recently developed composite models, the alpha-fetoprotein (AFP) score and Metroticket 2.0, could be used to select patients with hepatocellular carcinoma (HCC) who are candidates for liver transplantation (LT). The aim of this study was to compare the predictive performance of both models and to evaluate the net risk reclassification of post-LT recurrence between them using each model’s original thresholds.
This multicenter cohort study included 2,444 adult patients who underwent LT for HCC in 47 centers from Europe and Latin America. A competing risk regression analysis estimating sub-distribution hazard ratios (SHRs) and 95% CIs for recurrence was used (Fine and Gray method). Harrell’s adapted c-statistics were estimated. The net reclassification index for recurrence was compared based on each model’s original thresholds.
During a median follow-up of 3.8 years, there were 310 recurrences and 496 competing events (20.3%). Both models predicted recurrence, HCC survival and survival better than Milan criteria (p <0.0001). At last tumor reassessment before LT, c-statistics did not significantly differ between the two composite models, either as original or threshold versions, for recurrence (0.72 vs. 0.68; p = 0.06), HCC survival, and overall survival after LT. We observed predictive gaps and overlaps between the model’s thresholds, and no significant gain on reclassification. Patients meeting both models (“within-ALL”) at last tumor reassessment presented the lowest 5-year cumulative incidence of HCC recurrence (7.7%; 95% CI 5.1-11.5) and higher 5-year post-LT survival (70.0%; 95% CI 64.9-74.6).
In this multicenter cohort, Metroticket 2.0 and the AFP score demonstrated a similar ability to predict HCC recurrence post-LT. The combination of these composite models might be a promising clinical approach.
Impact and implications
Composite models were recently proposed for the selection of liver transplant (LT) candidates among individuals with hepatocellular carcinoma (HCC). We found that both the AFP score and Metroticket 2.0 predicted post-LT HCC recurrence and survival better than Milan criteria; the Metroticket 2.0 did not result in better reclassification for transplant selection compared to the AFP score, with predictive gaps and overlaps between the two models; patients who met low-risk thresholds for both models had the lowest 5-year recurrence rate. We propose prospectively testing the combination of both models, to further optimize the LT selection process for candidates with HCC.
However, composite predictive models that incorporate the size and number of tumor nodules together with serum alpha-fetoprotein (AFP) levels have recently been shown to outperform the Milan criteria for the prediction of post-LT outcomes.
These new composite models include the French AFP score, which was adopted nationally by the French organ sharing organization in 2013 after external assessment in a prospective cohort, and the Metroticket 2.0 model, published in 2018.
With these composite models now available, a major issue is comparing their respective performances, and the gain on risk reclassification using specific thresholds or cut-offs, before considering a change in local or regional organ sharing organization rules and adopting them for selection of LT candidates.
It has been suggested that Metroticket 2.0 has superior discriminative power compared to the AFP score for the evaluation of post-LT survival in patients transplanted for HCC.
but not a specific clinical threshold for LT selection. As a consequence, the net gain on clinical reclassification of risks when using LT selection criteria (within or beyond approach) was not assessed.
The aim of the present study was therefore to further compare the AFP score and Metroticket 2.0 in a large multinational multicenter cohort, to reassess their respective performance for post-LT HCC recurrence, HCC survival and overall survival, and to evaluate the net reclassification between models using each model’s original thresholds.
Patients and methods
This was a multicenter, multinational cohort study of consecutive adult patients with HCC who underwent LT in 47 different centers from Europe and Latin America. For this purpose, four databases including patients transplanted with HCC from France, Italy and Belgium between 2000 and 2018 and from Latin America (Argentina, Uruguay, Chile, Brazil, Ecuador, Colombia and Mexico) between 2005 and 2018 were considered. These four regional databases were merged, harmonized, quality controlled and hosted on a central server, following the agreement of all participating centers. Data were reviewed center-by-center and region-by-region in a step-by-step process that took several months. This final database was named the “Western-Latin American HCC LT Consortium”, approved by the Austral University committee (CIE 17-065) and registered as part of an open public registry (NCT03775863; www.clinicaltrials.gov). All procedures were followed in accordance with STROBE and REMARK guidelines.
We excluded patients if: 1) extrahepatic or macrovascular tumor invasion was observed during pre-transplant evaluation, 2) incidental HCC was found at explant pathology, 3) there were tumors other than HCC found in the explant, and 4) they were included in the French training cohort in which the AFP score was developed.
Exposure variables at HCC diagnosis, at listing and during the waiting list period
Common exposure variables in all cohorts consisted of recipient characteristics, radiological tumor burden on pre-LT CT or MRI, including number and diameter of each HCC nodule, paired with AFP serum levels at HCC diagnosis, at time of listing and at last tumor reassessment during the waitlist period when available.
Patients were classified according to the Milan criteria, the AFP score and Metroticket 2.0, at listing, and then at last pre-transplant assessment when appropriate. The Milan criteria were the common standardized criteria used for patient selection in all centers but, according to local practices and allocation policies, transplantation for patients exceeding Milan was also considered and discussed at each transplant center on a case-by-case basis. The AFP score (0 to 9 points) was calculated from the largest tumor diameter (≤3 cm = 0 points, 3-6 cm = 1 point, >6 cm = 4 points), the number of HCC nodules (1-3 nodules = 0 points, ≥4 nodules = 2 points), and the AFP level in ng/ml (≤100 = 0 points, 101-1,000 = 2 points, and >1,000 = 3 points).
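The scoring rule described above can be written as a short function. This is a minimal sketch for illustration only: the function name `afp_score` and its parameter names are hypothetical, and the handling of boundary values (exactly 3 cm scored as 0 points, exactly 6 cm as 1 point) is an assumption, since the text gives ranges rather than explicit inequalities.

```python
def afp_score(largest_diameter_cm: float, n_nodules: int, afp_ng_ml: float) -> int:
    """Return the AFP score (0 to 9 points) using the point values quoted in the text."""
    if n_nodules < 1:
        raise ValueError("at least one HCC nodule is expected")

    # Largest tumor diameter: <=3 cm = 0, 3-6 cm = 1, >6 cm = 4 points.
    # Boundary handling (3 cm -> 0 points, 6 cm -> 1 point) is an assumption.
    if largest_diameter_cm <= 3:
        points = 0
    elif largest_diameter_cm <= 6:
        points = 1
    else:
        points = 4

    # Number of HCC nodules: 1-3 = 0 points, >=4 = 2 points.
    if n_nodules >= 4:
        points += 2

    # Serum AFP (ng/ml): <=100 = 0, 101-1,000 = 2, >1,000 = 3 points.
    if afp_ng_ml > 1000:
        points += 3
    elif afp_ng_ml > 100:
        points += 2

    return points


print(afp_score(2.5, 2, 50))    # 0 points
print(afp_score(4.0, 1, 150))   # 1 + 0 + 2 = 3 points
print(afp_score(7.0, 5, 2000))  # 4 + 2 + 3 = 9 points
```

In this study, an AFP score of ≤2 points is the cut-off used to define the low-risk group.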
In addition, in patients receiving bridging therapies during the waiting list period, last available pre-transplant radiologic tumor assessment and AFP values following these procedures were also registered. Patients without any reassessment or those transplanted within 3 months of listing were not included in the final pre-transplantation reassessment analysis. Tumor treatment and type of bridging therapies before transplantation were decided at each transplant center.
Endpoints and statistical analysis
The primary endpoint analyzed was post-LT HCC recurrence because it was considered the most important event affecting HCC-specific survival after transplantation.
Recurrence was assessed based on imaging criteria plus serum AFP or by biopsy. Secondary endpoints were HCC survival and overall post-LT survival. All patients were followed until death or last outpatient visit.
The cumulative incidence of recurrence was estimated in a competing risk framework, with death without HCC as a competing event, and the association of exposures with the risk of recurrence was analyzed using a Fine-Gray model, estimating sub-distribution hazard ratios (SHRs) and 95% CIs.
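To make the competing risk framework concrete, the sketch below computes a nonparametric cumulative incidence curve for recurrence in the presence of a competing event. It is an illustration under stated assumptions, not the analysis code used in the study: it does not fit a Fine-Gray regression, the function name `cumulative_incidence` is hypothetical, and the status coding (0 = censored, 1 = recurrence, 2 = death without HCC) is an assumed convention.

```python
def cumulative_incidence(times, status, cause=1):
    """Nonparametric cumulative incidence of `cause` with competing events.

    times  : follow-up time for each patient
    status : 0 = censored, 1 = recurrence, 2 = competing event (assumed coding)
    Returns a list of (time, cumulative incidence) pairs at each event time.
    """
    if len(times) != len(status):
        raise ValueError("times and status must have the same length")

    data = sorted(zip(times, status))
    n_at_risk = len(data)
    event_free = 1.0  # probability of being free of ANY event just before t
    cif = 0.0
    curve = []

    i = 0
    while i < len(data):
        t = data[i][0]
        n_at_t = d_any = d_cause = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            s = data[i][1]
            n_at_t += 1
            if s != 0:
                d_any += 1
            if s == cause:
                d_cause += 1
            i += 1
        if d_any > 0:
            # Increment = P(event-free before t) * cause-specific hazard at t.
            cif += event_free * d_cause / n_at_risk
            event_free *= 1 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_at_t
    return curve


# Five hypothetical patients: recurrence at t=1 and t=4, competing death at t=2.
print(cumulative_incidence([1, 2, 3, 4, 5], [1, 2, 0, 1, 0]))
```

The key point is that patients who die without HCC are removed from the event-free pool rather than treated as censored, which prevents the overestimation of recurrence that a naive one-minus-Kaplan-Meier estimate would produce.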
The proportional hazards assumption for competing risk regression was evaluated using the Grambsch-Therneau test. The performance of the models was compared in terms of calibration and discrimination. Calibration was assessed by comparing observed and predicted risk curves. Discrimination was assessed with Harrell’s adapted c-statistics for competing risk analysis, estimated for both the original version of each model (as originally proposed) and its threshold version.
We used the somersd command to estimate the 95% CIs and compare the concordance statistics between models. First, we estimated the inverse hazard estimates, changing the coding for censored and uncensored lifetime observations. Afterwards, we compared the inverse sub-distribution hazards. The net reclassification index (NRI) was estimated to evaluate and quantify the agreement between “upward” and “downward” risk reclassifications and event status using each model's threshold.
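The idea behind the NRI can be illustrated with a simplified two-category version. The sketch below is an assumption-laden illustration rather than the study's method: it ignores censoring and competing events (which the study accounted for), the function name `net_reclassification_index` is hypothetical, and "up" is taken to mean reclassification from low risk under the reference model to high risk under the new model.

```python
def net_reclassification_index(old_high, new_high, event):
    """Simplified categorical NRI for two risk categories (low/high).

    old_high, new_high : booleans, True if the patient is high risk under the
                         reference model / the new model
    event              : booleans, True if the patient had the event
    Returns (nri_events, nri_nonevents, nri_total).
    """
    if not (len(old_high) == len(new_high) == len(event)):
        raise ValueError("all inputs must have the same length")

    up_e = down_e = up_ne = down_ne = 0
    n_events = n_nonevents = 0
    for old, new, had_event in zip(old_high, new_high, event):
        moved_up = bool(new) and not bool(old)
        moved_down = bool(old) and not bool(new)
        if had_event:
            n_events += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_nonevents += 1
            up_ne += moved_up
            down_ne += moved_down

    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")

    # Events should move up (gain in sensitivity);
    # non-events should move down (gain in specificity).
    nri_events = (up_e - down_e) / n_events
    nri_nonevents = (down_ne - up_ne) / n_nonevents
    return nri_events, nri_nonevents, nri_events + nri_nonevents


# Six hypothetical patients.
old = [False, False, True, True, False, True]
new = [True, False, True, False, False, False]
evt = [True, True, True, False, False, False]
print(net_reclassification_index(old, new, evt))
```

A model that gains sensitivity but loses specificity yields a positive event component and a negative non-event component, so the two can cancel out in the total.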
To test independent variables associated with HCC survival, we again used the Fine and Gray method, considering non-HCC-related deaths as competing events.
For overall survival analysis after transplantation, Kaplan-Meier survival curves were compared using the log-rank test (Mantel-Cox), and hazard ratios (HRs) and 95% CIs were estimated using a Cox proportional hazards regression model. The proportional hazards assumption was evaluated through graphical diagnostics and the Schoenfeld residual test. The performance of each model was evaluated in terms of calibration (observed vs. predicted curves) and discriminative power (Harrell’s c-index).
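For readers unfamiliar with Harrell’s c-index, the sketch below shows the standard pairwise definition for right-censored survival data. It is a minimal illustration, not the adapted competing-risk version or the Stata routine used in the study; the function name `harrell_c_index` is hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplifying assumption.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's c-index for right-censored data (simple O(n^2) version).

    times       : follow-up times
    events      : True/1 if the event was observed, False/0 if censored
    risk_scores : model output, higher = higher predicted risk
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("all inputs must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [4, 3, 2, 1]))  # perfectly concordant: 1.0
print(harrell_c_index(times, events, [1, 1, 1, 1]))  # uninformative: 0.5
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking, which gives context for the c-statistics of roughly 0.56 to 0.72 reported in this study.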
A planned sensitivity analysis according to pre-LT tumor reassessment and LT period was performed, estimating c-statistics for each model (LT periods 2000-2005, 2006-2011 and 2012-2018). Collected data were analyzed with StataBE v.17 and R software (Supplementary CTAT table).
A total of 2,444 patients who underwent LT in 47 LT centers were included, 55.6% from Europe (n = 1,359) and 44.4% from Latin America (n = 1,085). The cohort was also categorized by period of transplantation, with 24.3% of transplants performed between 2000 and 2005 (n = 594), 40.0% between 2006 and 2011 (n = 978), and 35.7% between 2012 and 2018 (n = 872). Baseline patient and tumor characteristics at time of listing and at last pre-LT reassessment are shown in Table 1.
At time of listing, 80.9% of the patients (n = 1,978) were within the Milan criteria and 88.7% had an AFP score ≤2 points (n = 2,163) (Table 1). According to each of the three Metroticket 2.0 thresholds, 83.1% (n = 2,026) met the Up-to 7 plus AFP <200 ng/ml cut-off, 66.4% (n = 1,620) the Up-to 5 plus AFP <400 ng/ml cut-off and 44.5% (n = 1,087) the Up-to 4 plus AFP <1,000 ng/ml cut-off. Overall survival and recurrence rates at 1, 5 and 10 years of follow-up are shown in Fig. S1. Only five patients had missing AFP values at listing; composite models could not be assessed in these patients. Bridging therapies were received during the waiting list period by 70.1% of the patients (n = 1,713), with a median time from last therapy to LT of 4.9 months (IQR 2.1-9.7). Overall, median follow-up in the entire cohort was 3.8 years (IQR 2.4-5.5 years).
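The three Metroticket 2.0 thresholds quoted above can be checked one at a time, as in the following sketch. Several points are assumptions rather than statements from this text: "Up-to N" is taken to mean that the number of nodules plus the largest tumor diameter in cm does not exceed N, a sum exactly equal to N is treated as meeting the threshold, and the names `METROTICKET_THRESHOLDS` and `meets_threshold` are hypothetical.

```python
# (maximum tumor sum, AFP cut-off in ng/ml) for each threshold quoted in the text.
METROTICKET_THRESHOLDS = {
    "up_to_7_afp_200": (7.0, 200.0),
    "up_to_5_afp_400": (5.0, 400.0),
    "up_to_4_afp_1000": (4.0, 1000.0),
}


def meets_threshold(largest_diameter_cm: float, n_nodules: int,
                    afp_ng_ml: float, threshold: str) -> bool:
    """Return True if the patient meets the named Metroticket 2.0 threshold."""
    max_sum, afp_cutoff = METROTICKET_THRESHOLDS[threshold]
    # Assumed definition: number of nodules + largest diameter (cm).
    tumor_sum = n_nodules + largest_diameter_cm
    return tumor_sum <= max_sum and afp_ng_ml < afp_cutoff


# Hypothetical patient: 2 nodules, largest 3 cm, AFP 150 ng/ml (tumor sum = 5).
for name in METROTICKET_THRESHOLDS:
    print(name, meets_threshold(3.0, 2, 150.0, name))
```

Evaluating each threshold separately mirrors how the proportions above are reported, where the share of patients meeting a cut-off falls as the tumor-sum limit tightens.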
Performance of the AFP score and Metroticket 2.0 in evaluating post-transplant outcomes
At time of listing, patients within Milan but exceeding either the AFP score or the Metroticket 2.0 presented higher recurrence and lower post-transplant survival rates than patients within Milan but fulfilling the AFP score or the Metroticket 2.0 model. On the contrary, among patients exceeding Milan, both composite models identified a population with excellent outcomes after LT (Fig. S2A,B).
Similar results were observed when evaluating the effect and risk categorization for HCC recurrence in patients with tumor reassessment before transplantation (n = 1,377). No significant difference in the risk of recurrence was observed between patients exceeding Milan criteria compared to those within Milan criteria with AFP scores ≤2 (SHR 1.32; 95% CI 0.74-2.37; p = 0.34). Increasing recurrence risk was observed in patients either within (SHR 2.24; 95% CI 1.26-4.00; p = 0.006) or beyond (SHR 4.20; 95% CI 2.74-6.43; p <0.0001) the Milan criteria with AFP scores higher than 2 points. Similarly, patients exceeding Milan but within Metroticket 2.0 had a similar risk as patients within the Milan criteria (SHR 1.51; 95% CI 0.77-2.94; p = 0.22). On the other hand, patients within (SHR 4.43; 95% CI 2.90-6.77) or beyond (SHR 4.85; 95% CI 3.17-7.44) the Milan criteria but exceeding Metroticket 2.0 were at a higher risk of recurrence compared to those who met the Milan and Metroticket 2.0 criteria, respectively (p <0.0001). Time on the waiting list (as a continuous variable) and locoregional treatment were not independently associated with HCC recurrence.
Discrimination and net reclassification of risk of recurrence
Both composite models showed better discriminative power for risk of recurrence at time of listing than the Milan criteria, either as original or threshold models (Table 2). There was no significant change in either model’s performance when including the effect of waiting list time and locoregional therapy.
Table 2. Discrimination power of HCC recurrence between Milan criteria, the AFP model and Metroticket v2.0 criteria at time of listing.
Although Metroticket 2.0 showed a higher discriminative power for HCC recurrence compared to the AFP score, there were no significant differences when comparing each model’s thresholds (Table 2). There were also no significant differences for HCC survival or overall survival.
When stratified by LT period, there was no significant difference in c-statistics between the two composite models (Table S3). Similarly, at last tumor reassessment, c-statistics for both originally proposed models, the AFP score (0.68; 95% CI 0.63-0.73) and Metroticket 2.0 (0.72; 95% CI 0.68-0.77), were significantly better than for the Milan criteria (0.56; 95% CI 0.52-0.60; p <0.0001) (Table 2). However, there was no significant difference when comparing the AFP score and Metroticket 2.0 as original models (p = 0.065).
Metroticket 2.0 did not lead to a significant reclassification of risk compared to the AFP score cut-off value, either at time of listing or at last tumor reassessment (Table 3 and Table S4). Although Metroticket 2.0 gained in sensitivity, specificity decreased with each threshold.
Table 3. Net reclassification index considering competing events for risk reclassification of recurrence (events) and non-recurrence (non-events).
Columns: Up events, relative change in sensitivity (95% CI); Up non-events, relative change in specificity (95% CI).
Rows (AFP score vs. each Metroticket 2.0 threshold): Up-to 7 + AFP <200; Up-to 5 + AFP <400; Up-to 4 + AFP <1,000.
All models assessed at listing. The NRI was estimated to evaluate the ability of these models to discriminate between events and non-events by quantifying the agreement between “upward” and “downward” risk reclassifications and event status.
We further conducted a stratified analysis according to the AFP score and each Metroticket 2.0 threshold at time of listing. Gaps and overlaps between the two models were observed for each threshold (Fig. S3). First, according to the first Metroticket 2.0 threshold (sum up-to 7 plus AFP <200 ng/ml), in 8.7% of the population (n = 211) this Metroticket 2.0 threshold did not categorize the risk of recurrence in patients with AFP scores ≤2 or >2 points. Patients meeting this Metroticket 2.0 threshold with AFP scores >2 points presented similar 5-year recurrence rates as patients exceeding Metroticket 2.0 with AFP scores ≤2. The models clearly separated two distinct populations: patients meeting both models showed the lowest cumulative incidence of HCC recurrence (14.2%; 95% CI 12.1-16.5) and the highest 5-year post-LT survival rate (67.9%; 95% CI 65.3-70.4) (Fig. 1A-B). Furthermore, according to the other two Metroticket 2.0 thresholds, higher recurrence rates were again observed in patients exceeding one or the other corresponding threshold of each model, and the lowest cumulative recurrence rates were observed in patients meeting both models’ thresholds (Fig. 2A,B).
Similarly, at last pre-LT assessment, lower 5-year recurrence rates were observed in patients meeting both models’ criteria and, conversely, higher cumulative 5-year recurrence rates were observed in patients exceeding both models for each Metroticket 2.0 threshold (Table 4).
Table 4. Recurrence risk categorization in patients with tumor reassessment before transplantation.
Taking into account the aforementioned gaps and overlaps, we considered using both composite models for selection of LT candidates at time of listing in a “within-ALL” clinical-decision algorithm (Yes/No approach), stratified according to the Milan criteria. Patients meeting both composite models’ thresholds, whether within or beyond the Milan criteria, showed the best post-LT outcomes with a lower risk of HCC recurrence (SHR 0.28; 95% CI 0.22-0.36; p <0.0001), whereas patients exceeding both composite models, even when meeting the Milan criteria, showed a significantly higher risk of post-LT recurrence (Fig. 3A). In addition, survival rates were similar in patients meeting the “within-ALL” clinical-decision algorithm whether within or beyond the Milan criteria, and were significantly higher than in patients beyond the “within-ALL” clinical-decision algorithm (Fig. 3B). Also, at last tumor reassessment, patients meeting both composite models presented the lowest 5-year cumulative incidence of HCC recurrence (7.7%; 95% CI 5.1-11.5) (SHR 0.33; 95% CI 0.22-0.49; p <0.0001) (Fig. 4) and higher 5-year post-LT survival rates (70.0%; 95% CI 64.9-74.6).
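The Yes/No logic of the “within-ALL” approach, stratified by Milan status, can be sketched as follows. This is an illustrative reading of the algorithm, not a validated clinical tool: the function name `within_all_category` and the output labels are hypothetical, and which Metroticket 2.0 threshold feeds the `meets_metroticket` flag is left to the caller because the text evaluates each threshold separately.

```python
def within_all_category(afp_points: int, meets_metroticket: bool,
                        within_milan: bool) -> str:
    """Classify a candidate under the "within-ALL" Yes/No approach.

    afp_points        : AFP score in points (low risk if <= 2)
    meets_metroticket : True if the chosen Metroticket 2.0 threshold is met
    within_milan      : True if the patient is within the Milan criteria
    """
    # "within-ALL" requires meeting BOTH composite models.
    within_all = afp_points <= 2 and meets_metroticket
    decision = "within-ALL" if within_all else "not within-ALL"
    milan = "within Milan" if within_milan else "beyond Milan"
    return f"{decision}, {milan}"


# A patient beyond Milan who meets both composite models.
print(within_all_category(2, True, False))   # within-ALL, beyond Milan
# A patient within Milan who exceeds the AFP score cut-off.
print(within_all_category(3, True, True))    # not within-ALL, within Milan
```

Keeping Milan status as a stratifier rather than a gate reflects the finding that outcomes in the “within-ALL” group were similar whether patients were within or beyond the Milan criteria.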
In this multinational cohort study of transplanted patients with HCC, we compared the ability of two composite models, the AFP score and Metroticket 2.0, to predict post-LT outcomes. Three major results can be drawn from this study. First, we confirmed that both models performed better for the prediction of HCC recurrence and survival than the Milan criteria. This finding suggests that the selection of LT candidates with HCC could be optimized by using composite models, as recently proposed by the European Association for the Study of the Liver.
Second, the two composite models performed similarly in terms of intrinsic predictive value. Third, although the two models performed similarly, we observed gaps and overlaps in risk stratification. Consequently, using a clinical decision approach combining each model’s thresholds, we showed that patients meeting both models, especially at last reassessment, had the best post-LT outcomes.
Some technical issues should also be underlined when designing prognostic models.
Continuous data may lead to better discrimination compared to dummy or categorical variables, in which each stratum is “artificially” constructed and represents a different risk classification. Third, Metroticket 2.0 was developed and validated in an HBV population, whereas the most frequent etiologies in this cohort were HCV and alcohol-related liver disease. Also, median AFP values, as well as median major nodule diameter and total number of tumors, were similar between studies.
However, in the end, we did not observe significantly better reclassification, that is to say, no significant clinical impact, regarding risk of HCC recurrence between these two composite models. Although Metroticket 2.0 was superior in terms of sensitivity, it exhibited reduced specificity. Of note, both models showed discriminative power, with c-statistics of around 0.70. This underlines the need for further refinement of selection models.
showed similar discriminative power as these composite models but we could not directly compare them. While we wait for next-generation predictive tools based on molecular signatures, the available composite models offer a reasonable, reproducible, and user-friendly alternative as we move beyond the Milan criteria.
Using the alternative “within-ALL” decision process in this cohort, almost half of patients exceeding Milan could be included on the waiting list, while 6% of those within Milan could be excluded or granted better access to LT. Moreover, when considering changes over the waitlist period (including the effect of time and locoregional therapies), at last tumor evaluation prior to transplantation, meeting or exceeding this clinical decision point identified two populations at significantly different risks of recurrence. This shows how the clinical transplant decision should be reassessed over the waitlist period to further determine the risk of recurrence. The “within-ALL” strategy was proposed as a clinical decision-making algorithm, rather than a new predictive model. It aims to combine both composite models’ thresholds to select the best candidates for LT, who may be granted MELD exception points, increasing their chances of receiving a transplant. Based on this critical finding, rather than opposing these two models, we propose considering both composite criteria to select patients with the best phenotype of HCC selection criteria. However, this depends not only on the pool of deceased donors, but also on tumor behavior over the waitlist period. This proposal will have to be confirmed by prospective cohort studies.
The present study has limitations common to observational studies, particularly those based on retrospective cohorts. First, cohorts from different regions of the world were merged. However, data collection and outcome definitions were homogeneous across cohorts and quality controlled, and there was no differential assessment of the outcomes. Second, although we included cohorts in which the AFP score was externally validated, only one-third of the Latin American cohort was included in the prior external validation of the AFP score.
Other limitations include the relatively small number of patients with AFP scores >2 or exceeding the Metroticket 2.0 thresholds. We did not consider mRECIST or residual enhancement of treated tumors in order to avoid misinterpretation of necrotic or enhancing residual areas across centers. Unfortunately, based on the retrospective nature of this research, the independent effect of bridging therapies could not be completely analyzed. Finally, tumor reassessment was available in 50% of the study cohort, notably in patients transplanted before 2006, in part because of a short waiting time. We initially focused on data at time of listing because this time-point addresses the critical step in HCC candidate selection. It was therefore of major importance to reassess the discriminative power of the models in the subset of patients who have waited long enough to undergo tumor reassessment close to LT. In this last scenario, we again observed similar results.
In conclusion, in this multinational cohort study, both composite models showed similar predictive performance and better discrimination than the Milan criteria. However, gaps and overlaps were observed between the models when considering proposed thresholds, a limitation that can be counteracted by testing both models to guide the clinical decision-making process. While awaiting new biomarkers, we therefore suggest a clinical-decision algorithm, the “within-ALL” selection process; patients meeting both composite models achieve the best outcomes, broadening the scope for further optimization of candidate selection.
This research received no specific grant from any funding agency in the public, commercial, or non-profit sectors.
All the authors approved the final version of the manuscript. Concept and study design: Federico Piñero, Charlotte Costentin, Helena Degroote, Quirino Lai, Fernando Rubinstein and Christophe Duvoux. Data collection: France: Karim Boudjema, Philippe Bachellier, Filomena Conti, Olivier Scatton, Fabrice Muscari, Ephrem Salame, Pierre Henri Bernard, Claire Francoz, Francois Durand, Sébastien Dharancy, Marie-lorraine.Woehl, Claire Vanlemmens, Alexis Laurent, Sylvie Radenne, Jérôme Dumortier, Armand Abergel, Daniel Cherqui, Louise Barbier, Pauline Houssel-Debry, Georges Philippe Pageaux, Laurence Chiche, Victor Deledinghen, Jean Hardwigsen, J Gugenheim, M altieri, Marie Noelle Hilleret, Thomas Decaens, Christophe Duvoux. Latin America: Federico Piñero, Aline Chagas, Paulo Costa, Elaine Cristina de Ataide, Emilio Quiñones, Sergio Hoyos Duque, Sebastián Marciano, Margarita Anders, Adriana Varón, Alina Zerega, Jaime Poniachik, Alejandro Soza, Martín Padilla Machaca, Diego Arufe, Josemaría Menéndez, Rodrigo Zapata, Mario Vilatoba, Linda Muñoz, Ricardo Chong Menéndez, Martín Maraschio, Luis G Podestá, Lucas McCormack, Juan Mattera, Adrian Gadano, Ilka SF Fatima Boin, Jose Huygens Parente García, Flair Carrilho, Marcelo Silva. Italy: Andrea Notarpaolo, Giulia Magini, Lucia Miglioresi, Martina Gambato, Fabrizio Di Benedetto, Cecilia D’Ambrosio, Giuseppe Maria Ettorre, Alessandro Vitale, Patrizia Burra, Stefano Fagiuoli, Umberto Cillo, Michele Colledan, Domenico Pinelli, Paolo Magistri, Giovanni Vennarecci, Marco Colasanti, Valerio Giannelli, Adriano Pellicelli, Cizia Baccaro, Quirino Lai. Belgium: Helena Degroote, Hans Van Vlierberghe, Callebout Eduard, Iesari Samuele, Dekervel Jeroen, Schreiber Jonas, Pirenne Jacques, Verslype Chris, Ysebaert Dirk, Michielsen Peter, Lucidi Valerio, Moreno Christophe, Detry Olivier, Delwaide Jean, Troisi Roberto, Lerut Jan Paul. Statistical analysis and revision: Federico Piñero, Fernando Rubinstein, Quirino Lai. 
Writing the article: Federico Piñero, Charlotte Costentin, Helena Degroote, Quirino Lai and Christophe Duvoux. Critical revision: all authors.
Conflict of interest
The authors of this manuscript have no conflicts of interest to disclose as described by JHEP Reports.
Please refer to the accompanying ICMJE disclosure forms for further details.
France: Filomena Conti, Olivier Scatton, Pierre Henri Bernard, Claire Francoz, Francois Durand, Sébastien Dharancy, Marie-lorraine.Woehl, Alexis Laurent, Sylvie Radenne, Jérôme Dumortier, Armand Abergel, Louise Barbier, Pauline Houssel-Debry, Georges Philippe Pageaux, Laurence Chiche, Victor Deledinghen, Jean Hardwigsen, J Gugenheim, M altieri, Marie Noelle Hilleret, Thomas Decaens.
Latin America: Paulo Costa, Elaine Cristina de Ataide, Emilio Quiñones, Margarita Anders, Adriana Varón, Alina Zerega, Alejandro Soza, Martín Padilla Machaca, Diego Arufe, Josemaría Menéndez, Rodrigo Zapata, Mario Vilatoba, Linda Muñoz, Ricardo Chong Menéndez, Martín Maraschio, Luis G Podestá, Lucas McCormack, Juan Mattera, Adrian Gadano, Jose Huygens Parente García.
Italy: Giulia Magini, Lucia Miglioresi, Martina Gambato, Cecilia D’Ambrosio, Alessandro Vitale, Michele Colledan, Domenico Pinelli, Paolo Magistri, Giovanni Vennarecci, Marco Colasanti, Valerio Giannelli, Adriano Pellicelli.