Primary liver cancer classification from routine tumour biopsy using weakly supervised deep learning

Background & Aims The diagnosis of primary liver cancers (PLCs) can be challenging, especially on biopsies and for combined hepatocellular-cholangiocarcinoma (cHCC-CCA). We automatically classified PLCs on routine-stained biopsies using a weakly supervised learning method. Method We selected 166 PLC biopsies divided into training, internal and external validation sets: 90, 29 and 47 samples, respectively. Two liver pathologists reviewed each whole-slide hematein eosin saffron (HES)-stained image (WSI). After annotating the tumour/non-tumour areas, tiles of 256x256 pixels were extracted from the WSIs and used to train a ResNet18 neural network. The tumour/non-tumour annotations served as labels during training, and the network's last convolutional layer was used to extract new tumour tile features. Without knowledge of the precise labels of the malignancies, we then applied an unsupervised clustering algorithm. Results Pathological review classified the training and validation sets into hepatocellular carcinoma (HCC, 33/90, 11/29 and 26/47), intrahepatic cholangiocarcinoma (iCCA, 28/90, 9/29 and 15/47), and cHCC-CCA (29/90, 9/29 and 6/47). In the two-cluster model, Clusters 0 and 1 contained mainly HCC and iCCA histological features. The diagnostic agreement between the pathological diagnosis and the two-cluster model predictions (major contingent) in the internal and external validation sets was 100% (11/11) and 96% (25/26) for HCC and 78% (7/9) and 87% (13/15) for iCCA, respectively. For cHCC-CCA, we observed a highly variable proportion of tiles from each cluster (cluster 0: 5-97%; cluster 1: 2-94%). Conclusion Our method applied to PLC HES biopsy could identify specific morphological features of HCC and iCCA. Although no specific features of cHCC-CCA were recognized, assessing the proportion of HCC and iCCA tiles within a slide could facilitate the identification of cHCC-CCA. 
Impact and implications The diagnosis of primary liver cancers can be challenging, especially on biopsies and for combined hepatocellular-cholangiocarcinoma (cHCC-CCA). We automatically classified primary liver cancers on routine-stained biopsies using a weakly supervised learning method. Our model identified specific features of hepatocellular carcinoma and intrahepatic cholangiocarcinoma. Although no specific features of cHCC-CCA were recognized, identifying hepatocellular carcinoma and intrahepatic cholangiocarcinoma tiles within a slide could facilitate the diagnosis of primary liver cancers, particularly cHCC-CCA.


Introduction
Primary liver cancers (PLCs) are the third leading cause of cancer-related death worldwide, with an increasing incidence in Western countries. 1,2 PLCs form a heterogeneous group of tumours associated with distinct risk factors, clinical findings, imaging features, and histological and molecular characteristics. Among these tumours, hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (iCCA) are the most common and represent the two ends of the PLC spectrum. 7,8 The pathological definition of combined hepatocellular-cholangiocarcinoma (cHCC-CCA) has evolved significantly. The latest (2019) WHO classification endorsed a definition of cHCC-CCA based on the unequivocal presence of both hepatocytic and cholangiocytic differentiation within the same tumour on routine histopathology (hematein eosin or hematein eosin saffron [HES] staining). Additional immunostaining can help confirm the diagnosis. 9 Nonetheless, diagnosing PLCs, particularly cHCC-CCA, is challenging on biopsy samples because histological analysis may miss a small tumour component. Our previous study concluded that although diagnostic performance remains relatively low for cHCC-CCA on routine biopsy staining, additional immunohistochemistry (IHC) and, above all, a two-step strategy combining imaging and histology significantly improved the diagnostic performance of biopsy for cHCC-CCA. 10 Histology slides are readily available in pathology departments, and recent advances in high-throughput scanning devices have enabled the digital capture of an entire conventional glass slide, called a whole-slide image (WSI). 11,12 WSIs contain valuable information that can be invisible to the human eye. Artificial intelligence (AI), particularly deep learning, can exploit this information at different levels by extracting deep morphological features from WSIs after a tiling step. 13,14
The literature on AI applications has flourished in recent years. Most studies in computational pathology rely on supervised methods, yet the ground truth is usually known only at the slide level and hardly ever for each individual tile. 15 The effort required for exhaustive annotation has been a significant obstacle to many supervised AI methods in medical applications. In histology, where images are huge, no reasonable human effort is likely to succeed in labelling a massive dataset at the tile scale for fully supervised AI methods. 16 Weakly supervised methods circumvent this difficulty by using a feasible amount of annotation, and several studies have already shown their effectiveness in extracting meaningful features for medical applications, including histology. 16,17 The underlying task remains essentially unsupervised, although it is no longer entirely blind to the specificities of the application. In imaging applications, this approach can provide a few categorical labels for WSI segmentation 16 or a limited number of classes to solve a more complex and broader classification problem. 18,20,21 Different AI models trained on histology slides of HCC treated by surgical resection showed that weakly supervised models outperformed others in predicting the activation of immune and inflammatory gene signatures. 14 Although some of these studies have investigated weakly supervised methods on WSIs from surgical specimens, biopsy studies remain scarce.
This proof-of-concept study aimed to automatically classify PLCs from biopsy samples using a weakly supervised deep-learning approach.

Sample selection
We selected 119 formalin-fixed paraffin-embedded biopsies of PLC archived between 2012 and 2021 in the Pathology Department of Beaujon Hospital (Clichy, France), divided into 90 training and 29 internal validation samples. Both sets comprised balanced proportions of HCC, iCCA and cHCC-CCA so that the model could be trained and validated on all three tumour types. We constructed an external validation set of 47 PLC biopsies archived between 2012 and 2023 in the Pathology Department of Bicêtre Hospital (Le Kremlin-Bicêtre, France) (Fig. 1). All patients gave written consent for the use of their biopsies, as required by French legislation. This study was registered with the Commission Nationale de l'Informatique et des Libertés and approved by the ethics committee (EDS APHP no. CSE-20-85 MOSAIC-EDS). The recorded data included age, sex, presence of surgery, main risk factors for chronic liver disease (e.g., viral hepatitis, chronic alcohol consumption, metabolic syndrome, hemochromatosis), Child-Pugh score, METAVIR fibrosis score in the non-tumour liver, and the number and size of tumours at diagnosis.
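The text does not say how the 119 Beaujon biopsies were allocated to the training and internal validation sets. One way to obtain a patient-level split that is balanced by diagnosis is sketched below, using the class counts reported in the Results (training 33/28/29 and internal validation 11/9/9 for HCC/iCCA/cHCC-CCA). The function name and patient identifiers are hypothetical; this is an illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_patient_split(diagnoses, n_val_per_class, seed=0):
    """Split patients into training and validation sets, stratified by diagnosis.

    diagnoses: dict mapping a (hypothetical) patient ID to "HCC", "iCCA" or "cHCC-CCA".
    n_val_per_class: number of patients of each diagnosis to place in validation.
    Each patient lands in exactly one set, so no biopsy contributes tiles to both.
    """
    by_class = defaultdict(list)
    for patient, diagnosis in sorted(diagnoses.items()):
        by_class[diagnosis].append(patient)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val = [], []
    for diagnosis, patients in sorted(by_class.items()):
        rng.shuffle(patients)
        k = n_val_per_class[diagnosis]
        val.extend(patients[:k])
        train.extend(patients[k:])
    return sorted(train), sorted(val)


# Class totals from the Results: HCC 33 + 11, iCCA 28 + 9, cHCC-CCA 29 + 9.
cohort = {}
for diagnosis, n in [("HCC", 44), ("iCCA", 37), ("cHCC-CCA", 38)]:
    for i in range(n):
        cohort[f"{diagnosis}-{i:03d}"] = diagnosis

train_ids, val_ids = stratified_patient_split(
    cohort, {"HCC": 11, "iCCA": 9, "cHCC-CCA": 9}
)
print(len(train_ids), len(val_ids))  # 90 29
```

Splitting at the patient level, before any tiling, is what guarantees that tiles from one biopsy never appear on both sides of the split.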

Pathology review
Two liver pathology experts (AB and VP) examined all HES WSIs, classifying each case as HCC, iCCA, or cHCC-CCA (Fig. 2). A diagnosis of cHCC-CCA was retained when biphenotypic (hepatocytic and cholangiocytic) differentiation was evident on HES WSIs, according to the definition of the latest WHO classification. 9

Quantitative IHC analysis
Quantitative IHC analysis was performed in the internal validation set to better assess the proportions of iCCA and HCC contingents, particularly in cHCC-CCA. Single-cell-based analyses were carried out on validation WSIs in the open-source WSI analysis software QuPath (v0.2.3), using the haematoxylin channel to segment cell nuclei. 22 Segmentation was followed by a stepwise procedure for further cell subclassification. First, cells were classified as tumour or stromal cells using QuPath's built-in machine-learning features, as previously described. 23 Tumour and stromal cells were then subclassified using intensity thresholds in the relevant chromogen (DAB) channel. This subclassification identified glypican 3-, antihepatocyte antibody- and CK7-positive cells within tumour areas (Fig. 3). For each immunostaining, the ratio of positive tumour cells to all tumour cells was calculated. The percentage of the HCC IHC contingent was given by the glypican 3 or antihepatocyte antibody ratio (the higher of the two positive ratios was retained), and the percentage of the iCCA IHC contingent by the CK7 ratio. We used the Pearson correlation coefficient to assess the agreement between the cluster proportions predicted by the AI model and these IHC contingents.
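The contingent rules above reduce to a few ratios per case. A minimal sketch, assuming that for each immunostain one has the number of positive tumour cells and of all tumour cells on that stain's slide (for example, counts exported from QuPath); the stain keys "glypican3", "hepatocyte" and "CK7" and both function names are hypothetical, not QuPath output fields.

```python
def positive_ratio(positive: int, tumour: int) -> float:
    """Fraction of tumour cells that are positive for one immunostain."""
    if tumour <= 0 or not 0 <= positive <= tumour:
        raise ValueError("invalid cell counts")
    return positive / tumour


def ihc_contingents(counts: dict) -> tuple:
    """Return (HCC %, iCCA %) for one case.

    counts maps a stain name to (positive tumour cells, all tumour cells)
    measured on that stain's slide. Missing stains yield NaN.
    """
    # HCC contingent: the higher of the two hepatocytic-marker ratios.
    hepatocytic = [positive_ratio(*counts[s]) for s in ("glypican3", "hepatocyte") if s in counts]
    hcc = 100 * max(hepatocytic) if hepatocytic else float("nan")
    # iCCA contingent: the CK7 ratio.
    icca = 100 * positive_ratio(*counts["CK7"]) if "CK7" in counts else float("nan")
    return hcc, icca


# Hypothetical counts for one cHCC-CCA case.
case = {"glypican3": (300, 1000), "hepatocyte": (450, 900), "CK7": (200, 1000)}
print(ihc_contingents(case))  # (50.0, 20.0)
```

Because the two contingents come from different stains, they are computed independently and need not sum to 100%.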

Patch processing
Processing WSIs is computationally challenging because of their high dimensions (on the order of 10^4 pixels per side, i.e., images of 100 to 1,000 megapixels). Because straightforward resizing of WSIs can lose crucial information at the microscopic scale, we adopted a patch-wise strategy, in which patches (or tiles) of a fixed, much smaller size were extracted from the images and processed individually. After testing different scales (i.e., patch sizes), we chose a scale of 125 µm, corresponding to a patch size of 256x256 pixels. This choice allowed us to study the samples at an adequate level of tissue detail while keeping the computational time reasonable. Because tissue areas are scarce within biopsies, this procedure also maximized the number of extracted patches compared with a larger patch size. Large blank background areas cover the rest of the images; they were handled with a masking approach that ignored the background and extracted only patches lying on tissue areas (along the shape of the biopsy). Border patches (i.e., patches containing both background and tissue) were retained only if they contained mainly tissue pixels.
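The masking and border-patch rule can be sketched on a binary tissue mask. This is a minimal illustration, assuming non-overlapping patches, a tissue mask already computed (e.g., by thresholding a downsampled slide), a "mainly tissue" threshold of more than 50% of pixels, and a hypothetical scan resolution of 0.488 µm/pixel; the function names are hypothetical.

```python
def patch_size_px(field_um: float, microns_per_pixel: float) -> int:
    """Side length in pixels of a square patch covering field_um micrometres."""
    return round(field_um / microns_per_pixel)


def tissue_patches(mask, size, min_tissue=0.5):
    """Return top-left (row, col) corners of non-overlapping size x size patches
    whose fraction of tissue pixels exceeds min_tissue.

    mask: 2D list of booleans (True = tissue), e.g. obtained by thresholding a
    downsampled slide (hypothetical preprocessing step).
    """
    height, width = len(mask), len(mask[0])
    corners = []
    for r in range(0, height - size + 1, size):
        for c in range(0, width - size + 1, size):
            tissue = sum(mask[i][j] for i in range(r, r + size) for j in range(c, c + size))
            # Background-only patches are skipped; border patches are kept
            # only when tissue is the majority of their pixels.
            if tissue / (size * size) > min_tissue:
                corners.append((r, c))
    return corners


# At a hypothetical scan resolution of 0.488 µm/pixel, 125 µm spans 256 pixels.
print(patch_size_px(125, 0.488))  # 256

# Toy 4x8 mask: tissue on the left half plus a small protrusion at the top.
toy = [[True] * 4 + [False] * 4 for _ in range(4)]
toy[0][4] = toy[0][5] = toy[1][4] = True
print(tissue_patches(toy, size=2))  # [(0, 0), (0, 2), (0, 4), (2, 0), (2, 2)]
```

In the toy mask, the patch at (0, 4) is a border patch with three of four tissue pixels and is kept, while all-background patches are dropped.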

Data augmentation
After patch extraction, we further augmented the data to add variability to the dataset. Data augmentation was restricted to plausible transformations so that patches remained meaningful from a pathological point of view: colour variations (±2% hue, -80% to +60% saturation, and ±20% brightness), rigid transformations such as horizontal and vertical flips and random rotations (-90° to +90°), and Gaussian blurring with a 3x3 kernel. We also varied the 125 µm scale by a small factor between 1 and 1.2 to add variability while preserving similar structures in the patches. The final number of patches was 86,936, including 63,132 (about 73%) training patches and 23,804 (about 27%) validation patches. To avoid over-fitting, training and validation patches were extracted from biopsies of distinct patients. Both sets contained a balanced proportion (1/3) of patients with each type of cancer (Fig. 1).
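The augmentation ranges can be read as parameter intervals sampled independently for each patch. A minimal sketch, assuming hue is expressed as a fraction of the full hue circle, saturation and brightness as multiplicative factors, and flip and blur probabilities of 0.5 (none of these conventions is stated in the text); `AugParams` and `sample_aug_params` are hypothetical names. The sketch only draws parameters; applying them to pixels would need an image library.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AugParams:
    hue_shift: float          # fraction of the hue circle, within ±0.02
    saturation_factor: float  # 1 + u with u in [-0.8, +0.6]
    brightness_factor: float  # 1 + u with u in [-0.2, +0.2]
    hflip: bool
    vflip: bool
    rotation_deg: float       # within [-90, +90]
    blur: bool                # apply a 3x3 Gaussian blur when True
    scale: float              # multiplies the 125 µm field, within [1.0, 1.2]


def sample_aug_params(rng: random.Random, p_flip: float = 0.5, p_blur: float = 0.5) -> AugParams:
    """Draw one set of augmentation parameters within the ranges given in the text.

    p_flip and p_blur are assumed probabilities; the text does not state them.
    """
    return AugParams(
        hue_shift=rng.uniform(-0.02, 0.02),
        saturation_factor=1 + rng.uniform(-0.8, 0.6),
        brightness_factor=1 + rng.uniform(-0.2, 0.2),
        hflip=rng.random() < p_flip,
        vflip=rng.random() < p_flip,
        rotation_deg=rng.uniform(-90.0, 90.0),
        blur=rng.random() < p_blur,
        scale=rng.uniform(1.0, 1.2),
    )


params = sample_aug_params(random.Random(42))
print(f"field of view: {125 * params.scale:.1f} µm")
```

If the pipeline were built on torchvision, roughly comparable transforms would be `ColorJitter(brightness=0.2, saturation=(0.2, 1.6), hue=0.02)`, `RandomHorizontalFlip()`, `RandomVerticalFlip()`, `RandomRotation(90)` and `GaussianBlur(3)`; the authors do not say which library they used.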

Tumour/non-tumour annotations
The weakly supervised approach relied on tumour/non-tumour annotations, which are easier to obtain than detailed tumour-type annotations. The experts manually outlined the tumour regions, and their annotations were converted into a binary mask of tumour/non-tumour regions inside/outside the outlined area. Border patches were considered tumoural if most of their pixels were labelled as tumour. All biopsies of the training and validation cohorts contained both tumoural and non-tumoural tiles. We used both tumoural and non-tumoural tiles for training but only tumoural tiles for validation.
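The majority rule for border patches, and the different tile sets used for training and validation, can be sketched as follows. This assumes the expert outline has already been rasterised into a 0/1 mask at patch resolution; the function names are hypothetical, and labelling exact ties as non-tumour is an arbitrary choice made here.

```python
def patch_label(annotation, r, c, size):
    """1 (tumour) if most pixels of the patch lie inside the expert outline, else 0.

    annotation: 2D list of 0/1 values, 1 inside the outlined tumour region.
    Exact ties are labelled non-tumour here, which is an arbitrary choice.
    """
    inside = sum(annotation[i][j] for i in range(r, r + size) for j in range(c, c + size))
    return int(inside > size * size / 2)


def tiles_for_stage(labelled_tiles, stage):
    """Training keeps tumour and non-tumour tiles; validation keeps tumour tiles only."""
    if stage == "train":
        return list(labelled_tiles)
    return [(tile, label) for tile, label in labelled_tiles if label == 1]


mask = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
tiles = [((r, c), patch_label(mask, r, c, 2)) for r in (0, 2) for c in (0, 2)]
print(tiles)                                 # [((0, 0), 1), ((0, 2), 0), ((2, 0), 0), ((2, 2), 0)]
print(tiles_for_stage(tiles, "validation"))  # [((0, 0), 1)]
```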

Feature extraction and clustering
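The proposed deep learning-based approach involved a three-step process (Fig. 4). First, supervised learning trained a feature extractor guided by the tumour/non-tumour annotations. Second, the network's last convolutional layer was used to extract features from the tumour tiles. Third, an unsupervised clustering algorithm (Gaussian mixture model) grouped these features without knowledge of the precise tumour types.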

Implementation details
The first stage of the method used a pre-trained ResNet18 architecture whose last eight convolutional layers were fine-tuned on the 63,132 annotated training patches. The weights of the first nine layers, pre-trained on ImageNet, 24 were kept unchanged. Although less complex than its deeper 50- and 152-layer versions, 25 the ResNet18 architecture provided a solid baseline for assessing the potential of transfer learning for our task. Recent works using models pre-trained on ImageNet have shown that more complex architectures did not necessarily bring clear improvements while significantly increasing the computational load. 26 Moreover, with fewer parameters, ResNet18 is less prone to over-fitting and over-specialization.
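The split into nine frozen and eight fine-tuned layers matches the 17 main-path convolutions of ResNet18 if the 1x1 shortcut convolutions are not counted. Under that assumption (the text does not say how layers were counted), the frozen part is `conv1` plus the stages `layer1` and `layer2`, and the fine-tuned part is `layer3` and `layer4`. The sketch below, with hypothetical function names, simply enumerates the layers using torchvision's module naming.

```python
def resnet18_conv_layers():
    """Names of the 17 main-path convolutions of ResNet18, in forward order.

    Names follow torchvision's module naming; the 1x1 shortcut
    ("downsample") convolutions are not counted.
    """
    names = ["conv1"]
    for stage in range(1, 5):          # layer1 .. layer4
        for block in range(2):         # two BasicBlocks per stage
            for conv in (1, 2):        # two 3x3 convolutions per block
                names.append(f"layer{stage}.{block}.conv{conv}")
    return names


def freeze_plan(n_frozen=9):
    """Split the convolutions into frozen (ImageNet weights kept) and fine-tuned ones."""
    layers = resnet18_conv_layers()
    return layers[:n_frozen], layers[n_frozen:]


frozen, trainable = freeze_plan()
print(frozen[-1], trainable[0])  # layer2.1.conv2 layer3.0.conv1
```

In PyTorch, the corresponding step would be to load `torchvision.models.resnet18` with ImageNet weights, set `requires_grad = False` on the parameters of `conv1`, `bn1`, `layer1` and `layer2`, and replace the `fc` head with a two-output `torch.nn.Linear(512, 2)` for tumour/non-tumour classification. Whether the authors also froze batch-normalisation statistics is not stated.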
We used an Adam optimizer with an initial learning rate of 0.03 and an exponential decay scheduler with a multiplicative factor of 0. The features extracted from the last convolutional layer had a dimension of 256x256x512 per patch (i.e., 512 features representing each pixel). After principal component analysis, the feature dimension of each patch was reduced to 256x256x256. The unsupervised clustering algorithm was a Gaussian mixture model, fitted with the expectation-maximization algorithm. 27 After clustering, the WSI can be reconstructed to visualize the spatial distribution of the tumour labels (Fig. 5).
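To make the clustering step concrete, here is a minimal expectation-maximization fit of a Gaussian mixture written with the standard library only. It assumes each tile is summarised by a single feature vector (for example, by averaging the per-pixel PCA components, which is an assumption, not the authors' stated procedure) and uses diagonal covariances and farthest-point initialisation, both choices made for this sketch. In practice, `sklearn.mixture.GaussianMixture(n_components=2, covariance_type="diag")` fits the same kind of model; the covariance type used by the authors is not stated.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, var))


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit_diag_gmm(X, k, n_iter=50, seed=0, eps=1e-6):
    """Fit a k-component diagonal-covariance Gaussian mixture with EM.

    X: list of feature vectors (one per tile). Returns (weights, means, variances).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Farthest-point initialisation: a simple, deterministic stand-in for k-means++.
    means = [list(rng.choice(X))]
    while len(means) < k:
        means.append(list(max(X, key=lambda x: min(sq_dist(x, m) for m in means))))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each vector.
        resp = []
        for x in X:
            logp = [math.log(weights[j]) + log_gauss(x, means[j], variances[j]) for j in range(k)]
            top = max(logp)  # subtract the max before exponentiating, for stability
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: re-estimate weights, means and per-dimension variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + eps
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[t] for r, x in zip(resp, X)) / nj for t in range(d)]
            variances[j] = [sum(r[j] * (x[t] - means[j][t]) ** 2 for r, x in zip(resp, X)) / nj + eps
                            for t in range(d)]
    return weights, means, variances


def predict(X, weights, means, variances):
    """Assign each vector to the component with the highest posterior."""
    return [max(range(len(weights)),
                key=lambda j: math.log(weights[j]) + log_gauss(x, means[j], variances[j]))
            for x in X]


# Toy 2-D "tile features": two well-separated groups of 50 vectors.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(50)]
X += [[data_rng.gauss(8, 1), data_rng.gauss(8, 1)] for _ in range(50)]
labels = predict(X, *fit_diag_gmm(X, k=2))
```

Cluster indices returned by EM are arbitrary; mapping a cluster to "HCC-like" or "iCCA-like" is done afterwards by inspecting its tiles, as in the Results.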

Clustering results
After feature extraction, the validation set was clustered with a Gaussian mixture model. Different numbers of clusters were tested and analysed. The most coherent results were obtained with K = 2 clusters. In this two-cluster model, Cluster 0 contained tiles with homogeneous tissue and large cells with abundant eosinophilic cytoplasm, features characteristic of HCC (76% of the tiles).
The second cluster (Cluster 1) contained tiles with glandular structures within a fibrous stroma (stained orange in HES), usually present in iCCA tumours (92% of the tiles) (Fig. 6). Tiles from cHCC-CCA cases were spread between the two clusters. We created a three-cluster model (K = 3) to test whether a separate cluster could be extracted for combined cHCC-CCA tumours. The three-cluster model divided the previous Cluster 0 into two clusters (Clusters 2 and 3), while the previous Cluster 1 remained separate (Cluster 4). The cHCC-CCA patches were spread among the three clusters. Half of the cHCC-CCA cases (51%) were assigned to the cluster with a majority of iCCA (Cluster 4, corresponding to Cluster 1 in the two-cluster model), whereas the two new HCC clusters, Clusters 2 and 3, contained 26% and 31% of cHCC-CCA cases, respectively (Fig. S1).

Quantification of clusters in the validation set
The two-cluster model was assessed on the validation set by quantifying the proportions of Cluster 0 (representative of HCC) and Cluster 1 (representative of iCCA) in each case. Overall, across validation samples, the mean proportion was 51% (range 2-100%) for Cluster 0 and 48% (range 0-97%) for Cluster 1.
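The per-case proportions and their summary (mean and range) follow directly from the cluster label of each tumour tile. A minimal sketch, with hypothetical function names and toy labels; it is not the authors' code.

```python
from collections import Counter


def cluster_proportions(tile_clusters, k=2):
    """Percentage of a slide's tumour tiles assigned to each cluster 0..k-1."""
    counts = Counter(tile_clusters)
    n = len(tile_clusters)
    return [100 * counts[j] / n for j in range(k)]


def summarise(slides, k=2):
    """(mean, min, max) percentage of each cluster across slides."""
    props = [cluster_proportions(tiles, k) for tiles in slides.values()]
    return [
        (sum(p[j] for p in props) / len(props), min(p[j] for p in props), max(p[j] for p in props))
        for j in range(k)
    ]


# Hypothetical per-tile cluster labels for three slides.
slides = {"A": [0, 0, 0, 1], "B": [1, 1, 1, 1], "C": [0, 0, 0, 0]}
print(cluster_proportions(slides["A"]))  # [75.0, 25.0]
print(summarise(slides))
```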

Two-cluster model and pathology diagnosis
For HCC and iCCA, the diagnostic agreement between the pathological diagnosis and the model predictions (major contingent) was 100% for HCC (11/11 cases) and 78% for iCCA (7/9 cases). For cHCC-CCA, we observed highly variable proportions of each cluster (Cluster 0 [5-50%] and Cluster 1 [9-94%]). The agreement between the major contingent in conventional pathology and that predicted by the model was 89% (8/9 cases) (Fig. S2).
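The "major contingent" prediction is the tumour type of the cluster holding most of a slide's tumour tiles, and agreement is the fraction of cases where it matches the reference (for example, 7 of 9 iCCA cases gives 78%). A minimal sketch, assuming the cluster interpretation given in the text (0 for HCC, 1 for iCCA) and, for cHCC-CCA, a reference equal to the major contingent assessed by the pathologist; function names are hypothetical and ties resolve to HCC arbitrarily.

```python
from collections import Counter

# Interpretation of the two clusters given in the text: 0 ~ HCC, 1 ~ iCCA.
CLUSTER_TO_TYPE = {0: "HCC", 1: "iCCA"}


def major_contingent(tile_clusters):
    """Tumour type of the cluster holding most of a slide's tumour tiles.

    Ties resolve to cluster 0 (HCC), an arbitrary choice for this sketch.
    """
    counts = Counter(tile_clusters)
    return CLUSTER_TO_TYPE[max(CLUSTER_TO_TYPE, key=lambda j: counts[j])]


def agreement(cases):
    """cases: list of (reference major contingent, tile cluster labels).

    Returns (matches, total, percentage).
    """
    matches = sum(major_contingent(tiles) == reference for reference, tiles in cases)
    return matches, len(cases), 100 * matches / len(cases)


cases = [("HCC", [0, 0, 1]), ("iCCA", [1, 1, 0]), ("iCCA", [0, 0, 1])]
print(agreement(cases))
```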

Two-cluster model and quantitative IHC analyses
The diagnostic agreement between the IHC diagnosis and the two-cluster model predictions (major contingent) was 100% in HCC (5/5 cases), 86% in iCCA (6/7 cases) and 75% in cHCC-CCA (6/8 cases) (Fig. 7). The Pearson correlation coefficient was 0.87 between the Cluster 0 proportions predicted by the two-cluster model and the HCC IHC contingent, and 0.77 between the Cluster 1 proportions and the iCCA IHC contingent.
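The coefficients reported above are Pearson correlations between per-case percentages. A minimal stdlib sketch follows; the input values are hypothetical and only illustrate the call. On Python 3.10 or later, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-case percentages: Cluster 0 tiles vs HCC IHC contingent.
cluster0 = [90.0, 60.0, 20.0, 5.0]
hcc_ihc = [85.0, 70.0, 30.0, 10.0]
print(round(pearson(cluster0, hcc_ihc), 2))
```

A high coefficient shows that the two percentages vary together across cases; it does not by itself show that they are equal, since a model that systematically over-estimates one contingent can still reach a value close to 1.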

External validation of the two-cluster model
The results obtained with the two-cluster model on the internal validation set generalized well to the external validation set. Cluster 0 contained a majority of HCC tiles (74% of the tiles), and Cluster 1 contained mainly iCCA tiles (92% of the tiles) (Fig. 6).

Discussion
This proof-of-concept study showed that weakly supervised deep learning can extract discriminative features of different PLCs from routinely stained tumour biopsies. A two-cluster model based on Gaussian mixtures of these features illustrated the discriminative power of biopsies and their potential for classifying different liver cancers.
At the core of the proposed approach was a combination of supervised transfer learning and unsupervised clustering. For the supervised part, we used a pre-trained convolutional neural network (ResNet18) with weak annotations of tumour/non-tumour areas in WSIs. As explained above, a complete annotation of these images would have required tremendous effort; hence, a comparison with a fully supervised approach was not feasible. Nonetheless, the adopted transfer learning approach retrieved meaningful features with a reasonable annotation effort and showed promising results for classifying PLCs on biopsies.
Our model identified two clusters, one specific to HCC and the other specific to iCCA, corresponding to the two tumour types located at opposite ends of the malignant liver spectrum. The distinction between the two clusters aligns with the morphological differences between HCC, characterized by large eosinophilic tumour hepatocytes without fibrous stroma, and iCCA, characterized by smaller cells with abundant fibrous stroma. Among HCC and iCCA cases in the validation set, two iCCA cases were misclassified as HCC by the two-cluster model. A pathological review of these two cases revealed one moderately differentiated and one poorly differentiated tumour, both exhibiting extensive architectural areas and minimal stroma, which could explain the model's error. In the two-cluster model, cHCC-CCA tiles included varying proportions of both HCC and iCCA clusters. These results are consistent with the current WHO definition of cHCC-CCA, which requires the unequivocal presence of both hepatocytic and cholangiocytic differentiation within the same tumour on routine histopathology. 9 The proposed two-cluster model provides an estimate of the percentages of HCC and iCCA components in cHCC-CCA, which could be valuable information for choosing a treatment strategy. Currently, there is no dedicated systemic treatment for unresectable cHCC-CCA, and patients with cHCC-CCA receive standard advanced iCCA or HCC treatments 28-31 without any specific recommendation. Hence, determining the main component of cHCC-CCA could help guide the choice of treatment strategy.
Our two-cluster model relied only on routine HES slides. Although diagnosis is based on routine HES staining, additional IHC is still routinely performed to confirm a diagnosis of cHCC-CCA. 7,9 Comparing the percentage of each contingent with the pathological diagnosis (based on both HES and IHC staining) showed that our HES-only model achieved equivalent tumour classification. This consistency is notable because biopsy samples may be too small to allow complementary IHC. For example, 9 of 29 cases (31%) in our cohort could not benefit from this additional analysis because of insufficient tumour tissue. Interestingly, the model identified HCC tiles within all iCCA cases, confirmed by IHC analysis in half of them, suggesting that cHCC-CCA could be more frequent than reported by the pathologist in the iCCA group. Thus, our model, which may provide a finer characterization of tumour tissue than conventional histology, is consistent with previous findings showing tiny HCC or iCCA areas (not diagnosed by histology) within cHCC-CCA samples using MALDI imaging, an in situ proteomic approach. 32
The potential therapeutic implications and the ability to circumvent material limitations highlight the importance of developing automatic AI methods for biopsy specimens. Currently, most AI studies of PLC have focused on surgical samples, 13,14,19,33 but most patients never have such samples taken during their entire cancer history, which introduces a selection bias. In contrast, tumour biopsies are increasingly performed in the context of PLC, which could benefit AI-based approaches. That said, deep learning can be challenging with biopsies: their representativeness may be limited for heterogeneous tumours, and their small size and intricate shape reduce the number of tiles usable for analysis. Artefacts such as fragmentation, folds and tears (less prominent in surgical samples) can also hinder the use of biopsies. 35,36 The present study confirms that biopsies can be exploited despite these challenges and indicates that encouraging deep learning-based results can be obtained for liver cancer. Our study supports the use of biopsy for PLC diagnosis and is in line with recent results showing that tumour biopsy led to accurate diagnosis in 11% of nodules classified as LI-RADS 5 on radiology. 37
The main limitations of this work are its retrospective design and the relatively small number of cases studied. To the best of our knowledge, no open-access database of PLC biopsies is available. Moreover, routine HCC biopsies are scarce, and cHCC-CCA samples are even scarcer because of the extreme rarity of this subtype. To avoid imbalanced learning, we selected a similar proportion of each tumour subtype in the training and internal validation sets, which limited the total number of studied cases. Additional prospective validation is needed to confirm the full potential of our model in assisting the pathologist.
A weakly supervised learning method can extract specific morphological features of HCC and iCCA from tumour biopsies. Although no specific features of cHCC-CCA were recognized, identifying HCC and iCCA tiles within a slide could facilitate the diagnosis of cHCC-CCA.

Fig. 3. Quantitative immunohistochemistry analysis with QuPath software. Example analysis of CK7 staining in an iCCA case. The tumour area was manually annotated. Cells were classified as tumour cells (red or blue) or stromal cells (green) using the QuPath machine-learning features. CK7-positive tumour cells (red) were then subclassified using intensity thresholds in the relevant chromogen (DAB) channel within the tumour area. iCCA, intrahepatic cholangiocarcinoma.

Fig. 4. Main steps of the proposed weakly supervised method. After annotating the tumour/non-tumour areas, tiles of 256x256 pixels were extracted from the whole-slide images and used to train a ResNet18 neural network (step I). The tumour/non-tumour annotations served as labels during training, and the network's last convolutional layer was used to extract new tumour tile features (step II). Without knowledge of the precise labels of the malignancies, we then applied an unsupervised clustering algorithm (GMM) (step III). GMM, Gaussian mixture model.
Fig. 7. Comparison of the two-cluster predictions and immunohistochemistry contingents in the internal validation cohort (n = 20). Pie charts showing (i) the proportion of Cluster 0 (green) and Cluster 1 (blue) tiles within each slide and (ii) the proportion of HCC (orange) and iCCA (brown) IHC contingents within each slide. The associated pathological diagnosis is displayed above each chart. cHCC-CCA, combined hepatocellular-cholangiocarcinoma; HCC, hepatocellular carcinoma; iCCA, intrahepatic cholangiocarcinoma.
Fig. 8. Comparison of the two-cluster predictions and the pathological diagnosis within each slide of the external validation set (n = 47). Pie charts showing the proportion of Cluster 0 (green) and Cluster 1 (blue) tiles within each slide. The associated pathological diagnosis is displayed above each group. cHCC-CCA, combined hepatocellular-cholangiocarcinoma; HCC, hepatocellular carcinoma; iCCA, intrahepatic cholangiocarcinoma.