Long-term major adverse liver outcomes in 1,260 patients with non-cirrhotic NAFLD

Background & Aims
Long-term studies of the prognosis of NAFLD are scarce. Here, we investigated the risk of major adverse liver outcomes (MALO) in a large cohort of patients with NAFLD.

Methods
We conducted a cohort study with data from Swedish university hospitals. Patients (n = 1,260) with non-cirrhotic NAFLD, diagnosed by biopsy or radiology between 1974 and 2020 and with fibrosis estimated by vibration-controlled transient elastography (VCTE), biopsy, or FIB-4 score, were followed up through 2020. Each patient was matched on age, sex, and municipality with up to 10 reference individuals from the general population (n = 12,529). MALO were ascertained from Swedish national registers, and hazard ratios were estimated by Cox regression.

Results
MALO occurred in 111 patients with NAFLD (8.8%; incidence rate 5.9/1,000 person-years) and 197 reference individuals (1.6%; incidence rate 1.0/1,000 person-years) during a median follow-up of 13 years. The rate of MALO was higher in patients with NAFLD (hazard ratio 6.6; 95% CI 5.2–8.5) and was strongly associated with the stage of fibrosis at diagnosis. In the biopsy subcohort (72% of the total sample), the risk did not differ between patients with and without non-alcoholic steatohepatitis (NASH). The 20-year cumulative incidence of MALO was 2% in the reference population, 3% in patients with F0, and 35% in patients with F3. Prognostic information from biopsy was comparable to that from FIB-4 (C-index 0.73 vs. 0.72 at 10 years).

Conclusions
This study provides updated information on the natural history of NAFLD, showing a high rate of progression to cirrhosis in F3 and a prognostic capacity of non-invasive tests similar to that of liver biopsy.

Impact and implications
Several implications for clinical care and future research follow from these results. First, the risk estimates for cirrhosis development are important when communicating risk to patients and when deciding on clinical monitoring and treatment. They can also be used in updated health-economic evaluations and by regulatory agencies. Second, our results again highlight the low predictive information obtained from ascertaining NASH status by histology and call for more objective means of defining NASH, such as artificial intelligence-supported digital pathology. NASH is most likely the causal factor for fibrosis progression in NAFLD, but its subjective definition limits the prognostic value of a histological NASH diagnosis. Third, the finding that prognostic information from biopsy and the very simple Fibrosis-4 (FIB-4) score was comparable is important, as it may lead to fewer biopsies, move the field further towards non-invasive means of defining fibrosis and, importantly, support the use of non-invasive tests as outcomes in clinical trials. However, all modalities had modest discriminatory capacity, and new risk stratification systems are needed in NAFLD. Repeated measurements of non-invasive scores may be a potential solution.


Introduction
NAFLD is a major health problem with a risk of progression to cirrhosis, decompensated cirrhosis, and hepatocellular carcinoma (HCC). 1 As only a minority of patients with NAFLD develop such outcomes, estimating this risk is an important part of patient evaluation. [7][8] The traditional method for evaluating the stage of fibrosis is liver biopsy, which has disadvantages such as invasiveness, poor intra- and interobserver agreement, sampling variability, and high costs. 9 This highlights the need for alternative non-invasive methods to estimate the stage of fibrosis that offer prognostic information similar to, or better than, that of biopsy. Another debated topic is the importance of histological NASH for predicting incident cirrhosis. Because NASH is highly collinear with fibrosis stage, it is difficult to tease out its individual contribution to predicting disease progression. 3,10 Previous studies that have evaluated differences in the risk of major adverse liver outcomes (MALO) by fibrosis stage in patients with NAFLD have usually had limited sample sizes, few hard outcomes, short follow-up, or low granularity. Such limitations can lead to imprecise risk estimates that may not be generalisable. Also, comparing the predictive capacity of liver biopsy for MALO with that of non-invasive scores, such as the commonly used FIB-4 score, is a relatively new topic. 7,11 A recent multinational study suggested that biopsy was superior to FIB-4 but comparable to liver stiffness measurement (LSM) using vibration-controlled transient elastography (VCTE). However, the median follow-up was short, at around 3 years. 7 Another recent meta-analysis also suggested that non-invasive tests provide prognostic information similar to that of histologically assessed liver fibrosis. 12
Here, we aimed to investigate the long-term prognosis of a large cohort of patients with NAFLD with regard to the risk of MALO across fibrosis stages and by the presence of NASH. Further, we aimed to compare different methods of staging fibrosis as predictors of such outcomes.

Study design and study population
We conducted a cohort study (n = 1,333) pooling data from four cohorts: Fatty Liver In Sweden part 1 (FLIS-1, n = 95), 13 Fatty Liver In Sweden part 2 (FLIS-2, n = 102), a previously published cohort study (n = 712), 3 and a contemporary cohort assembled through new data collection by medical chart review (n = 424). FLIS-1 was a multi-centre cross-sectional study at several Swedish university hospitals (Karolinska University Hospital, Linköping University Hospital, Sahlgrenska University Hospital, Uppsala University Hospital, and Skåne University Hospital), originally examining the role of moderate alcohol consumption in NAFLD. All patients underwent liver biopsy as part of the study. 13 FLIS-2 is an ongoing cohort study including patients with a diagnosis of NAFLD and fibrosis staging by either biopsy or VCTE, with longitudinal follow-up over 5 years. A description of our previous long-term follow-up study of patients with biopsy-defined NAFLD is available elsewhere. 3 Here, we additionally collected data from patients with a diagnosis of NAFLD (n = 424) at Karolinska University Hospital in Stockholm, Linköping University Hospital, and Uppsala University Hospital, who were first identified in each hospital's electronic medical charts by searching for the International Statistical Classification of Diseases 10th Edition (ICD-10) code K76.0. 14 Next, the charts of these patients were reviewed by two researchers (CA, MD) to verify the diagnosis and extract data. The diagnosis of NAFLD was made by different methods, including liver biopsy (72%) and radiological measures such as controlled attenuation parameter (CAP), ultrasound, or other radiological examinations. The fibrosis stage was estimated by biopsy, by VCTE if biopsy was missing, or by FIB-4 if both biopsy and VCTE were missing.

Exclusion criteria
Patients with presumed NAFLD were excluded if they had a liver disease other than NAFLD at or before baseline, identified during chart review or by register linkage. These included alcohol-related or drug-induced liver injury, autoimmune liver disease, viral hepatitis, cholestasis, or genetic liver disease. Other exclusion criteria were an estimated daily alcohol consumption at baseline, as recorded during chart review, of more than 30 g for men or 20 g for women; binge drinking, defined as regular consumption of ≥5 units of alcohol for men or ≥4 units for women on a single occasion; and previous liver decompensation. As we were interested in progression to cirrhosis, patients with baseline cirrhosis, defined as fibrosis stage 4 (F4) on biopsy or VCTE ≥15 kPa, were excluded. 15,16 Among both patients with NAFLD and matched reference individuals, those with register-based diagnoses of cirrhosis, decompensation, or HCC, and those aged <18 years, were also excluded. The ICD codes used to classify exclusion criteria in registers are presented in Table S1.
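The alcohol and baseline-cirrhosis rules above can be expressed as a single predicate. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, alcohol intake is assumed to be recorded in grams per day and binge drinking as the usual number of units per occasion, and the register-based criteria (other liver diseases, previous decompensation, age) are not modelled.

```python
from typing import Optional

# Thresholds taken from the exclusion criteria; names are illustrative only.
MAX_DAILY_ALCOHOL_G = {"male": 30.0, "female": 20.0}  # excluded if intake is above this
BINGE_UNITS = {"male": 5, "female": 4}                # excluded if at or above this
CIRRHOSIS_KPA = 15.0                                  # VCTE stiffness counted as cirrhosis


def excluded_at_baseline(sex: str,
                         daily_alcohol_g: float,
                         units_per_occasion: int,
                         biopsy_stage: Optional[int] = None,
                         vcte_kpa: Optional[float] = None) -> bool:
    """Return True if a patient meets the alcohol or baseline-cirrhosis exclusion rules."""
    if daily_alcohol_g > MAX_DAILY_ALCOHOL_G[sex]:
        return True  # regular intake above the sex-specific limit
    if units_per_occasion >= BINGE_UNITS[sex]:
        return True  # regular binge drinking
    if biopsy_stage == 4:
        return True  # F4 on biopsy = cirrhosis
    if vcte_kpa is not None and vcte_kpa >= CIRRHOSIS_KPA:
        return True  # liver stiffness in the cirrhotic range
    return False


print(excluded_at_baseline("female", 25.0, 2))              # True (above 20 g/day)
print(excluded_at_baseline("male", 25.0, 4, vcte_kpa=8.2))  # False
```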

Baseline characteristics
Characteristics were collected from patient charts and from register linkages at baseline and at repeated timepoints, where available; this study used only baseline data. All diagnoses (liver diseases and comorbidities) at the time of inclusion were obtained from patient charts and through ICD codes and Anatomical Therapeutic Chemical Classification System (ATC) codes obtained from registers 17 (definitions in Table S2). Hypertension was defined as a registered diagnosis, a systolic blood pressure ≥140 mmHg or diastolic blood pressure ≥90 mmHg, or the use of antihypertensive treatment. Type 2 diabetes mellitus was defined as a registered diagnosis in the charts or in the national patient register (NPR), 18 a prescription of any antidiabetic medication, or a fasting glucose value ≥7.0 mmol/L. Hyperlipidaemia was defined as a registered diagnosis in the charts or the NPR, prescribed treatment with statins or other lipid-lowering drugs, or a fasting total cholesterol value ≥200 mg/dl. Clinical parameters such as weight and height were measured by healthcare workers within 1 month of inclusion and were used to calculate BMI. Routine biochemical variables measured within 1 month of liver biopsy were extracted from patient charts. Information about education was collected from registers at Statistics Sweden, 18 and education was categorised as <10 years, 10-12 years, or >12 years. Smoking status was obtained from patient charts, which documented whether the patient was a current, former, or never smoker. Key medications were obtained from the list of medications recorded in the patient charts on the day of inclusion. Several other variables were also collected for every patient at baseline (Table 1).
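The comorbidity definitions above are "any of" rules, which a short sketch makes explicit. The function names are hypothetical, and treating a missing measurement (None) as not meeting that particular criterion is an assumption; the source does not say how missing values entered these definitions.

```python
from typing import Optional


def has_hypertension(registered: bool, sbp_mmhg: Optional[float],
                     dbp_mmhg: Optional[float], antihypertensive: bool) -> bool:
    """Registered diagnosis, SBP >= 140 mmHg, DBP >= 90 mmHg, or antihypertensive use."""
    return (registered or antihypertensive
            or (sbp_mmhg is not None and sbp_mmhg >= 140)
            or (dbp_mmhg is not None and dbp_mmhg >= 90))


def has_type2_diabetes(registered: bool, antidiabetic: bool,
                       fasting_glucose_mmol_l: Optional[float]) -> bool:
    """Registered diagnosis, any antidiabetic prescription, or fasting glucose >= 7.0 mmol/L."""
    return (registered or antidiabetic
            or (fasting_glucose_mmol_l is not None and fasting_glucose_mmol_l >= 7.0))


def has_hyperlipidaemia(registered: bool, lipid_lowering: bool,
                        total_cholesterol_mg_dl: Optional[float]) -> bool:
    """Registered diagnosis, statin or other lipid-lowering drug, or total cholesterol >= 200 mg/dl."""
    return (registered or lipid_lowering
            or (total_cholesterol_mg_dl is not None and total_cholesterol_mg_dl >= 200))


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


print(has_hypertension(False, 138, 92, False))  # True (DBP >= 90)
print(round(bmi(90, 1.75), 1))                  # 29.4
```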

Histopathological evaluation and modalities to stage liver fibrosis
Liver biopsies were analysed slightly differently depending on the cohort from which they came. In biopsies from our previously reported cohort study with historical data 3 and the FLIS-1 study, 13 slides had previously been reviewed by an expert pathologist (Professor Rolf Hultcrantz, deceased) and one of the authors (HH), after calibration of the methodology with an internationally recognised expert (Professor Pierre Bedossa). These biopsies were scored according to the NASH Clinical Research Network system, with a 0-3 scale for steatosis and lobular inflammation and a 0-2 scale for ballooning. 15 The presence of NASH was defined using the fatty liver inhibition of progression (FLIP) algorithm, requiring at least one point each for steatosis, lobular inflammation, and ballooning. 19,20 For FLIS-2 (n = 62 with biopsy) and the contemporary cohort (n = 115 with biopsy), the local pathologists at each site reviewed the slides, and the presence of NASH was defined as the 'gestalt' impression from the original pathology report, as central reading was not available.
Where biopsy was available, fibrosis stage was scored using the Kleiner or METAVIR classification systems on a 5-point scale (F0-F4), where F4 is defined as cirrhosis. 15,16 To define the stage of fibrosis at baseline, only patients with biopsy or VCTE data were included. Patients were then categorised into three groups: no or mild fibrosis (stage 0-1 on biopsy, or <10 kPa if biopsy was missing), moderate fibrosis (stage 2 on biopsy, or 10-15 kPa if biopsy was missing), and advanced fibrosis (stage 3 on biopsy).
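The three-group rule above, combined with the "biopsy first, VCTE only if biopsy is missing" hierarchy from the study design, can be sketched as follows. The function name is hypothetical, and the handling of the boundaries (10 kPa counted as moderate, values ≥15 kPa treated as cirrhosis and therefore outside the grouping) is our reading of the Methods text, not a confirmed implementation.

```python
from typing import Optional, Tuple


def fibrosis_group(biopsy_stage: Optional[int],
                   vcte_kpa: Optional[float]) -> Optional[Tuple[str, str]]:
    """Return (modality, fibrosis group), or None if the patient cannot be grouped here."""
    if biopsy_stage is not None:          # biopsy takes precedence over VCTE
        if biopsy_stage in (0, 1):
            return ("biopsy", "no or mild")
        if biopsy_stage == 2:
            return ("biopsy", "moderate")
        if biopsy_stage == 3:
            return ("biopsy", "advanced")
        return None                       # F4 is cirrhosis and was excluded at baseline
    if vcte_kpa is not None:              # VCTE is used only when biopsy is missing
        if vcte_kpa < 10:
            return ("VCTE", "no or mild")
        if vcte_kpa < 15:
            return ("VCTE", "moderate")
        return None                       # >= 15 kPa counted as cirrhosis in the Methods
    return None                           # neither available: fibrosis estimated by FIB-4


print(fibrosis_group(None, 12.4))  # ('VCTE', 'moderate')
print(fibrosis_group(3, 6.0))      # ('biopsy', 'advanced')
```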

Follow up and outcomes
Follow-up started at the time of liver biopsy, at the VCTE examination if biopsy was not available, or when a clinical diagnosis was first documented in the charts if both biopsy and VCTE were unavailable. The primary outcome was the first occurrence of MALO, defined as an ICD-based diagnosis of cirrhosis, decompensated cirrhosis (bleeding oesophageal varices, ascites, hepatic encephalopathy, or hepatorenal syndrome), HCC, chronic liver failure, or liver-related death 21 (definitions in Table S1). All Swedish citizens have a personal identification number, which is a unique 10-digit code. 22 This number was used to identify patients, access medical charts, create the reference population, and perform register linkages.
Each patient was matched on age, sex, calendar year at baseline, and municipality with up to 10 reference individuals at baseline, identified from the Swedish Total Population Register. A total of 12,529 matched reference individuals were identified (Fig. 1). All individuals were followed until the occurrence of MALO or were censored at non-liver death, emigration, or the end of follow-up (December 31, 2020). To follow the cohort, we used data from three sources: the NPR of Hospital Discharges, the Swedish Cancer Register (SCR), and the Cause of Death Register (CDR). Hospital discharge diagnoses from the NPR have positive predictive values (PPVs) of around 85-95% depending on the diagnosis, 23 and diagnoses corresponding to cirrhosis and NAFLD have been specifically validated, with PPVs >90%. 24,25 The SCR contains information on verified solid and non-solid tumours and is approximately 96% complete. 26 The CDR provides information on the causes of death for all Swedish inhabitants, including those who have died abroad, and attending physicians are required to report the underlying cause of death and any related diseases that could have contributed to the death. 27
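The start and end of follow-up described above can be computed per individual as in the sketch below. All names are hypothetical; converting days to years with 365.25 days per year, and keeping a MALO that falls on the same day as a censoring date as an event, are our assumptions rather than details given in the source.

```python
from datetime import date
from typing import Optional, Tuple

END_OF_FOLLOW_UP = date(2020, 12, 31)


def follow_up_start(biopsy: Optional[date], vcte: Optional[date], first_clinical_dx: date) -> date:
    """Biopsy date, else VCTE date, else date of first documented clinical diagnosis."""
    if biopsy is not None:
        return biopsy
    if vcte is not None:
        return vcte
    return first_clinical_dx


def follow_up(start: date,
              malo: Optional[date] = None,
              non_liver_death: Optional[date] = None,
              emigration: Optional[date] = None,
              end: date = END_OF_FOLLOW_UP) -> Tuple[float, bool]:
    """Return (years from start to first MALO or censoring, MALO indicator)."""
    candidates = [(end, False)]
    if malo is not None:
        candidates.append((malo, True))
    if non_liver_death is not None:
        candidates.append((non_liver_death, False))
    if emigration is not None:
        candidates.append((emigration, False))
    # Earliest date wins; on a same-day tie the MALO event is kept rather than censored.
    stop, event = min(candidates, key=lambda c: (c[0], not c[1]))
    return (stop - start).days / 365.25, event


start = follow_up_start(None, date(2012, 3, 1), date(2011, 9, 15))
years, event = follow_up(start, malo=date(2019, 5, 10), non_liver_death=date(2020, 2, 1))
print(start, round(years, 1), event)  # 2012-03-01 7.2 True
```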

Statistical analysis
Participants' characteristics are expressed as medians with IQR or as numbers with percentages, as appropriate. Incidence rates of MALO were calculated for patients with NAFLD and their matched reference individuals. Hazard ratios (HRs) with 95% CIs were estimated with Cox regression models, with follow-up time in years as the timescale. 28 The analysis was conditioned on the matching variables (age, sex, and municipality); no further adjustments were made because granular covariate data were not available for the reference individuals. Comparisons were also made in subgroups of patients with NAFLD stratified by diagnostic modality (biopsy, VCTE, or clinical), NASH status (where available), and fibrosis stage, in each case comparing patients with NAFLD with their matched reference individuals. Cumulative incidences over follow-up were calculated and plotted by fibrosis severity, accounting for the competing risk of non-liver death with the Aalen-Johansen estimator. 29 In the biopsy subcohort, we used Cox regression to estimate HRs of MALO associated with fibrosis stage, with fibrosis stage 0 as the reference group. Similar analyses were performed separately in patients with and without NASH, where possible. Multiplicative interaction between fibrosis stage and NASH status was tested: an 8-level indicator variable combining fibrosis stage (0-3) and NASH status (present or absent), for example fibrosis stage 2 with NASH, was used as the independent variable in another Cox regression model, with patients with fibrosis stage 0 and no NASH as the reference. An interaction p value <0.1 was considered significant. All Cox regression models within the NAFLD population were adjusted for age and sex and additionally for education, type 2 diabetes, BMI, smoking status, and statin use. Adjustment factors were chosen a priori based on clinical knowledge.
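Conditioning on the matching variables is commonly implemented as a Cox model stratified by matched set, so that each risk set only compares a patient with their own reference individuals; we assume that is what is meant here, and the source does not state the software or how tied event times were handled. The sketch below fits a single binary exposure (NAFLD vs. reference) by Newton-Raphson on the Breslow partial likelihood, without covariates. Names and toy data are hypothetical; real analyses would use an established implementation, for example `coxph()` with a `strata()` term in R's survival package.

```python
import math
from typing import Hashable, Optional, Sequence, Tuple


def _score_information(beta: float, times: Sequence[float], events: Sequence[int],
                       x: Sequence[float], strata: Sequence[Hashable]) -> Tuple[float, float]:
    """Score and observed information of the Breslow partial log-likelihood."""
    n = len(times)
    score, info = 0.0, 0.0
    # One term per distinct (stratum, event time); risk sets never cross strata.
    for s, t in {(strata[i], times[i]) for i in range(n) if events[i]}:
        risk = [j for j in range(n) if strata[j] == s and times[j] >= t]
        failed = [j for j in risk if times[j] == t and events[j]]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        d = len(failed)
        score += sum(x[j] for j in failed) - d * s1 / s0
        info += d * (s2 / s0 - (s1 / s0) ** 2)
    return score, info


def cox_hr(times: Sequence[float], events: Sequence[int], x: Sequence[float],
           strata: Optional[Sequence[Hashable]] = None,
           max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float, float]:
    """Return (HR, lower 95% CI, upper 95% CI) for a single covariate x."""
    if strata is None:
        strata = [0] * len(times)  # unstratified: one common baseline hazard
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_information(beta, times, events, x, strata)
        step = score / info        # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_information(beta, times, events, x, strata)
    se = 1.0 / math.sqrt(info)
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)


# Toy data: x = 1 for a patient with NAFLD, 0 for a reference individual.
times = [2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 1, 1, 1, 1, 1, 1, 1]
x = [1, 1, 0, 1, 0, 1, 0, 0]
print(cox_hr(times, events, x))
```

In the matched design, `strata` would hold a matched-set identifier for each patient and their reference individuals.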
Further, we estimated the HRs for MALO using Cox regression models, and computed Harrell's C-index as a measure of model discrimination at 5 and 10 years for each fibrosis staging method.
Because not all patients had complete data, we compared the C-index of FIB-4 (continuous and categorical, in separate analyses) with that of biopsy among patients who had a biopsy, and with that of biopsy or VCTE among patients with either. 30 The 95% CI of the C-index was computed by bootstrapping with 500 resamples. We further repeated the analysis using age-related FIB-4 in a sensitivity analysis.
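Harrell's C is the proportion of usable pairs in which the individual with the higher predicted risk is the one who has the event first. The sketch below uses hypothetical names and toy data; it obtains a C-index at a horizon by letting only events up to `tau` anchor a pair, one common convention that we assume here because the source does not state its truncation rule. The 95% CI uses 500 bootstrap resamples as in the text, with a percentile interval, which is also our assumption.

```python
import random
from typing import Optional, Sequence, Tuple


def harrell_c(times: Sequence[float], events: Sequence[int], risk: Sequence[float],
              tau: Optional[float] = None) -> Optional[float]:
    """Harrell's C; pair (i, j) is usable if i has an event (by tau) and j outlives t_i."""
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i] or (tau is not None and times[i] > tau):
            continue
        for j in range(len(times)):
            if times[j] > times[i]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # higher predicted risk failed first
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied risk scores count half
    return concordant / usable if usable else None


def bootstrap_c_ci(times, events, risk, tau=None, n_boot=500, seed=1,
                   alpha=0.05) -> Tuple[float, float]:
    """Percentile bootstrap CI for Harrell's C, resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = harrell_c([times[k] for k in idx], [events[k] for k in idx],
                      [risk[k] for k in idx], tau)
        if c is not None:               # skip resamples without any usable pair
            stats.append(c)
    if not stats:
        raise ValueError("no bootstrap resample contained a usable pair")
    stats.sort()
    return (stats[int(alpha / 2 * (len(stats) - 1))],
            stats[int((1 - alpha / 2) * (len(stats) - 1))])


times = [1.5, 3.0, 4.2, 6.0, 8.5, 11.0]
events = [1, 0, 1, 1, 0, 0]
fib4_scores = [2.9, 1.1, 1.8, 1.2, 0.9, 1.4]
print(harrell_c(times, events, fib4_scores, tau=10))  # 0.9
print(bootstrap_c_ci(times, events, fib4_scores, tau=10))
```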
Biochemical variables with ≤30% missingness were imputed using multiple imputation by chained equations, generating five completed datasets. Variables with >30% missingness were excluded from the analysis altogether. We conducted a sensitivity analysis using age-related FIB-4 to assess the impact of missing data on our findings.
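The 30% rule decides which variables enter the imputation model. The source does not describe how estimates from the five completed datasets were combined; Rubin's rules, sketched below, are the standard choice and are assumed here, with HRs pooled on the log scale. Names and numbers are hypothetical, and the chained-equations imputation itself is not shown.

```python
import math
import statistics
from typing import Dict, List, Optional, Sequence, Tuple


def variables_to_impute(data: Dict[str, Sequence[Optional[float]]],
                        max_missing: float = 0.30) -> List[str]:
    """Keep variables whose share of missing values (None) is at most max_missing."""
    return [name for name, values in data.items()
            if sum(v is None for v in values) / len(values) <= max_missing]


def rubins_rules(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool per-dataset estimates; returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # average point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    return q_bar, within + (1 + 1 / m) * between


labs = {"ALT": [40, None, 55, 31, 62, 48, 29, 70, 35, 44],
        "ferritin": [None, None, 210, None, 150, None, 95, None, 300, None]}
print(variables_to_impute(labs))  # ['ALT']

# Hypothetical log-HRs and their variances from five imputed datasets.
log_hr, var = rubins_rules([1.05, 1.10, 0.98, 1.02, 1.07], [0.04, 0.05, 0.04, 0.05, 0.04])
print(math.exp(log_hr), math.exp(log_hr - 1.96 * math.sqrt(var)))
```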

Ethical considerations
The study was approved by the Regional Ethical Review Board of Stockholm, with the record number 2018/880-31.

Results
We identified 1,260 patients with NAFLD without cirrhosis and 12,529 matched reference individuals from the general population. Among the 1,260 patients with NAFLD, the median age at baseline was 52 years (IQR 39-60) and 748 were men (59.4%). A total of 904 patients had a liver biopsy (71.8%), 118 (9.4%) had VCTE but no biopsy, and 238 (18.9%) had neither VCTE nor a biopsy. The median FIB-4 value at baseline was 0.97 (IQR 0.69-1.51), and the most common comorbidity was hypertension (n = 833, 66.1%). The median BMI at baseline was 29.1 kg/m² (IQR 26.4-32.5), and 15.3% of the population had type 2 diabetes. Within the biopsy group, 499 patients (62.7% of those with available NASH status) had NASH at baseline. The baseline characteristics are presented in Table 1.

Risk and rate of MALO in patients with NAFLD and the reference population
During a median follow-up of 13.1 years (18,657 and 201,089 person-years in total, respectively), MALO occurred in 111 patients with NAFLD (8.8%) and 197 reference individuals (1.6%) (p <0.001). After accounting for competing risks, the risk of MALO in NAFLD was strongly associated with the stage of fibrosis at diagnosis. The cumulative incidence at 20 years was 3% in patients with F0 and 35% in patients with F3 at baseline (Fig. 2), compared with around 2% in reference individuals. Cumulative incidences of MALO at 5, 10, and 20 years of follow-up, stratified by fibrosis stage at baseline for patients with a biopsy-based diagnosis of NAFLD, are presented in Table 2. The individual outcomes included in the MALO definition for patients with NAFLD and reference individuals are presented in Table S3.
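The competing-risk cumulative incidences reported above come from the Aalen-Johansen estimator. At each event time it adds the probability of still being free of any event just before that time multiplied by the cause-specific hazard, so competing non-liver deaths shrink the event-free population without adding to MALO. A minimal sketch with hypothetical names and toy data (cause codes 0 = censored, 1 = MALO, 2 = non-liver death are our convention):

```python
from typing import List, Sequence, Tuple


def aalen_johansen_cif(times: Sequence[float], causes: Sequence[int],
                       cause: int = 1) -> List[Tuple[float, float]]:
    """Cumulative incidence of `cause`; causes: 0 = censored, other ints = event types."""
    surv = 1.0      # all-cause event-free probability just before the current time
    cif = 0.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)
        d_cause = sum(1 for ti, ci in zip(times, causes) if ti == t and ci == cause)
        d_any = sum(1 for ti, ci in zip(times, causes) if ti == t and ci != 0)
        cif += surv * d_cause / at_risk    # mass added by the cause of interest
        surv *= 1.0 - d_any / at_risk      # any event removes mass from "event-free"
        curve.append((t, cif))
    return curve


def value_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function value of the curve at time t."""
    value = 0.0
    for ti, ci in curve:
        if ti > t:
            break
        value = ci
    return value


times = [1, 2, 3, 4]
causes = [1, 2, 1, 0]   # MALO, non-liver death, MALO, censored
print(round(value_at(aalen_johansen_cif(times, causes), 3.5), 3))  # 0.5
```

For the same toy data, treating non-liver death as censoring (1 minus Kaplan-Meier) would give 0.625 at t = 3.5, which illustrates why competing risks are accounted for.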
The incidence rate of MALO was 5.9/1,000 person-years (95% CI 4.9-7.2) in the NAFLD population, compared with 1.0/1,000 person-years (95% CI 0.9-1.1) in the reference population, corresponding to an HR of 6.6 (95% CI 5.2-8.5). In the biopsy subcohort, the HR relative to matched reference individuals was numerically higher in patients with NASH (HR 6.7; 95% CI 4.6-9.5) than in those without NASH (HR 4.2; 95% CI 2.5-7.2) (Table 3). However, within the NAFLD population, there was little difference in the rate of MALO between patients with and without NASH when stratified by fibrosis stage (Table 4). We found no evidence that the presence of NASH modified the risk of MALO in patients with fibrosis stages 0-2. For patients with fibrosis stage 3, the rate of MALO was higher in those without NASH than in those with NASH, with some evidence of statistical interaction (p = 0.018), although this subgroup (fibrosis stage 3 without NASH) comprised only 10 patients.
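The reported rates follow directly from the event counts and person-years given above. The sketch below recomputes them; the confidence interval uses a normal approximation on the log scale (SE of the log rate = 1/√events), which is our assumption since the source does not state the method, although it reproduces the reported intervals to one decimal. The function name is hypothetical.

```python
import math
from typing import Tuple


def incidence_rate(events: int, person_years: float, per: float = 1000.0,
                   z: float = 1.96) -> Tuple[float, float, float]:
    """Rate per `per` person-years with a log-scale normal-approximation CI."""
    rate = events / person_years * per
    se_log = 1.0 / math.sqrt(events)   # SE of log(rate) for a Poisson count
    return rate, rate * math.exp(-z * se_log), rate * math.exp(z * se_log)


nafld = incidence_rate(111, 18_657)
reference = incidence_rate(197, 201_089)
print([round(v, 1) for v in nafld])      # [5.9, 4.9, 7.2]
print([round(v, 1) for v in reference])  # [1.0, 0.9, 1.1]
```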

Predictive capacity of FIB-4 against biopsy or VCTE for MALO
Within the NAFLD group, higher fibrosis stages were associated with an increased rate of MALO across all three fibrosis staging modalities. The adjusted hazard ratio (aHR) for biopsy-determined F2 was 2.9 (95% CI 1.5-5.5), and for F3 the aHR was 8.9 (95% CI 4.6-17.4), both compared with the reference group with F0-1. For moderate fibrosis defined by biopsy or VCTE (stage 2 on biopsy or 10-15 kPa on VCTE), the aHR was 2.8 (95% CI 1.5-5.2), and for advanced fibrosis (stage 3 on biopsy or ≥15 kPa on VCTE) the aHR was 7.9 (95% CI 4.2-14.8), both compared with the reference group with no or mild fibrosis (stage 0-1 on biopsy or <10 kPa on VCTE). We found a similar association between fibrosis estimated with FIB-4 and incident MALO, with an aHR of 4.7 (95% CI 2.4-9.1) for high risk and 2.0 (95% CI 1.1-3.6) for intermediate risk, compared with patients at low risk according to FIB-4 (Table 5).
When examining the discriminative capacity of the three modalities (biopsy; biopsy or VCTE; and FIB-4) for incident MALO, restricted to the population in which all data were available, we found that the C-index was similar across modalities at 5 and 10 years of follow-up. Fibrosis estimated by biopsy or VCTE had the highest C-index of the three modalities, followed by biopsy alone and continuous FIB-4, at both 5 and 10 years. Categorical FIB-4 had the same C-index as biopsy at 5 years (0.701 for both). At 10 years, the C-index for categorical FIB-4 was somewhat lower than that for biopsy (0.719 vs. 0.734) (Table 6). Using an age-related FIB-4 cut-off did not improve the C-index compared with FIB-4 without age adjustment (Table S5).
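For reference, FIB-4 is computed as (age × AST) / (platelet count × √ALT), with AST and ALT in U/L and platelets in 10⁹/L. The risk cut-offs used in the sketch (low <1.30, high >2.67, and a lower cut-off of 2.0 from age 65 in the age-adjusted variant) are commonly published values that we assume here; the source does not list the thresholds or the exact age adjustment it used. Function names are hypothetical.

```python
import math
from typing import Optional


def fib4(age_years: float, ast_u_l: float, alt_u_l: float, platelets_10e9_l: float) -> float:
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def fib4_risk(score: float, age_years: Optional[float] = None, age_adjusted: bool = False) -> str:
    """Classify FIB-4 using commonly published cut-offs (assumed, not stated in the source)."""
    low_cut = 1.30
    if age_adjusted and age_years is not None and age_years >= 65:
        low_cut = 2.0               # age-adjusted lower cut-off for patients aged 65 or older
    if score < low_cut:
        return "low"
    if score <= 2.67:
        return "intermediate"
    return "high"


score = fib4(66, 30, 36, 200)
print(round(score, 2))                                     # 1.65
print(fib4_risk(score))                                    # intermediate
print(fib4_risk(score, age_years=66, age_adjusted=True))   # low
```

The age-adjusted variant only moves the lower cut-off, so it can reclassify older patients from intermediate to low risk but never changes the high-risk group.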

Discussion
Several observations can be made from this large cohort study. First, we confirm that fibrosis stage in NAFLD is predictive of progression to cirrhosis or its complications. Because of the large sample size, our estimates may be more precise than those of previous studies on the topic. 3,31 In fact, we found that more than one-third of patients with fibrosis stage 3 developed cirrhosis within 20 years of follow-up; few previous studies have had follow-up of this length. Second, we again confirm a strong correlation between the presence of NASH and higher stages of fibrosis. However, within the same stage of fibrosis, we did not find a meaningful difference in risk between patients with and without histological NASH. This may be because of the known uncertainty surrounding the identification of NASH, 32 but could also reflect the low number of patients in some subgroups. Finally, we show that the predictive capacity of different modalities for estimating fibrosis stage is comparable when estimating the future risk of cirrhosis in NAFLD. This is important, as it would allow a transition to non-invasive methods of fibrosis staging when determining the risk of future cirrhosis in NAFLD.

Comparison with previous studies
These results are in line with previous studies by us 3 and others. 2,4 We also confirm previous findings [2][3][4]31,33 that the presence of NASH, as currently defined by pathologists, adds little prognostic information about the risk of cirrhosis beyond knowledge of the fibrosis stage. This may be because of the subjectivity involved in defining hepatocellular ballooning in particular. 32 In contrast to a recent study with a similar methodology, 7 we found lower C-index statistics for both biopsy and FIB-4, and the estimates from these modalities did not differ significantly. The C-index statistics for FIB-4 and biopsy were 0.71 and 0.70 at 5 years, compared with 0.93 for biopsy and 0.78 for FIB-4 in the study by Boursier et al. 7 This may be explained by the considerably longer follow-up, larger sample size, and greater number of events in this study. Hence, our estimates may be more accurate and generalisable for long-term prediction of MALO.

Strengths and limitations
The main strength of this study is the large size of the cohort (one of the largest cohorts of patients with biopsy-proven NAFLD and granular data) and the long follow-up time, which allowed enough MALO to be captured for a meaningful statistical analysis. The linkage to national registers 23 minimises loss to follow-up and allows accurate ascertainment of validated MALO. 25 We could also compare risk estimates with those of matched reference individuals from the general population, which places the absolute risks in context. We used more advanced statistical techniques than are usually applied in the field, such as multiple imputation to better account for missing data. 34 Limitations include the fact that NASH status was defined differently depending on the cohort from which patients were identified. However, most patients had a biopsy reviewed by an expert pathologist. Because we combined data from four cohorts that were conducted at multiple centres and initiated over different periods, we cannot exclude cohort bias. Liver biopsy was more commonly used for diagnosing and staging NAFLD in earlier parts of the study period, whereas VCTE has only become available in recent years. As a result, patient populations may differ between modalities, potentially leading to differences in the distribution of patient characteristics over the study period. 35 Secondly, there is always a possibility of misclassification bias in register-based studies. However, the MALO used in this study have been previously validated and found to have high PPVs. 
25 Selection bias is likely in studies of biopsy-diagnosed NAFLD, partly because most patients come from secondary or tertiary care hospitals and biopsy is not performed in most patients with NAFLD. However, estimates should be comparable to those for other patients in secondary or tertiary care, 7 and the prevalence of advanced fibrosis in this cohort was lower than in other contemporary cohorts, suggesting a lower risk of selection bias. Thirdly, we did not consider time-dependent variables such as incident type 2 diabetes and alcohol use in the models. However, such information would not be available at the time of examination and therefore does not affect the risk predicted from baseline, that is, the date of NAFLD diagnosis. Finally, as the study population consists mainly of patients born in Sweden, the findings may not be generalisable to settings outside Sweden with different risk profiles and ethnicities. As only 5% of the study participants had a BMI <25 kg/m², the findings might not apply to this group because of a potentially different underlying pathology in lean NAFLD.

Conclusions
In this large cohort study of 1,260 patients with non-cirrhotic NAFLD, we found that more than one-third of patients with fibrosis stage 3 developed cirrhosis within 20 years. Further, we confirm the role of fibrosis staging in determining prognosis and again show that histological NASH provides little additional prognostic information. Finally, we found that the prognostic information from histologically defined fibrosis was comparable to that from the FIB-4 score. However, both modalities had moderate discriminatory capacity, and new prediction models for NAFLD are needed.
Table S3. Major adverse liver outcomes in patients with NAFLD and the reference population at full follow-up (median 15 years).

Fig. 2. Cumulative incidence of MALO stratified by fibrosis stage and compared with reference individuals from the general population. MALO, major adverse liver outcomes.

Patients diagnosed with NAFLD at university hospitals in Sweden from December 18th 1974 to December 31st 2020.

Table 2. Cumulative incidence of major adverse liver outcomes stratified by fibrosis stage and compared with reference individuals from the general population.

Table 3. Associations with MALO in patients with NAFLD and the reference population, for the full population and across subgroups.

Table 4. Associations between fibrosis stage and NASH, and MALO, restricted to patients with NAFLD and biopsy data with NASH status available. HRs adjusted for age, sex, education, diabetes, BMI, smoking, and statins.

Table 5. Association between fibrosis estimated by liver biopsy; liver biopsy or VCTE; and FIB-4 in patients with NAFLD and incident major adverse liver outcomes.