Artificial intelligence is broadly divided into two branches, machine learning and deep learning (Fig. 1). A total of 475 articles were extracted from the database. The inclusion criteria covered original articles, case studies, review articles, and systematic reviews. Pre-clinical/non-clinical articles, case reports, letters to editors, and pediatric studies were excluded. The number of articles included for analysis was 95 (Fig. 2).
AI in predicting risk factors of NAFLD
Garcia-Carretero et al. assessed the prevalence of NASH in 2239 hypertensive patients and the relevant features related to hypertension and metabolic syndrome (MS) using supervised machine learning algorithms, namely the least absolute shrinkage and selection operator (LASSO) and a random forest classifier [16]. LASSO is a regression analysis algorithm that uses L1 regularization, that is, it adds a penalty term proportional to the sum of the absolute values of the coefficients to the regression loss function, which can shrink the coefficients of uninformative features to zero. A random forest algorithm was used to assess feature importance in the regression model produced by LASSO. In univariate analyses, NASH was associated with metabolic syndrome, type 2 diabetes, insulin resistance, and dyslipidemia. Serum ferritin and insulin were selected using the LASSO approach, which yielded a sensitivity of 70%, specificity of 79%, and area under the curve of 0.79 [16]. Another study by Garcia-Carretero et al. used random forest (RF) models to predict the risk of developing NASH in 1525 patients [15]. Electronic health records were used to assess the presence of NASH. The random forest model correctly classified patients with NASH with an accuracy of 0.87 in the best model and 0.79 in the worst one. The four most relevant features were insulin resistance, ferritin, serum levels of insulin, and triglycerides. Random forest-based modeling demonstrated that machine learning could be used to improve interpretability, produce an understanding of the modeled behavior, and demonstrate how far certain features can contribute to predictions [15].
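The L1 penalty described above can be written out explicitly. The following is a generic sketch of a LASSO-penalized objective, not the exact formulation used in [16]. The symbols are notation assumed here for illustration: $\ell$ is the per-patient loss (for example, the logistic loss for a NASH/non-NASH outcome), $x_{ij}$ is the value of feature $j$ for patient $i$, $\beta_j$ are the coefficients, $p$ is the number of candidate features, and $\lambda$ is the tuning parameter.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
\left\{
  \frac{1}{n} \sum_{i=1}^{n}
    \ell\!\left( y_i,\; \beta_0 + \sum_{j=1}^{p} x_{ij}\,\beta_j \right)
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\},
\qquad \lambda \ge 0
```

Larger values of $\lambda$ drive more coefficients exactly to zero, which is why LASSO can serve as a feature selection step: only features with non-zero coefficients remain in the model.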
Diagnosis of NAFLD using AI
In the literature, various diagnostic models for NAFLD have been studied. The algorithms included logistic regression (LR), k-nearest neighbor (kNN), support vector machine (SVM), naive Bayes, Bayesian network (BN), decision tree, and the K2 algorithm, together with ensemble methods such as adaptive boosting (AdaBoost), bootstrap aggregating (bagging), and random forest, and extensions of naive Bayes such as hidden naive Bayes (HNB) and aggregating one-dependence estimators (AODE) [23]. Ma et al. investigated these 11 machine learning algorithms in 10508 patients to identify the best diagnostic model for NAFLD [23]. They reported that the logistic regression (LR) model achieved an accuracy of 83.41%, whereas the SVM model achieved the highest specificity and precision, with values of 0.946 and 0.725, respectively. The AODE model was the most sensitive, with a value of 0.680. In this study, the F-measure was used to evaluate classification performance when building these prediction models; the highest F-measure was 0.655 for the BN model, and the lowest was 0.318 for the fatty liver index (FLI). The authors concluded that the BN model showed the best performance, with a 9.17% improvement in the F-measure score [23].
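The performance measures compared above (accuracy, sensitivity, specificity, precision, and F-measure) all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions, with the F-measure taken as the harmonic mean of precision and sensitivity. The class and function names and the example counts are hypothetical and are not taken from [23].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 confusion matrix (hypothetical container)."""
    tp: int  # NAFLD cases correctly predicted as NAFLD
    fp: int  # non-NAFLD cases wrongly predicted as NAFLD
    tn: int  # non-NAFLD cases correctly predicted as non-NAFLD
    fn: int  # NAFLD cases wrongly predicted as non-NAFLD


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c):
    """Compute the standard binary classification metrics from cell counts."""
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # also called recall
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "precision": _ratio(c.tp, c.tp + c.fp),    # positive predictive value
        # 2*TP / (2*TP + FP + FN) is algebraically the harmonic mean of
        # precision and sensitivity.
        "f_measure": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


if __name__ == "__main__":
    # Illustrative counts only; they do not reproduce any cited study.
    example = ConfusionCounts(tp=68, fp=20, tn=180, fn=32)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Because the F-measure ignores true negatives, a model with high accuracy on an imbalanced cohort can still have a low F-measure, which is one reason the two measures can rank models differently.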
Yip et al. included 922 patients to compare logistic regression, AdaBoost, and ridge regression. The final logistic regression model achieved an accuracy of 87–88% and retained six relevant features, including insulin resistance, triglycerides, and alanine aminotransferase [33]. Sorino et al. compared eight different machine learning algorithms, namely Boosting Tree Classifier (using Adaboost Classifier), Decision Tree Classifier, Naive Bayes Classifier, K-Nearest Neighbors Classifier, Neural Network Classifier, Random Forest Classifier, Regularized Multinomial Classifier (using logistic regression), and Support Vector Machine Classifier. Using the meta-learner approach, three models were created, consisting of (1) FLI plus GLUCOSE plus SEX plus AGE, (2) abdominal volume index (AVI) plus GLUCOSE plus gamma-glutamyl transpeptidase (GGT) plus SEX plus AGE, and (3) body roundness index (BRI) plus GLUCOSE plus GGT plus SEX plus AGE. The authors reported that the SVM algorithm (Support Vector Machine in Python) was the most appropriate and performed better in the analyzed models [28]. Model 3 had the highest accuracy of 77%, compared with 68% each for models 2 and 1. As model 2 had fewer prediction errors, it was considered the best model [28]. Cheng et al. developed several models using KNN, RF, and SVM to detect NAFLD. They observed that SVM had 86.9% accuracy in men, and RF had 80% accuracy in women. Both models selected some relevant features, including cholesterol-related and insulin resistance-related factors [8].
Docherty et al. developed a machine learning (ML) model to predict NASH, using confirmed NASH and non-NASH cases based on liver histology results in the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) dataset to train the model [11]. An extreme gradient boosting (XGBoost) model consisting of 14 features exhibited high performance in predicting NASH, as measured by area under the curve (0.82), sensitivity (81%), and precision (81%) [11]. Slightly reduced performance was observed with an abbreviated feature set of 5 variables (0.79, 80%, and 80%, respectively) [11]. The full model demonstrated good performance (AUC 0.76) in predicting NASH in Optum data [11]. The proposed model, named NASH map, is the first ML model developed with confirmed NASH and non-NASH cases as determined through liver biopsy and validated on a large, real-world patient dataset [11].
AI in predicting the severity and staging of NASH
For the assessment of the severity of nonalcoholic fatty liver disease (NAFLD) and the identification of patients with nonalcoholic steatohepatitis (NASH), Canbay et al. devised a novel machine learning approach, ensemble feature selection (EFS) [5]. Non-invasive parameters were selected by EFS from a retrospectively collected training cohort of 164 obese individuals (age: 43.5 ± 10.3 years; BMI: 54.1 ± 10.1 kg/m²) to develop a model able to predict the histologically assessed NAFLD activity score (NAS) [5]. An advantage of this score is its continuous distribution, which allows disease assessment beyond a dichotomous classification as NAFL or NASH; it could therefore possibly be used to monitor disease progression or resolution over time. Additional parameters, i.e., transient elastography or controlled attenuation parameter, could be added, given sufficiently large reference datasets [5]. Okanoue et al. developed a novel non-invasive test with the help of an AI/neural network system called NASH-Scope, and the model could accurately distinguish between NAFLD and non-NAFLD and between NAFLD without fibrosis and NASH with fibrosis in 398 histologically diagnosed NAFLD patients [25]. Moreover, a systematic review by Li et al. evaluating AI-assisted diagnosis of liver fibrosis and NAFLD demonstrated promising potential, but validation of these models in larger cohorts is required before they are implemented in clinical practice [21]. The application of AI in predicting NAFLD is extensively reviewed elsewhere [32]. The NAFLD ridge score is a machine learning algorithm and one of the most effective tools to detect NAFLD [33]. It is based on multiple laboratory and clinical parameters, namely serum levels of ALT, serum triglycerides, HDL, HbA1c, hypertension, and leukocyte count, and has an AUROC value of 0.87 [33]. It uses proton magnetic resonance spectroscopy (¹H-MRS) as a reference and has a negative predictive value (NPV) of 96% [33].
Although the NAFLD ridge score is an effective scoring system to detect NAFLD, its use is limited to the research setting, and it fails to risk-stratify steatosis progression [3].
AI in imaging modalities
Pasdar et al. conducted a multicenter prospective cohort study of 3029 European-ancestry adults who were recently diagnosed with T2D (n = 795) or at high risk of developing T2D (n = 2234) [4]. The analyses applied machine learning methods to data from the deep-phenotyped IMI DIRECT cohorts (n = 1514) to identify sets of highly informative variables to predict NAFLD. The criterion measure was liver fat quantified from MRI. LASSO (least absolute shrinkage and selection operator) was applied to select features from the different layers of omics data, and random forest analysis was used to develop the models. A total of 18 prediction models were developed. The authors reported that the model including all omics and clinical variables yielded a cross-validated receiver operating characteristic area under the curve (ROCAUC) of 0.84 (95% CI 0.82, 0.86; p < 0.001), compared with a ROCAUC of 0.82 (95% CI 0.81, 0.83; p < 0.001) for a model including 9 clinically accessible variables [4].
In a study by Cao et al., two-dimensional hepatic imaging was analyzed by the envelope signal, grey scale signal, and deep-learning index obtained by 3 image-processing techniques in 240 participants with mild, moderate, and severe NAFLD [6]. The authors reported that the 3 methods showed good ability (AUC > 0.7) to identify NAFLD. Meanwhile, the deep-learning index showed superior diagnostic ability in distinguishing moderate and severe NAFLD (AUC = 0.958) [6].
Rapid MRI techniques could be used to predict nonalcoholic steatohepatitis (NASH) noninvasively by measuring liver stiffness with magnetic resonance elastography (MRE) and liver fat with chemical shift-encoded (CSE) MRI [12]. Dzyubak et al. validated an automated image analysis technique to maximize the utility of these methods in 83 patients with suspected NAFLD [12]. A logistic regression model to predict pathology-diagnosed NASH was trained on liver stiffness and proton density fat fraction (PDFF), both of which were also calculated using an automated method. The area under the receiver operating characteristic curve (AUROC) was calculated using 10-fold cross-validation for models based on both automated and manual measurements [12]. A separate model was trained to predict the NAFLD activity score (NAS). The model for predicting biopsy-diagnosed NASH had an AUROC of 0.87, and the NAS-prediction model had a C-statistic of 0.85. The stiffness and PDFF measurements based on automated ROIs had a higher agreement with the expert reader (R² = 0.87 for stiffness and R² = 0.99 for PDFF) than the expert and experienced readers had with each other (R² = 0.85 for stiffness and R² = 0.98 for PDFF) [12].
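To make the evaluation procedure concrete, the sketch below shows one common way to compute an AUROC and to generate 10-fold cross-validation splits. It is a generic illustration and not the pipeline used in [12]: the function names, the shuffling seed, and the way folds are assigned are assumptions. The AUROC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
import random


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predicted probabilities or risk scores (higher = more likely NASH)
    labels: 1 for biopsy-confirmed positives, 0 for negatives
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for held_out, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test


if __name__ == "__main__":
    # Toy scores and labels, for illustration only.
    print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
    for train, test in k_fold_splits(83, k=10):
        print(len(train), len(test))
```

In a full analysis, a model would be fitted on each training split and used to score the held-out patients; the out-of-fold scores can then be pooled, or the AUROC averaged across folds, so that every patient is evaluated by a model that never saw that patient during training.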
In a study by Addeman et al., a novel software package named AdipoQuant was used for the automated quantification of total adipose tissue (TAT), subcutaneous adipose tissue (SAT), and intra-abdominal adipose tissue (IAAT) in the abdomen, and the results were similar to those of manual segmentation methods [1].
Electronic health records and NAFLD
Logistic regression, decision trees, RF, extreme gradient boosting (XGBoost), and k-nearest neighbors (KNN) have been used with electronic health records (EHR), while neural networks and deep learning have been used for histology and images [30]. Sowa et al. included the EHR of 126 patients to develop a final model with an accuracy of 0.79. However, this model relied on features that are not easily collected or measured, such as apoptosis markers [29]. Genome-wide association studies (GWAS) have identified several risk loci for nonalcoholic fatty liver disease (NAFLD). Fairfield et al. performed a GWAS of 4761 cases of NAFLD and 373,227 healthy controls without evidence of NAFLD using electronic health records [14]. Loomis et al. conducted large-scale electronic health record database studies with The Health Improvement Network (THIN) database (n = 133,525) and the Humedica EHR database (n = 148,934); they established consistent and strong relationships between body mass index (BMI) and prospectively recorded diagnoses of NAFLD/NASH and emphasized the importance of weight reduction strategies for the prevention and management of NAFLD [22].
Danford et al. developed and validated an electronic health record (EHR) algorithm to accurately identify cases of NASH cirrhosis in the EHR (n = 300) [10]. The recommendations of the Electronic Medical Records and Genomics (eMERGE) network, a network funded by the National Human Genome Research Institute, were followed to construct the algorithm [10]. The algorithm with the highest PPV, 100% on internal validation and 92% on external validation, consisted of ≥ 3 counts of "cirrhosis, no mention of alcohol" codes (571.5, K74.6) and ≥ 3 counts of nonalcoholic fatty liver codes (571.8–571.9, K75.81, K76.0) in the absence of any diagnosis codes for other common causes of chronic liver disease [10].
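The code-count rule reported by Danford et al. can be expressed as a short function. This is a minimal sketch, not the authors' implementation. The function name and the flat list-of-codes input are assumptions, the threshold is assumed to apply to the total count across each code group, and the source range 571.8–571.9 is simplified to the two codes 571.8 and 571.9. The source does not list the exclusion codes for other causes of chronic liver disease, so they are passed in by the caller; the two used in the example (B18.2, chronic viral hepatitis C, and K70.30, alcoholic cirrhosis) are illustrative only.

```python
from collections import Counter

# Code groups taken from the rule as summarized in the text.
CIRRHOSIS_NO_ALCOHOL_CODES = {"571.5", "K74.6"}
# Assumption: the range "571.8-571.9" is reduced to its two endpoints.
NAFL_CODES = {"571.8", "571.9", "K75.81", "K76.0"}


def is_nash_cirrhosis_case(diagnosis_codes, exclusion_codes, min_count=3):
    """Apply the count-based rule to one patient's recorded diagnosis codes.

    diagnosis_codes: every ICD code occurrence in the patient's record
                     (repeats are expected and are what gets counted)
    exclusion_codes: codes for other causes of chronic liver disease,
                     supplied by the caller because the source omits them
    """
    counts = Counter(diagnosis_codes)
    cirrhosis_count = sum(counts[c] for c in CIRRHOSIS_NO_ALCOHOL_CODES)
    nafl_count = sum(counts[c] for c in NAFL_CODES)
    has_other_cause = any(counts[c] > 0 for c in exclusion_codes)
    return (
        cirrhosis_count >= min_count
        and nafl_count >= min_count
        and not has_other_cause
    )


if __name__ == "__main__":
    # Illustrative exclusion list, not the list used in the cited study.
    other_causes = {"B18.2", "K70.30"}
    record = ["K74.6", "K74.6", "571.5", "K76.0", "K76.0", "K75.81"]
    print(is_nash_cirrhosis_case(record, other_causes))
```

Requiring repeated codes rather than a single occurrence is what trades sensitivity for a higher positive predictive value: a one-off code may reflect a rule-out diagnosis, whereas three or more occurrences are more likely to reflect an established condition.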
Nonalcoholic fatty liver disease in association with advanced fibrosis and cancer
Current machine learning approaches have identified type 2 diabetes mellitus (T2DM) as a feature strongly correlated with some degree of liver fibrosis and adverse hepatic outcomes (cirrhosis, malignancy) [30]. Aggarwal and Alkhouri reported that machine learning algorithms such as deep learning radiomic elastography (DLRE) have excellent accuracy in diagnosing cases of advanced fibrosis [2]. This finding was based on another study by Wang et al., in which 344 patients with nonalcoholic fatty liver disease (NAFLD) underwent 428 liver biopsies (240 had paired transient elastography examination) [31]. The fibrosis stage was scored using the NASH Clinical Research Network system, and automated quantification of fibrosis-related parameters (q-FPs) was performed by dual-photon microscopy using unstained slides. At the best cut-offs, the two q-FPs had 88.3–96.2% sensitivity and 78.1–91.1% specificity for different fibrosis stages in the validation cohort [31]. It was noted that automated quantification of fibrosis-related parameters by dual-photon microscopy has high accuracy in diagnosing fibrosis and cirrhosis in NAFLD patients [31].
Lewinska et al. developed a noninvasive surveillance method for NAFLD-hepatocellular carcinoma (HCC) [20]. Using comprehensive ultra-high-performance liquid chromatography mass-spectrometry, they investigated 1295 metabolites in serum from 249 patients. The area under the receiver operating characteristic curve was calculated for all detected metabolites and used to establish their diagnostic potential, and logistic regression analysis was used to establish the diagnostic score [20]. The diagnostic model was constructed using ROC curves generated by Monte-Carlo cross-validation (MCCV) with balanced sub-sampling, and the linear support vector machine (SVM) method was used for sample classification [20]. The authors reported that the combination of 5 metabolites accurately distinguished NAFLD-HCC patients from healthy individuals (AUC = 0.989), morbidly obese bariatric surgery NAFLD (OB-NAFLD) patients (AUC = 0.997), and patients with alcohol- and viral-associated HCC (AV-HCC) (AUC = 0.999), and this model performed well against a validation set of NAFLD patients (AUC = 0.905) [20]. With the help of the machine learning model, the authors speculated that NAFLD-HCC tumors act as sinks for unsaturated fatty acids from the blood and proposed a link between increased transport of fatty acids by CD36 and NAFLD-HCC [20, 27].
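Monte-Carlo cross-validation with balanced sub-sampling, as mentioned above, can be sketched as follows. This is a generic illustration under stated assumptions and not the procedure of [20]: the function name, the number of iterations, the training fraction, and the seed are hypothetical. In each iteration, the same number of samples is drawn at random from every class for training, and all remaining samples are held out for testing.

```python
import random


def balanced_mccv_splits(labels, n_iterations=100, train_fraction=0.5, seed=0):
    """Yield (train_indices, test_indices) for repeated balanced random splits.

    labels: one class label per sample (e.g. 1 = NAFLD-HCC, 0 = control)
    train_fraction: share of the smallest class drawn for training
                    (assumed value; the source does not state one)
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    smallest = min(len(members) for members in by_class.values())
    n_per_class = max(1, round(smallest * train_fraction))
    for _ in range(n_iterations):
        train = []
        for members in by_class.values():
            # Equal draw from every class keeps the training set balanced.
            train.extend(rng.sample(members, n_per_class))
        train_set = set(train)
        test = [i for i in range(len(labels)) if i not in train_set]
        yield sorted(train), test


if __name__ == "__main__":
    # Toy cohort: 30 cases and 90 controls, for illustration only.
    toy_labels = [1] * 30 + [0] * 90
    for train, test in balanced_mccv_splits(toy_labels, n_iterations=3):
        print(len(train), len(test))
```

A classifier would be fitted on each balanced training set and scored on the held-out samples, and the resulting ROC curves averaged over iterations. Balancing the training draw prevents the larger group from dominating the fitted decision boundary when case and control groups differ in size.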