Application of Biochemical Tests and Machine Learning Techniques to Diagnose and Evaluate Liver Disease

Background: Liver function tests (LFTs) remain among the most commonly employed clinical measures for the diagnosis of hepatobiliary disease. LFTs, sometimes referred to as the hepatic panel, help to determine the health of the liver, monitor the progression of a disease, and measure its severity, particularly scarring (cirrhosis) of the liver. Aims: In this study, we present a new approach to evaluating the natural progression of liver disease through the assessment of eight biochemical parameters, namely serum total bilirubin (TB), alanine aminotransferase (ALT), aspartate aminotransferase (AST), alkaline phosphatase (ALP), total protein (TP), albumin (ALB), albumin/globulin (A/G) ratio, and alpha-fetoprotein (AFP), together with two machine learning (ML) tools, Random Forest and CART, to substantiate the outcome. Methods: The study was carried out in a total of 100 subjects, comprising healthy controls (group I, 25 subjects), patients with acute hepatitis (group II, 25 patients), chronic hepatitis (group III, 25 patients), and hepatocellular carcinoma (group IV, 25 patients), applying both biochemical and machine learning methods. Results: Of the eight parameters tested, all except ALP (p = 0.426) differed significantly among the groups. The accuracy of classification into the different liver patient groups using Random Forest and CART was 94% and 95%, respectively. Conclusion: Acute hepatitis (group II) shows higher levels of AST, ALT, and ALP than chronic hepatitis (group III) and hepatocellular carcinoma (group IV). The two machine learning algorithms supported the biochemical results by correctly classifying liver disease patients. We also recommend performing the AFP test if hepatocellular carcinoma is suspected.


Introduction
As the body's largest internal organ, the liver plays an important role in blood circulation throughout the body and controls the concentrations of most chemicals and waste products in the blood. Thus, it is important to keep the liver healthy. Parasites and viruses that infect the liver cause inflammation and reduce its function, subsequently causing liver disease (LD) [1].
LD is a common clinical disorder associated with high morbidity and mortality [2]. Additionally, LD has been increasing in parallel with the prevalence of diabetes, metabolic syndrome, alcohol use, and obesity [3]. The higher prevalence of LD has brought a greater economic burden. Therefore, accurate identification of individuals at risk and early recognition of LD could offer immense benefits for diagnosis, prevention, and proper treatment. However, reliance on a single diagnostic test is not sufficient to evaluate liver function [4]. A wide variety of biochemical measures are therefore used to determine the general condition of the liver. Different biochemical tests, commonly referred to as liver function tests (LFTs), provide secondary evidence for hepatic diseases [5]. The medical record and physical examination, along with LFT results, are used to 1) recognize patients with liver disease; 2) make the differential diagnosis of jaundice; 3) monitor severity (i.e., the course of the disease and its response to therapy); and 4) detect hepatotoxicity caused by various agents [6]. In addition, commonly used LFTs mainly indicate liver damage rather than monitor hepatic function, which can complicate the identification of disease [7]. These biochemical tests can also be abnormal because of non-hepatic problems such as hemolysis (high bilirubin) or bone disease (high alkaline phosphatase). Abnormal LFTs often suggest that the liver may not be functioning properly and indicate the severity of the problem. Even so, the correctness and accuracy with which they predict liver disease remain uncertain [6].
In this context, computer-based diagnostic methods and tools such as machine learning (ML) can help to predict liver diseases accurately and precisely. Knowledge discovery with ML has made it possible to exploit valuable data to enhance decision making in both medical diagnosis and prognosis. Researchers have shown interest in using ML for data mining and classification techniques (based on features or characteristics) in the medical diagnosis and prediction of liver diseases [8]. In medical settings, groups of patients can be assigned to different classes with respect to types and/or subtypes of disease. In ML, classification is a supervised method in which a model is trained on dataset(s) whose samples have known classes, based on their features [9] [10]. The model then predicts the unknown class of a test sample [9]. Classifier performance is evaluated by measuring the classification accuracy, i.e., the percentage of correctly classified instances. ML holds promise for improving the diagnosis and prediction of liver diseases [8].
Several liver function tests are carried out for the evaluation and management of hepatic dysfunction in patients. The biochemical markers serum bilirubin, alanine aminotransferase (ALT), aspartate aminotransferase (AST), alkaline phosphatase (ALP), α-fetoprotein, 5'-nucleotidase, and ceruloplasmin are liver function tests [5] [11] [12]. A few other studies have indicated that ALT, AST, ALP, GGT, bilirubin, prothrombin time, and serum albumin are the tests commonly performed in liver disease patients [13] [14]. In these studies, researchers highlighted the importance of these tests in reflecting liver function. For instance, bilirubin reflects the excretion of anions, transaminases reflect hepatocellular integrity, bilirubin and ALP reflect the formation and subsequent free flow of bile, and albumin reflects protein synthesis. Researchers have also comprehensively presented all these biomarkers, along with AFP and serum proteins, for screening liver function [4]. They also found that AFP was increased in hepatocellular carcinoma. In addition, asymptomatic patients show a mild increase in serum ALT levels, and around one-third of the patients persistently show normal liver enzyme levels [6] [15].
Recently, authors proposed building intelligent medical decision support systems with ML, using classification of liver disease and clustering to find patterns that can help physicians with treatment [16]. Other studies have recommended categorizing data by liver ailment and used different algorithms (J-48, SVM, Random Forest, etc.) to classify these liver disease conditions [17] [18]. For example, researchers used six ML methods (LR, KNN, DT, SVM, NB, and RF) to classify patients with liver disease. They estimated the accuracy, recall (sensitivity), and precision (positive predictive value) of the applied methods when classifying the patients into groups [19]. In another study, CART (classification and regression tree) was used to detect liver disease patients and obtained 92.94% accuracy [20]. Other authors applied the classification algorithms naïve Bayes, C4.5 decision tree, back propagation, SVM, and KNN, and compared the performance of the classifiers on the basis of accuracy, precision, sensitivity, and specificity [21]. Very recently, the authors of one study predicted the risk factors of chronic kidney disease (CKD) using different machine learning algorithms (random forest, decision stump, linear regression, naïve Bayes, and simple logistic regression) to classify CKD patients [22].
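The performance measures named in these studies can all be derived from the counts in a two-class confusion matrix. The following is a minimal sketch, not taken from any of the cited works; the function name `binary_metrics` and the example counts are hypothetical and serve only to show how the four measures relate.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common classifier metrics from two-class confusion counts.

    tp/fp/tn/fn = true positives, false positives, true negatives,
    false negatives.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),    # also called recall
        "specificity": tn / (tn + fp),    # share of negatives correctly rejected
        "precision": tp / (tp + fp),      # also called positive predictive value
    }


# Hypothetical counts, for illustration only (not data from this study).
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe how well diseased and healthy subjects, respectively, are recognized, whereas precision describes how trustworthy a positive prediction is.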
In this study, considering the importance and utility of both methods, we sought to answer two key research questions: 1) which biochemical parameters are significantly associated with different liver diseases, and 2) whether ML tools can assess and validate the outcome. To answer them, we performed a cross-sectional study to a) evaluate liver patients using different biochemical markers (TB, ALT, AST, ALP, TP, ALB, A/G ratio, and AFP), and b) employ two ML prediction models (Random Forest and CART) to support the findings. We aim to establish the relationship between conventional LFTs and ML tools in order to verify accuracy. We believe that the study can help clinicians to identify liver disease correctly and make actionable decisions on prevention, early diagnosis, and targeted intervention. The study provides fresh insight into using both biochemical tests and ML tools to predict liver disease. To our knowledge, the study is the first initiative to connect traditional biochemical tests with computational analysis and to establish the reliability of ML in supporting the outcome of biochemical tests.

Blood Samples
A sample of venous blood was collected and allowed to clot. It was then centrifuged at 4 °C for 10 minutes at 4000 rpm. The serum was then separated and stored at −20 °C until analyzed. The process flow is shown in Figure 2.

Laboratory Tests
The following routine liver function tests were performed by standard methods.
Samples were prepared and each LFT parameter was quantified in the patients following the cited reference methods: serum bilirubin [23], serum alanine aminotransferase and aspartate aminotransferase [24], serum alkaline phosphatase [25], serum total protein [26], and serum albumin [27]. Because total protein consists of both albumin and globulin, the serum globulin concentration was calculated by subtracting the albumin value from the total protein value.
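The derived quantities described above (globulin by subtraction, and the A/G ratio used later in the study) amount to simple arithmetic. A minimal sketch follows; the function name `globulin_and_ag_ratio` and the example concentrations are hypothetical and are not measurements from this study.

```python
def globulin_and_ag_ratio(total_protein, albumin):
    """Derive globulin and the albumin/globulin (A/G) ratio.

    Both inputs must be in the same concentration unit (e.g. g/dL).
    """
    if albumin > total_protein:
        raise ValueError("albumin cannot exceed total protein")
    globulin = total_protein - albumin   # total protein = albumin + globulin
    if globulin == 0:
        raise ValueError("globulin is zero; A/G ratio is undefined")
    return globulin, albumin / globulin


# Hypothetical values in g/dL, for illustration only.
globulin, ag_ratio = globulin_and_ag_ratio(total_protein=7.0, albumin=4.0)
print(f"globulin = {globulin:.1f} g/dL, A/G ratio = {ag_ratio:.2f}")
```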

Bilirubin
Total and direct bilirubin were measured by a modified Jendrassik-Grof method on a centrifugal analyzer [23].

AST and ALT
To measure both enzymes correctly, assay conditions were optimized with respect to the basic variables (buffer type, buffer concentration, ions, and pH), and the kinetic parameters (inhibitor and Michaelis constants) were determined [24].

ALP
Serum ALP was measured using the "Iso-ALP" test kit (Boehringer Mannheim). The principle of the assay is based on Rosalki and Foo (Clin Chem 1984; 30: 1182-6) [25].

Total Protein
TP was determined by the biuret method [26].

Albumin
Serum albumin was measured using a rapid and reliable method based on bromocresol green. In this method, when albumin is added to a solution of bromocresol green in 0.075 M succinate buffer at pH 4.20, the absorbance at 628 nm increases [27].

Statistical Analysis
All statistical analyses were performed using the SPSS statistical package (Ver. 10.0). One-way analysis of variance (ANOVA), following a logarithmic transformation of the data, was initially used to detect overall differences in the group means of the eight biochemical parameters. Differences among group means were then assessed using the least significant difference (LSD) test [29].
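The analysis itself was run in SPSS; the following sketch only illustrates the underlying computation, namely a log10 transformation followed by the one-way ANOVA F statistic. The function name `one_way_anova_f` and the three small groups of values are hypothetical (chosen as powers of ten so the arithmetic is easy to follow), and the sketch stops at the F statistic: it computes neither the p-value nor the LSD post hoc comparisons.

```python
import math


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical raw values for three groups, for illustration only.
raw_groups = [
    [1, 10, 100],
    [10, 100, 1000],
    [100, 1000, 10000],
]
# Log-transform before the ANOVA, as described in the text.
log_groups = [[math.log10(x) for x in g] for g in raw_groups]

f_stat, df_b, df_w = one_way_anova_f(log_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The F statistic would then be compared against the F distribution with the reported degrees of freedom to obtain a p-value.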

Random Forest
RF is an ensemble classifier that works efficiently on large datasets. It is a classification and regression method that generates decision trees from randomly selected subsets of the training data and outputs the class chosen by the individual trees [30]. There is no need to reduce the number of variables during the analysis because RF can handle thousands of input attributes. RF also provides an estimate of which variables are important in the classification. To classify a new liver data object, the input vector is passed down each of the decision trees in the forest. Each tree casts a vote for the class of the new data, and the forest chooses the class with the most votes. The following equations are employed to classify the liver data with the RF model [31].

$$ ni_j = w_j C_j - w_{left(j)} C_{left(j)} - w_{right(j)} C_{right(j)} $$

• ni_j = the importance of node j
• w_j = weighted number of samples reaching node j
• C_j = the impurity value of node j
• left(j) = child node from left split on node j
• right(j) = child node from right split on node j

$$ fi_i = \frac{\sum_{j:\,\text{node } j \text{ splits on feature } i} ni_j}{\sum_{k \in \text{all nodes}} ni_k} $$

• fi_i = the importance of feature i
• ni_j = the importance of node j

The feature importances are then normalized over all features:

$$ normfi_i = \frac{fi_i}{\sum_{j \in \text{all features}} fi_j} $$
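To make the node and feature importance definitions concrete, the following minimal sketch evaluates them for a single small tree. The tree layout, the weights, the impurity values, and the helper names `node_importance` and `feature_importances` are all hypothetical; they are not taken from the model fitted in this study. In a full forest, the per-tree values would additionally be averaged over all trees.

```python
def node_importance(node, nodes):
    """ni_j = w_j*C_j - w_left(j)*C_left(j) - w_right(j)*C_right(j)."""
    left = nodes[node["left"]]
    right = nodes[node["right"]]
    return (node["w"] * node["impurity"]
            - left["w"] * left["impurity"]
            - right["w"] * right["impurity"])


def feature_importances(nodes):
    """Return normalized importance per feature for one tree."""
    # Importance of every internal (splitting) node; leaves have no feature.
    ni = {idx: node_importance(node, nodes)
          for idx, node in nodes.items() if node["feature"] is not None}
    total = sum(ni.values())

    # fi_i: importance of the nodes splitting on feature i over all nodes.
    fi = {}
    for idx, value in ni.items():
        feature = nodes[idx]["feature"]
        fi[feature] = fi.get(feature, 0.0) + value / total

    # Normalize over all features so the importances sum to 1.
    norm = sum(fi.values())
    return {feature: value / norm for feature, value in fi.items()}


# Hypothetical tree: node 0 is the root, nodes 1, 3 and 4 are pure leaves.
# "w" is the fraction of samples reaching the node, "impurity" its impurity.
tree = {
    0: {"feature": "bilirubin", "w": 1.0, "impurity": 0.60, "left": 1, "right": 2},
    1: {"feature": None, "w": 0.4, "impurity": 0.00},
    2: {"feature": "AFP", "w": 0.6, "impurity": 0.25, "left": 3, "right": 4},
    3: {"feature": None, "w": 0.3, "impurity": 0.00},
    4: {"feature": None, "w": 0.3, "impurity": 0.00},
}

importances = feature_importances(tree)
for feature, value in importances.items():
    print(f"{feature}: {value:.2f}")
```

In this toy tree the root split removes most of the impurity, so its feature receives the larger share of the importance.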

CART
CART (classification and regression trees) is an ML classification model that predicts a target variable from other, labeled variables by asking a sequence of if-else questions [32]. These models have two advantages: 1) they can handle nonlinear datasets, and 2) data normalization or standardization is not needed, because distances or other quantitative measures between data points are not calculated. The tree model is constructed from three types of nodes (root, internal, and leaf). Each root or internal node poses its own if-else question about a variable, which directs a sample toward a specific leaf node for the final class prediction using decision boundaries [33]. Information gain (IG) is a criterion for measuring the purity of a node and is computed from how a node splits the items. The corresponding impurity criterion is used to split the features (Table 1).

$$ IG(T, X) = Entropy(T) - Entropy(T, X) $$

• T = target variable
• X = feature to be split on
• Entropy(T, X) = the entropy calculated after the data is split on feature X
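The information gain criterion can be sketched in a few lines. The example below is illustrative only: the function names, the class labels, and the two-level feature are hypothetical, and the split is done on a categorical feature for simplicity, whereas a CART implementation would typically search for binary threshold splits on continuous values.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


def information_gain(labels, feature_values):
    """IG(T, X) = Entropy(T) - Entropy(T, X)."""
    n = len(labels)
    # Group the target labels by the value the feature takes.
    subsets = {}
    for label, value in zip(labels, feature_values):
        subsets.setdefault(value, []).append(label)
    # Entropy(T, X): size-weighted entropy of the subsets after the split.
    split_entropy = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - split_entropy


# Hypothetical toy data: the feature separates the two classes perfectly.
target = ["HC", "HC", "other", "other"]
afp_level = ["high", "high", "normal", "normal"]

gain = information_gain(target, afp_level)
print(f"information gain = {gain:.2f} bits")
```

A split that leaves every subset pure recovers the full entropy of the target, which is the largest gain possible for that node.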

Patient Characteristics
The numbers and ages of the various groups of patients and controls used in this study are summarized in Table 2. The mean age did not differ significantly among groups. Inspection of the data showed clear evidence of non-Gaussian distributions. However, a log10 transformation successfully normalized the data. All subsequent analyses were based on the logarithms of the raw values.

Comparison of Biomedical Parameters
As expected, there were large and highly statistically significant differences in means among the four groups for all eight biochemical parameters. These are summarized in Table 3, which also shows how the groups differed for each parameter. For example, the mean of the controls differed from that of all other groups for bilirubin, ALT, and AST, but did not differ from that of the acute hepatitis (AH) and chronic hepatitis (CH) groups for AFP.

Although group means differed, individual patients overlapped considerably for each biochemical parameter, as illustrated in Figure 3(a)-(h). The figures show that many of the liver disease patients were clearly in the pathological range, while others were within the normal range. Therefore, no single line drawn on any of these plots would separate all the liver disease patients from the controls or distinguish between the liver disease groups.

Figure 4 shows that in the AH group (Figure 4(a)), patients with low bilirubin had low concentrations of AST, ALT, and ALP, whereas patients with high bilirubin had high concentrations of these enzymes, reflecting the severity of the disease in acute conditions. Other parameters, such as total protein, albumin, and AFP, remained constant regardless of the bilirubin level. As is apparent from Figure 4(b), in CH patients with low and high bilirubin values, the corresponding liver enzymes (AST, ALT, and ALP) showed higher values, respectively. The other parameters were consistent across bilirubin values: AFP, TP, and albumin changed insignificantly as bilirubin increased. We observed an outcome similar to that for CH when comparing bilirubin with the liver enzymes in hepatocellular carcinoma (HC) (Figure 4(c)). In HC, liver enzyme levels did not increase substantially, and patients with high bilirubin did not show high ALT or AST values. The AFP concentration was substantially higher than those of the two enzymes. However, the other parameters behaved as they did in AH and CH. All these findings are consistent with the results shown in Figure 3.

Random Forest
The RF classifier model classified the 100 subjects into four groups (control, AH, CH, HC). Using the whole data set, the model classified 95 of the 100 subjects correctly (Table 4). When the data were split into training (70%) and test (30%) samples, the model classified the 30 test samples, 28 of them correctly (Table 5). The model also estimated the importance of the features and ranked them from 1 to 7 (Table 6). This estimation identified bilirubin as the most important parameter in diagnosing all four groups. The second most important parameter is AFP.
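The accuracies implied by these counts follow directly from the definition of accuracy as the fraction of correctly classified samples. A minimal sketch, using only the counts reported above (the helper name `accuracy` is an illustrative choice):

```python
def accuracy(n_correct, n_total):
    """Fraction of samples that were classified correctly."""
    return n_correct / n_total


# Counts reported for the RF model in the text above.
full_data_accuracy = accuracy(95, 100)   # whole data set
test_split_accuracy = accuracy(28, 30)   # 30% hold-out test samples

print(f"whole data set: {full_data_accuracy:.1%}")
print(f"test split:     {test_split_accuracy:.1%}")
```

The hold-out figure is the more conservative estimate, because the test samples were not used to fit the model.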

CART
The CART classifier model correctly classified 94 of the 100 patients into the four groups. The model was trained on the whole data set and then used to predict the class. The results are shown in Table 7 as a confusion matrix. One of the important findings of the classification tree is that, of the 7 features, four (AFP, bilirubin, ALT, and AST) were important in the classification process. These four features are required to perform the classification. The results show that AFP alone is required to classify the 25 HC patients correctly, and bilirubin is required for the control group, all of whom were classified correctly. Both ALT and AST were required to classify the AH and CH patients. All 25 AH and 21 CH patients were classified correctly; 6 patients were misclassified for the HC group (Figure 6).

Biochemical Tests
In this study, the diagnostic effectiveness of eight biochemical parameters was evaluated among four subject groups, comprising controls and patients with three types of liver disease. The biochemical parameters were chosen to measure a range of known biochemical functions of the organ. Since the liver performs a wide variety of tasks, relying on a single test is not sufficient to evaluate liver function. Therefore, a wide variety of diagnostic tests is imperative for indicating hepatobiliary disease, along with an ML classification scheme based on two models, RF and CART.

Bilirubin
Although a small amount of unconjugated (indirect) bilirubin is present in healthy people, there is virtually no detectable conjugated (direct) bilirubin in the blood, because conjugated bilirubin is rapidly secreted into the bile. Serum bilirubin levels do not increase until at least half of the liver's excretory capacity is lost. In this study, we found that serum total bilirubin was significantly increased in the patients with acute hepatitis and showed no overlap with the control group (Figure 3(a)). In groups III (chronic hepatitis) and IV (hepatocellular carcinoma), bilirubin levels, though elevated compared with the controls, were not as high as in acute hepatitis. Serum bilirubin helps to determine abnormalities in hepatic uptake, conjugation, and secretion [34].

ALT and AST
ALT and AST levels are the most widely used markers of liver impairment. This study demonstrated a significant elevation of serum ALT and AST in the patients with acute hepatitis compared with the controls (Figure 3(b), Figure 3(c)). In group IV, i.e., hepatocellular carcinoma patients, ALT levels did not differ significantly from those of the controls. In hepatocellular carcinoma, hepatocytes killed by apoptosis wither away and likely synthesize less of the enzymes. This may explain why most patients with hepatocellular carcinoma have persistently normal liver enzymes despite inflammation on liver biopsy [35] [36]. The mean AST value in group IV, though significantly higher than in the control group, nevertheless overlapped considerably with it. The extrahepatic origin of this enzyme might explain this difference from ALT. Pyridoxine deficiency might be another reason: although both ALT and AST use pyridoxine as a coenzyme, pyridoxine deficiency inhibits ALT formation more strongly than AST formation [37]. In group III, i.e., chronic hepatitis patients, both ALT and AST values were significantly higher than in the control group but not as high as in the acute hepatitis patients. The reason for this is not well understood. Recent studies have observed that histological evidence of chronic hepatitis C virus (HCV) infection can be found in patients whose serum transaminases are normal or only slightly elevated [38].

ALP
Serum ALP values did not differ substantially between the controls and the three patient groups (Figure 3(d)). The results also suggest that ALP is not a good marker for identifying hepatocellular injury or intrahepatic problems. Strikingly high ALP levels indicate a risk of extrahepatic biliary obstruction, primary liver cirrhosis, or drug-induced cholestasis. Since this study did not recruit any such patients, the serum ALP values in all four groups were nearly comparable.

Serum Albumin
Serum total protein and A/G ratio are also indirect measures of the synthetic capacity of the liver, as most plasma proteins are synthesized in the liver. We found in this study that total protein, albumin, and A/G ratio were significantly decreased in groups III (CH) and IV (HC) but not in group II (AH) (Figure 3(e), Figure 3(f), Figure 3(g)). The likely explanations are as follows: 1) the half-life of albumin is approximately three weeks, which is relatively long, and 2) the liver can compensate for reduced synthetic capacity by doubling its normal rate of albumin synthesis. Serum albumin concentration therefore changes slowly in response to alterations in protein synthesis. This is possibly why serum albumin, total protein, and A/G ratio are within the normal range in acute hepatitis. Overall, these three biochemical tests are useful indicators of chronic liver disease, but chronic renal failure, urinary protein loss, or gastrointestinal protein loss may also affect their levels [39] [40].

Alpha-Fetoprotein
In this study, AFP was assessed to discriminate hepatocellular carcinoma from other liver diseases. This study clearly demonstrates that AFP is significantly elevated only in group IV (HC) compared with the controls (Figure 3(h)). Benign hepatic diseases, for instance acute or chronic active hepatitis, viral hepatitis, and liver cirrhosis, occasionally show elevated AFP levels [41]. However, this study did not find any such association. Another important finding of this study is that groups III and IV cannot be distinguished by any of the parameters used in this study except AFP.

Random Forest
The ML study supports the findings of the biochemical analysis. The RF model estimates feature importance when predicting the class, or group, of patients (Figure 5). It identified bilirubin as the most important feature, making it a marker for diagnosing all four groups. AFP was the second most important feature, which agrees with the biochemical findings and discussion, as AFP can differentiate the HC group from the other groups. ALT and AST were the next most important features in the ML study, in agreement with the biochemical results, as the values of these two parameters were markedly increased in AH and CH patients. The feature ranking (Figure 5) shows that the first four variables (bilirubin, AFP, ALT, AST) are required to diagnose liver patients; the other features are of negligible importance for classification, as is also discussed in the biochemical part and evident from the results.

CART
The ML classification using the CART model shows that HC patients are classified from the AFP data, which agrees with the experimental results (Section 3.2). AH and CH patients are classified from the AST and ALT data, meaning that these enzymes are important markers for the diagnosis of AH and CH. This finding also agrees with the experimental results. The tree identifies four features (bilirubin, ALT, AST, and AFP) as important for classification, and the biochemical analysis led to the same outcome.

Conclusion
In conclusion, the interpretation of abnormal liver tests requires close attention to the relevant data from case records as well as to the physical evaluation. It is usually helpful to divide liver tests into three groups: tests that evaluate synthetic function (albumin, total protein, and A/G ratio), tests that evaluate hepatocellular injury or inflammation (ALT and AST), and tests that evaluate cholestasis (ALP and gamma-glutamyl transferase). AFP testing is warranted only if hepatocellular malignancy is suspected. Considering the clinical conditions together with the specific pattern of liver test abnormalities will not only narrow the range of possible diagnoses but also offer a cost-effective way to evaluate patients and recognize individuals who require a liver biopsy. The models classified the patients into the different groups with 94% and 95% accuracy (RF and CART, respectively), and these results establish and validate the identification of the important parameters as features.

Informed Consent
Written informed consent was obtained from the patients who participated in this study.