Comparative Study of Segmentation Methods of Rat Sinusoidal Network to Evaluate Nonalcoholic Steatohepatitis for Small Number of Medical Samples
1. Introduction
The liver is one of the most important organs of the body, performing multiple vital functions including metabolism, producing bile, converting nutrients into forms the body can use, and storing nutrients [1]. The liver sinusoid, a group of small blood vessels whose fenestrated endothelium permits blood plasma to come into contact with hepatocytes, maintains these unique functions [1] [2]. However, in conditions such as abnormal hepatocyte function, fibrosis, and cirrhosis, the sinusoid changes and does not function properly. Capturing the morphological changes in the hepatic sinusoidal network is therefore essential [1].
Hematoxylin and eosin (HE) staining of liver specimens is among the most commonly used methods for the differential diagnosis of liver diseases. Hepatic sinusoids have been extracted from HE images of hepatic tissue using prepared image-analysis filters to facilitate structural analysis [3]. Fluorescent images have also been used for pathology section analysis [4]. Fluorescence imaging is a technique to determine the existence of a target component around a bright image spot [4].
For pathology, it is necessary to identify the sinusoidal morphology accurately. Because liver sinusoids are hollow tubes lined with sinusoidal endothelial cells, properly extracting the sinusoidal morphology requires a segmentation that fills in the cavities [1]. Numerous segmentation approaches have been proposed over the years, including thresholding, binarization, and methods based on edge extraction [5] [6] [7]. Ascertaining the segmentation method most appropriate for image-based determination is crucial.
Many studies have proposed computational methods for image analysis in liver disease detection [5] [7]. The most widely used image recognition is based on supervised learning using heuristically designed models, including convolutional neural networks (CNNs) [8]. In those methods, features are automatically extracted by multi-layered neural networks and then used to perform classification. However, these methods are still limited by being black box models; the chosen features and the reasons for the classifications are unclear [5] [6]. Moreover, for medical images, preparing a large number of samples is often not possible, particularly for rare diseases [5]. This makes it necessary to extract and characterize features explicitly as well.
This study aimed to find the appropriate segmentation methods and their combination with feature values for properly determining disease. The hepatic sinusoidal network pattern morphology was evaluated in the context of nonalcoholic steatohepatitis (NASH), which is related to metabolic syndrome. Rats were fed a high-fat/high-cholesterol (HFC) diet, which causes pathological features similar to those of human patients with NASH [9]. These rats were used to evaluate the vascular pattern using fluorescent images. Not all pixels in the target area are bright spots, so the area delineated by the bright spots must be filled in and stored to classify the morphology accurately [10]. Hence, we explored four typical methods of segmentation: morphological transformations (MT) [10] [11], convex hull (CH) [10] [12], contour extraction (CE) [10] [13], and U-net [10] [14]. These methods fill in the unpainted areas.
For the feature values for image classification, we used the fractal dimension (FD) [10] [15] as a quantitative value, although in deep machine learning the feature values for disease identification are determined automatically. This is because FDs have been applied to quantify many natural phenomena, such as plant morphological patterns [15] and coastline geometry [16]. On this basis, we segmented the obtained fluorescent images using the aforementioned four methods and calculated the FDs. The four segmentation methods were compared in terms of discriminative performance. A CNN, a deep machine-learning network [8] [17], was used for comparison with the obtained results.
2. Material & Methods
2.1. Animals
Six-week-old male Wistar rats were purchased from Shimizu Laboratory Supplies Co., Kyoto, Japan. The experiment was performed following the National Research Council’s guidelines for the care and use of mammals and with the approval of the Committee for Animal Research at Kwansei Gakuin University.
The rats were housed in two groups in standard breeding cages (27 × 22 × 12 cm) with freely available food and water under a 12-h light/12-h dark cycle (lights on at 08:00). The rats were randomly divided into two groups of three rats each. The control group (Cont) and the HFC group (HFC) were fed the control and HFC diets, respectively, for 12 weeks, because fibrosis begins to occur and the morphology changes at 12 weeks [9]. A stroke-prone control chow diet (20.8% crude protein, 4.8% crude lipid, 3.2% crude fiber, 5.0% ash, 8.0% moisture, and 58.2% carbohydrate) was used as the control diet, and the HFC diet was a mixture of 68% control diet, 25% palm oil, 5% cholesterol, and 2% cholic acid. Both diets were obtained from Funabashi Farm (Chiba, Japan).
After 18 - 20 h of fasting, the rats were sacrificed under pentobarbital (70 mg/kg)-induced anesthesia, and the livers were removed. A part of each liver was fixed in 4% buffered paraformaldehyde for histological analysis.
2.2. Immunofluorescent Staining and Observations by Confocal Microscopy
An immunofluorescence technique was applied to 30-μm thick frozen sections of the liver using a monoclonal antibody specific for hepatic sinusoidal endothelial cells (Anti-Rat Hepatic Sinusoidal Endothelial Cells; Immuno-Biological Laboratories Co. Ltd., Japan) [18]. An Alexa 488-conjugated rabbit secondary antibody purchased from Molecular Probes (Eugene, OR, USA) was then applied. The procedures for immunohistochemical staining followed the method described in [19]. Confocal images were obtained using an Olympus FV 1000 confocal microscope running FluoView version 2.0c software (Olympus, Tokyo, Japan). For image analysis, the images were resized to 200 × 200 pixels using the Python Pillow library [20].
2.3. Segmentation
Figure 1 shows the schematic of the experimental procedure. Four different segmentation methods were used to delineate the cross-sections of hepatic sinusoids in the images and to determine the most effective method for differentiating between HFC and Cont. Each procedure is explained in this section.
2.3.1. Morphological Transformation (MT)
MT demarcates the contour or area of a subject in a binary image by applying kernel-based operations such as erosion and dilation [10] [11]. In this experiment, we applied one erosion followed by one dilation as the MT processing. This combination, known as an opening transformation, was used to reduce the heavy noise in the image, as shown in Figure 2(b) and Figure 2(g).
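As an illustration of the opening transformation described above, the following minimal sketch represents a binary image as a set of (row, column) foreground pixels. The 3 × 3 square structuring element and the function names are assumptions made for this example and are not taken from the study, and image borders are not clipped.

```python
from itertools import product

# 3x3 square structuring element (an assumed choice for illustration).
OFFSETS = list(product((-1, 0, 1), repeat=2))

def erode(pixels):
    """Keep a pixel only if its whole 3x3 neighbourhood is foreground."""
    return {(r, c) for (r, c) in pixels
            if all((r + dr, c + dc) in pixels for dr, dc in OFFSETS)}

def dilate(pixels):
    """Add every 3x3 neighbour of each foreground pixel."""
    return {(r + dr, c + dc) for (r, c) in pixels for dr, dc in OFFSETS}

def opening(pixels):
    """One erosion followed by one dilation."""
    return dilate(erode(pixels))

# A solid 5x5 block plus one isolated noise pixel.
block = {(r, c) for r in range(5) for c in range(5)}
noisy = block | {(10, 10)}
cleaned = opening(noisy)
```

In this toy case the isolated pixel is removed by the erosion, and the dilation restores the block to its original extent.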
2.3.2. Convex Hull (CH)
CH is the smallest convex polygon encompassing everything in a point set [10] [12]. This method is capable of examining and correcting defects in the curve. The convexity of the cross-sectional portion of a blood vessel in the image can be examined and segmented from surrounding points, as shown in Figure 2(c) and Figure 2(h).
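The convex polygon enclosing a point set can be computed in several ways. The sketch below uses Andrew's monotone chain algorithm, which is an assumption made for illustration; the study does not state which algorithm or library routine was used. The point coordinates are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]

# Four corners of a square plus two interior points.
points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(points)
```

Only the four corner points remain as hull vertices; filling the polygon they define would give the segmented region.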
Figure 1. Schematic of the experimental procedures.
Figure 2. Examples of confocal images of Cont and HFC and their hepatic sinusoid segmentations by the four methods. (a) - (e) and (f) - (j) show representative images from Cont and HFC, respectively. (a) and (f) are sample images of Cont and HFC. Scale bar: 30 μm. (b) and (g) are (a) and (f) segmented by the MT method. (c) and (h) are (a) and (f) segmented by the CH method. (d) and (i) are (a) and (f) segmented by the CE method. (e) and (j) are (a) and (f) segmented by U-net.
2.3.3. Contour Extraction (CE)
CE performs segmentation by connecting all points on the object boundaries in the image [10] [13]. An object boundary is a connected set of pixels of the same color in an image. When the target image is binarized by thresholding, the object’s contours can be extracted and demarcated, as shown in Figure 2(d) and Figure 2(i).
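A simplified stand-in for contour extraction on a binarized image is to keep the foreground pixels that touch the background. The sketch below does this with 4-connectivity; it is not a border-following algorithm of the kind used in image-processing libraries, and the names are hypothetical.

```python
# 4-connected neighbourhood (an assumed choice for illustration).
NEIGHBOURS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))

def boundary_pixels(pixels):
    """Foreground pixels with at least one 4-connected background neighbour."""
    return {(r, c) for (r, c) in pixels
            if any((r + dr, c + dc) not in pixels for dr, dc in NEIGHBOURS_4)}

# A solid 4x4 square: the outer ring is the contour, the inner 2x2 is not.
square = {(r, c) for r in range(4) for c in range(4)}
contour = boundary_pixels(square)
```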
2.3.4. U-Net
The U-net is a convolutional neural network for segmentation, proposed initially for medical image segmentation by Ronneberger et al. [21]. U-net consists of a contracting path and an expansive path [14] [21]. The contracting path follows the typical architecture of a convolutional network: the repeated application of two 3 × 3 convolutions, each followed by a rectified linear unit (ReLU), and a 2 × 2 max-pooling operation with stride 2 for downsampling. We set the filter count of the first layer to 64. At each downsampling step, we doubled the number of feature channels. Every step in the expansive path consists of an upsampling of the feature map followed by a 2 × 2 convolution that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3 × 3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution. At the final layer, a 1 × 1 convolution is used to map each 64-component feature vector to the desired number of classes. In total, the network has 23 convolutional layers. Segmentation utilizing U-net was performed using the library obtained from GitHub [22]. The number of epochs was set to 100, which was enough to reduce the training error sufficiently as judged by the Dice coefficient explained in the next section.
We trained U-net using hand-drawn segmentations of the cross-sections of hepatic sinusoids on binarized images as training data. Each of the 50 images was segmented using 50 training data samples, as shown in Figure 2(e) and Figure 2(j).
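The count of 23 convolutional layers follows from the architecture described above. The short calculation below reproduces it, assuming four downsampling steps as in the original U-net design; the number of steps is an assumption, since it is not stated in the text, and the function name is hypothetical.

```python
def unet_conv_layers(first_filters=64, depth=4):
    """Return (channel schedule, total conv layer count) for the described U-net.

    depth is the number of downsampling steps; depth=4 is assumed here.
    """
    # Feature channels double at each downsampling step.
    channels = [first_filters * 2 ** i for i in range(depth + 1)]
    contracting = 2 * (depth + 1)  # two 3x3 convolutions per resolution level
    expansive = 3 * depth          # one 2x2 up-convolution + two 3x3 per step
    final = 1                      # 1x1 convolution mapping to class scores
    return channels, contracting + expansive + final

channels, n_layers = unet_conv_layers()
```

With these assumptions the channel schedule is 64, 128, 256, 512, 1024 and the total is 10 + 12 + 1 = 23 layers.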
2.4. Dice Coefficients
The Dice coefficient is a statistical quantity measuring the similarity between two data sets. This index is arguably the most broadly used measure in image segmentation validation [10] [23]. We applied the Dice coefficient to compare the similarity between the four types of segmentation images and the hand-drawn reference segmentation images.
The images produced by each of the four segmentation methods were compared with the hand-drawn segmentation images used as training data for the U-net. The Dice coefficient is calculated using Equation (1).
$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \tag{1}$$
where A is the set of segmented hepatic sinusoid areas and B is the set of hand-drawn segmented hepatic sinusoid areas. |X| denotes the number of elements in set X, and “∩” represents the intersection of two sets.
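The Dice coefficient of two pixel sets, that is, twice the size of their intersection divided by the sum of their sizes, can be sketched as follows. The pixel coordinates are hypothetical, and returning 1.0 for two empty masks is a convention assumed for this example.

```python
def dice_coefficient(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two pixel sets."""
    if not a and not b:
        return 1.0  # assumed convention when both masks are empty
    return 2 * len(a & b) / (len(a) + len(b))

# Toy masks given as sets of (row, column) foreground pixels.
auto = {(0, 0), (0, 1), (1, 0), (1, 1)}
manual = {(0, 1), (1, 1), (1, 2), (2, 1)}
score = dice_coefficient(auto, manual)
```

Here the two masks share two of their four pixels each, so the score is 2 × 2 / (4 + 4) = 0.5.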
2.5. Fractal Analysis
FDs help measure roughness and self-similarity in objects [10] [15] [16]. Fractal analysis, introduced to the world of research in 1982 by Mandelbrot, has been widely used in image processing [24]. The FD of processed images is calculated using the basic FD equation
$$D = \lim_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{\log (1/\varepsilon)} \tag{2}$$
where N(ε) is the least number of distinct copies of the image at scale ε. The union of the N(ε) distinct copies must cover the image completely. We applied the box-counting method, a frequently used technique, to estimate the FD of the image. In this method, the image is divided into square boxes, and the boxes that contain part of the feature are counted. The box size is varied, and the number of occupied boxes is counted each time. The counts are then plotted against the box size on a double-logarithmic graph. The absolute value of the slope is nearly equal to the FD, D [15] [16]. The box-counting method was performed using ImageJ in Windows [25].
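The box-counting procedure can be sketched as follows. This is not the ImageJ implementation used in the study; the box sizes, the least-squares fit, and the function names are assumptions made for illustration, with the binary image again given as a set of (row, column) foreground pixels.

```python
import math

def box_count(pixels, size):
    """Number of size x size boxes containing at least one foreground pixel."""
    return len({(r // size, c // size) for (r, c) in pixels})

def fractal_dimension(pixels, sizes=(1, 2, 4, 8, 16)):
    """Estimate D as minus the least-squares slope of log N versus log size."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(box_count(pixels, s)) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

# Sanity checks on shapes with known dimension.
filled = {(r, c) for r in range(64) for c in range(64)}  # plane-filling
line = {(r, 0) for r in range(64)}                       # straight line
fd_filled = fractal_dimension(filled)
fd_line = fractal_dimension(line)
```

A filled square gives a dimension close to 2 and a straight line close to 1, which is a convenient check before applying the estimate to segmented images.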
2.6. Statistical Analysis
The obtained FDs were compared between the Cont and HFC groups using a two-tailed Mann-Whitney U-test (MW test). The statistical analyses were performed using a Python library [26].
Boxplots were used where information about the distributions is important (see Figures 3-5). Data were plotted as mean ± 95% confidence intervals using R software. A boxplot summarizes data using the smallest observation, lower quartile (base of rectangle), median (line in the rectangle), upper quartile (summit of rectangle), and largest observation. Data points considered outliers are marked by isolated points (circles).
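For illustration, a two-tailed MW test can be sketched with the standard library alone. The version below uses the normal approximation without tie or continuity corrections, so its p-values are approximate and may differ from those of the library used in the study; the FD values are hypothetical and are not data from the experiment.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Ties count as half a win; no tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    # U for the first sample: number of pairs in which x exceeds y.
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u = min(u1, n1 * n2 - u1)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of standard normal
    return u, p

# Hypothetical FD values for two groups.
fd_hfc = [1.1, 1.2, 1.3, 1.4, 1.5]
fd_cont = [1.6, 1.7, 1.8, 1.9, 2.0]
u_stat, p_value = mann_whitney_u(fd_hfc, fd_cont)
```

For small samples an exact test is generally preferable to this approximation.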
2.7. Classification Using CNN
To compare the obtained results, a classification technique, CNN, mainly used for image recognition applications, was applied. CNNs are a kind of artificial neural network that uses convolution operations in at least one of its layers [8] [10] [17]. The CNN architecture consists of multiple stages or blocks composed of four main components: a filter bank called kernels, a convolution layer, a non-linearity activation function, and a pooling layer [8] [17]. Each stage aims to represent features as sets of arrays called feature maps. We applied a typical CNN architecture comprising a stack of three 3 × 3 convolutional stages, each followed by a 2 × 2 max-pooling stage, and two fully connected layers, giving the final output as a classification module.
In this study, 50, 100, and 200 labeled image samples of each group (HFC and Cont) were prepared, and the network was trained. Tests were conducted on ten images of each of the four types of segmented images. To augment the training data, images rotated by π/2, π, and 2π/3 were prepared using [20].
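To give a sense of how the three convolution and pooling stages shrink a 200 × 200 input, the feature-map side length can be traced as below. The calculation assumes unpadded 3 × 3 convolutions with stride 1 and non-overlapping 2 × 2 pooling; the padding and stride are not stated in the text, so the resulting sizes are illustrative only.

```python
def feature_map_sizes(size=200, stages=3, kernel=3, pool=2):
    """Side length of the feature map after each convolution + pooling stage."""
    sizes = [size]
    for _ in range(stages):
        size = size - kernel + 1  # unpadded convolution, stride 1 (assumed)
        size = size // pool       # non-overlapping max pooling
        sizes.append(size)
    return sizes

sizes = feature_map_sizes()
```

Under these assumptions the side length goes from 200 to 99, 48, and finally 23 before the fully connected layers.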
3. Results
3.1. Dice Coefficients
Figure 2 shows the fluorescent images of Cont and HFC and their segmentation results using each segmentation method, including MT, CH, CE and U-net. The Dice coefficients of 50 segmentation images produced by each of the four methods were calculated. Figure 3 shows the Dice coefficients obtained by the four methods. U-net showed the highest similarity, with an average value of 0.819. Conversely, MT showed the lowest value, with an average of 0.2142.
Figure 3. Boxplots of the Dice coefficients of segmentation images by four methods.
Figure 4. Boxplots of FDs of 50 samples each of HFC and Cont. The number of stars indicates the level of statistical significance (★: p < 0.05, ★★: p < 0.01). MW test: HFC segmented by MT (HFC_MT) vs. Cont segmented by MT (Cont_MT), p = 1.00 × 10⁻⁷; HFC_CH vs. Cont_CH, p = 1.30 × 10⁻⁶; HFC_CE vs. Cont_CE, p = 9.28 × 10⁻¹; HFC_U-net vs. Cont_U-net, p = 2.61 × 10⁻².
Figure 5. Boxplots of FDs of HFC and Cont segmented by MT (a), CH (b), and U-net (c) depending on the number of samples. The number of stars indicates the level of statistical significance (★: p < 0.05, ★★: p < 0.01). MW test: 10 samples of HFC segmented by MT (10HFC_MT) vs. 10 samples of Cont segmented by MT (10Cont_MT), p = 3.08 × 10⁻³; 25HFC_MT vs. 25Cont_MT, p = 3.95 × 10⁻⁴; 50HFC_MT vs. 50Cont_MT, p = 1.00 × 10⁻⁷; 10HFC_CH vs. 10Cont_CH, p = 8.69 × 10⁻⁴; 25HFC_CH vs. 25Cont_CH, p = 2.61 × 10⁻⁴; 50HFC_CH vs. 50Cont_CH, p = 1.30 × 10⁻⁶; 10HFC_U-net vs. 10Cont_U-net, p = 7.33 × 10⁻¹; 25HFC_U-net vs. 25Cont_U-net, p = 1.09 × 10⁻¹; and 50HFC_U-net vs. 50Cont_U-net, p = 2.61 × 10⁻².
3.2. Fractal Dimensions
Figure 4 shows the boxplots of FDs of the images segmented using the four methods. The FD differs depending on the segmentation technique. Significant differences between Cont and HFC were identified with the MT and CH methods.
The FDs of HFC were lower than those of Cont. The number of medical image samples is generally limited. Therefore, we investigated the effect of the number of image data samples on the MT and CH methods. Figure 5 shows the boxplots of the obtained FDs for 10, 25, and 50 samples. The P-values decreased as the sample size increased in both methods. A significant difference was observed even with ten samples. P-values for the CH method were smaller than those for the MT method at 10 and 25 samples, whereas the MT method gave the smaller P-value at 50 samples.
3.3. Comparison of the Classifications Using Convolutional Neural Network
In the previous section, we demonstrated that the MT and CH methods could differentiate between Cont and HFC, even with 10 segmented samples. To evaluate the methods, we used CNN, a widely used and powerful classification method. The most accurate classification was obtained when the CNN was trained on and classified images segmented with MT. Figure 6 shows that the proportion of correct responses increased with the number of samples; when 200 samples of supervised image data were used, the probability of correct classification was 79%.
4. Discussions
In this study, we examined the accuracy of the segmentation methods in differentiating between Cont and HFC. Although the U-net, a machine learning method, had the best results regarding morphological reproducibility, the statistical results suggested that FDs of the Cont and HFC images segmented using the MT and CH methods could be differentiated.
Figure 6. Change of the classification accuracy by CNN depending on the numbers of MT segmentation samples.
Another advantage of these methods was that a significant difference could be observed even with a small number of samples. Although CNNs are known to have outstanding performance in classification, they need many samples to classify effectively, as shown in Figure 6. Therefore, when the number of samples is small, it is better to use biogenic features to classify patterns.
In the present study, we have chosen parameters in networks of U-net and CNN such that the effective detection or classification was performed as in [8] [14] [22]. We have noticed that several conditions often play a crucial role in the classification of segmented images [8] [14]. Therefore, we have partly verified the classification by changing the CNN parameters. For example, we tried CNNs with different activation functions. However, the qualitative results obtained were the same.
One of the most important conclusions of the present study is that it is better to choose a method that extracts more features rather than similarities in the case of a small number of samples. If we assume that U-net performs accurate segmentation and detailed classification is performed by CNN, pattern classification can be performed based on the features accurately found in many images. However, in this study, we selected FDs as the feature quantity, which is an excellent indicator to quantify the natural products [15] [16] [24]. Although less accurate, patterns segmented by MT and CH may have been an excellent indicator to extract more of the inherent fractal nature.
CNN-based pathology testing devices are being developed. However, with little supervised data, individual differences will inevitably appear [27]. The approach presented here may be useful when only a small amount of data is available.
In the future, by increasing the number of samples analyzed, we would like to investigate the crossover point at which a skillful combination of machine learning becomes more advantageous for classification than applying prepared features, as we have done in this study. We believe this would provide a more detailed view of the limits of machine learning’s superiority.
Acknowledgements
We thank Prof. K. Osaki and Prof. T. Morimoto for their helpful comments.