Detection of Angioectasias and Haemorrhages Incorporated into a Multi-Class Classification Tool for the GI Tract Anomalies by Using Binary CNNs

The proposed deep learning algorithm will be integrated as a binary classifier under the umbrella of a multi-class classification tool, to facilitate the automated detection of non-healthy deformities, anatomical landmarks, pathological findings, other anomalies and normal cases by examining medical endoscopic images of the GI tract. Each binary classifier is trained to detect one specific non-healthy condition. The algorithm analyzed in the present work extends the detection capability of this tool by classifying GI tract image snapshots into two classes, depicting the haemorrhage and non-haemorrhage state. The proposed algorithm


Computer Aided Diagnosis Based on Expert Systems, ML, AI
Over the last decade, artificial intelligence (AI) has gone through rapid development. It appears that AI could provide opportunities for better health care, so it is a challenge to apply AI algorithms in order to create CAD tools for the automated detection of anomalies and non-healthy deformities in medical images (MRI, CT, endoscopy, etc.) and assistive tools for disease diagnosis. Automated medical decision-making on non-predefined criteria focuses on the advantage of AI systems, which can avoid the typical fallacies of human psychology (overconfidence, loss aversion, anchoring, confirmation bias, representativeness heuristics, etc.) and the widespread human inability to process statistical data. However, we must keep in mind that algorithmic decisions may also be mistaken or discriminatory, reproducing human biases and introducing new ones. Since 2018, the European Union General Data Protection Regulation (GDPR) has required that AI or other CAD systems be able to explain their decisions. The future challenge will consist in finding the best combination of human and automated intelligence, taking into account the capacities and the limitations of both [1].
Deep learning is a specific example of AI. The power of deep learning is that, through back-propagation, its layers can learn from patterns in the data by discovering implicit features that are not part of human language. CNN algorithms look like a great tool to assist endoscopists in identifying endoscopic capsule images containing anomalies of different forms. In the early 1950s, the first neural network machine was developed, based on ideas conceived in the 1940s. The first CNN, which is a major form of deep learning, was developed in the late 1970s.
The GARVAN-ES1 expert system is a historical example of an expert system that, since 1984, has been in routine use at the Garvan Institute of Medical Research at St Vincent's Hospital in Sydney, providing clinical interpretations for reports from a diagnostic laboratory which measures thyroid hormone levels. The program was written in C to produce reports that are 99% correct, a higher-quality product at a lower cost [2].
In recent years, different Machine Learning (ML) and AI techniques have been applied to multi-class endoscopic images, which could reliably be used as CAD assistive tools in clinics. Focusing on AI diagnosis of gastrointestinal (GI) diseases, over a period of three consecutive years the Medico GI challenges called on researchers to design classification algorithms for 16 classes of a real-world GI dataset. Algorithms for AI diagnosis of the GI tract were developed and evaluated for the challenges: the Medical Multimedia Task at MediaEval 2017, the Medico Multimedia Task at MediaEval 2018, and the BioMedia ACM MM Grand Challenge 2019 [3]. The goal of these challenges was to evaluate multi-class classification methods for about 16 classes, such as anatomical landmarks (e.g. z-line, pylorus, cecum), pathological findings (e.g. esophagitis, polyps, ulcerative colitis), polyp removal cases (e.g. dyed and lifted polyps, dyed resection margins), and normal and regular cases (e.g. normal colon mucosa, stool, instruments, etc.) inside the GI tract.
In the last years, researchers have been exploring ML algorithms for binary classification models, such as logistic regression and SVM, which do not support multi-class classification natively and require meta-strategies, as well as CNNs for direct multi-class classification, or CNNs splitting the classification into binary classifiers. The "One-versus-Rest" CNN strategy splits a multi-class classification into one binary classification problem per class. As mentioned in [4], in the "One-versus-All the Rest" approach (also called "One-vs-All"), we train C binary classifiers, fc(x), where the data from class c is treated as positive and the data from all the other classes is treated as negative.
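As an illustration, the One-vs-Rest combination rule can be sketched with a toy scorer. The nearest-centroid scoring function below is only a stand-in for the C trained binary classifiers fc(x), and names such as `ovr_predict` are our own:

```python
import numpy as np

def train_binary_scorers(X, y, n_classes):
    """One 'classifier' per class c: here simply the centroid of the
    positive samples of c (a stand-in for a trained binary CNN f_c)."""
    return [X[y == c].mean(axis=0) for c in range(n_classes)]

def ovr_predict(scorers, x):
    """One-vs-Rest: each binary scorer emits a confidence for 'its'
    class; the final label is the class with the highest score."""
    scores = [-np.linalg.norm(x - centroid) for centroid in scorers]
    return int(np.argmax(scores))

# Toy data: three well-separated 2-D classes.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
              [5.1, 5.0], [0.0, 5.0], [0.1, 5.0]])
y = np.array([0, 0, 1, 1, 2, 2])
scorers = train_binary_scorers(X, y, n_classes=3)
print(ovr_predict(scorers, np.array([0.05, 0.1])))  # 0 (nearest to class 0)
```

The key point is the final `argmax`: each binary model scores independently, and the multi-class decision is obtained only at combination time.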
Endoscopic image classification with ML is a field of high research interest because it addresses the missed classification of diseases and subsequently ensures effective early disease detection. ML algorithms can improve image reviewing by inexperienced examiners (endoscopists, radiographers, etc.) and can reduce the reviewing time in cases such as sequences of images in a video stream.

Diagnosis of Angioectasia and Haemorrhages into Gastrointestinal Tract
Gastrointestinal (GI) disease types of angioectasias and haemorrhage traces refer to any bleeding in the gastrointestinal tract: in the upper part (oesophagus, stomach and first part of the duodenum) and in the lower part (small intestine to large intestine/colon). Statistically, small bowel bleeding accounts for only 5% of all GI bleeding. Abnormal blood vessels, usually veins, are the main cause of bleeding, a common situation in 30% to 40% of the population over the age of 50.
Small bowel capsule endoscopy (SBCE) is a major clinical diagnostic practice, as the next step after a standard gastroscopy for the upper GI tract and a colonoscopy for the lower GI tract have failed, because it is able to examine the whole small bowel. Once the capsule is swallowed, while travelling inside the GI tract it captures at minimum two pictures per second (depending on the model) and transmits these images by radiofrequency to a receiver worn by the patient (except in the case of the CapsoCam® capsule, where the data are stored within the capsule). The capsule's travel usually lasts from 4 to 8 hours, its batteries offer a lifespan of 10 to 15 hours, and during its travel it takes a total of about 55,000 images. It is expected that, when it becomes technologically possible to control its mobility and to power more sensors, robotic arms and actuators integrated into the capsule's small volume, the robotic endoscopic capsule will also be used therapeutically in the GI tract [5,6].
Until 2021, the way doctors review the CE video had not changed much since the introduction of SBCE in 2000. Until now, the evaluation of image sequences aiming at the localization and classification of lesions in the small intestine has been based on the personal experience of doctors, and lacks automation based on specific scores to standardize the description of lesions and to objectively quantify them in capsule endoscopy reports, with potential prognostic and therapeutic impact. The companies in the endoscopic capsule market tend to support their products with proprietary software that automates some aspects of the reviewing procedure, mainly to reduce the long review time. This is done by adjusting the number of frames viewed simultaneously (dual view or quad view, as either sequential or overlapping images), adjusting the speed at which frames are presented, and eliminating similar consecutive snapshots. Moreover, the RAPID Reader v8.3 program includes the "suspected blood indicator" (SBI) function, which has been developed and evaluated as a computer-aided tool for automatic detection of bleeding traces in the GI tract [7]. In a study of 281 patients, 10.3% of whom presented with active haemorrhage while 28.9% presented lesions with high bleeding potential (angioectasias, ulcers and tumours), the SBI software tool achieved a 96.6% sensitivity for active small bowel bleeding, with a 97.7% negative predictive value. Regarding high-bleeding-potential lesions, the SBI displayed an overall sensitivity of 39.5%, highest for ulcerated neoplasias (100%) but significantly lower for angioectasias (38.5%) or ulcers (20.0%) [8].
It has been shown that, as blood can vary in colour, the SBI software tool is likely unable to detect all shades of red, and so it is not a fully reliable tool to exclude active bleeding and/or major lesions; the review of the CE video by an expert therefore remains important for the detection of lesions responsible for past bleeding [9,10].
It has been shown that, in the case of patients with few or minor lesions, diagnosis by doctors reviewing the CE videos is dependent on the endoscopist's experience [11]. At the International Endoscopic Capsule Conference in 2006, there was a consensus that the fastest projection rate proven to be acceptable was 15 fps [12]. It has been suggested that a novice examiner should read the video at 12-15 fps [13]. Conversely, an experienced examiner can choose a faster rate (20 fps) if Single View mode is used, while selecting Multi View mode can reduce reading time by 30%.
The effectiveness of the QuickView Rapid® 5 software has been evaluated: 1) to reduce the video stream reading time, 2) to evaluate the number of false negatives in cases of less experienced endoscopists, and 3) to determine the learning curve for reviewing CE images [14][15][16]. The Olympus algorithm called Omni mode has also been evaluated, and it may be appropriate only for detecting major lesions [16]. Moreover, it has been noticed that the "Quickview" and "Quadview" modes are still unacceptable because some significant lesions were not detected, with a diagnostic miss rate ranging from 6.5% to 12% [14,16,17].
Concerning the methods that technically maximize colour differences in images captured by a capsule, they are categorized during the processing phase as either decomposing the images using a simulated blue filter, or decomposing the images using a simulated "chromoendoscopy" algorithm. Fuji Intelligent Color Enhancement (FICE®) technology decomposes images by wavelengths and, as a result, maximizes image differences in patterns such as vascular and mucosal patterns.
It has been shown that RAPID® 5 Access improves diagnostic yield in the detection rate of denuded redness, reducing reading time, both in "auto mode" and when "displaying a single image at 12 fps". RAPID® 5 Access improves diagnostic yield and reduces reading time; however, it is still unacceptable because of its diagnostic miss rate, and may be useful as an ancillary reading tool [18].
Blue filter decomposition of the capsule's images has been compared with FICE decomposition and evaluated for all lesion categories, as follows: 1) blue filter decomposition provided image improvement (compared to white-light raw images) in 83% of cases, and 2) the blue filter offers better image enhancement than FICE in capsule endoscopy [19].
Blue filter decomposition has been used as the Blue Mode (BM) modality for PillCam®, achieving improved detection [20].
In an experiment comparing the diagnostic results of reviewing the same data by two groups of endoscopists, one group of experts versus one of inexperienced reviewers, the results in each group varied depending on the findings, even between experts. In both groups, accuracy was lower in cases with subtle mucosal lesions, such as erosion, angioectasia and diverticulum, and higher in cases with more prominent intraluminal changes (e.g. active small-bowel bleeding, ulcer, tumor, stenotic lumen) [21]. This means that the use of computer-aided diagnosis tools will help to eliminate the factor of the doctor's reviewing experience.
The algorithm previously proposed by our team, based on statistical analysis of pixel features using colorimetric image filters in the HSV colour space, extracted from manually pixel-level annotated regions of interest (ROI), can achieve a sensitivity of 99% [22].
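As a minimal illustration of the idea behind such HSV colorimetric filtering (not the published algorithm itself; the hue, saturation and value thresholds below are arbitrary examples), a red-dominant pixel mask can be computed as follows:

```python
import colorsys

def red_pixel_mask(rgb_image, sat_min=0.5, val_min=0.3):
    """Flag pixels whose hue lies in the red band of HSV space.
    rgb_image: nested lists of (r, g, b) tuples with channels in [0, 1].
    The thresholds are illustrative only, not the published filter."""
    mask = []
    for row in rgb_image:
        mask_row = []
        for (r, g, b) in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            # Red hues wrap around 0 in HSV, hence the two-sided test.
            is_red = (h < 0.05 or h > 0.95) and s >= sat_min and v >= val_min
            mask_row.append(is_red)
        mask.append(mask_row)
    return mask

# A 1x3 toy image: strong red, pale pink, green.
img = [[(0.9, 0.1, 0.1), (0.9, 0.7, 0.7), (0.1, 0.9, 0.1)]]
print(red_pixel_mask(img))  # [[True, False, False]]
```

The saturation test is what keeps pale pink (desaturated red) out of the mask, mirroring why simple colour filters struggle with the varying shades of blood discussed earlier.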
Regarding the performance of a CNN architecture pretrained with AlexNet, it has been shown that, using a dataset of 8200 non-healthy and 40,000 healthy frames (with no mention of how many different videos or patients they come from), a detection sensitivity of 92.2% can be achieved (no information is given on precision or FPR) [23].
Another proposed methodology first classifies the bleeding samples into active and inactive subgroups based on statistical features derived from the histogram probability of the colour space; then, for each subgroup, the blood regions are highlighted via fully convolutional networks (FCNs) [24,25].

AIMS
The aim of our team is to develop a multi-class CAD tool for diagnosing different GI tract diseases, consisting of a number of binary CNNs, one per disease. This is because the benefits of training binary CNNs instead of a multi-class CNN are quite important (less time-intensive, lower computational complexity, thus achieving better values for sensitivity and accuracy). The proposed algorithm is intended to aid inexperienced endoscopists in identifying the haemorrhage or non-haemorrhage state in frames of endoscopic videos. Angiectasias are the most common lesions diagnosed in patients with mid-GI bleeding. Each of the CNN binary classifiers belonging to this CAD tool is trained to detect a specific disease. The input data for each binary CNN are the frames of endoscopic videos showing healthy states and/or different known diseases (such as polyps, angiectasias, etc.). For example, the task of another binary classifier of the same system could be to identify polyps, and of yet another to distinguish adenomatous polyps (adenomas) from non-neoplastic ones (under publication), and so on. Because our proposed CNN models are designed to classify a given image into only two classes, the output layer has two neurons. The last fully connected layer, which is a two-dimensional feature vector, is given as input to a softmax classifier, which makes the final prediction of whether there is an "angiectasia" or "not". The frames classified as "not" can be healthy states or frames showing diseases other than "angiectasia".
Moreover, we compare the performance of our presently proposed CNN for identifying "angiectasia" or "not", using training data annotated at image level, against our recently published CAD algorithm based on HSV colorimetric lesion features extracted from manually pixel-level annotated regions of interest (ROI). We trained and tested both algorithms on the same data.

DATA AND METHODS
The proposed CAD algorithm is based on a CNN, trained on a dataset annotated at image level (not pixel level), the same dataset used by the authors in their published algorithm based on statistical analysis of pixel features, using colorimetric image filters in the HSV colour space [22]. The data used, which are not yet publicly available to the research community, consist of 195 capsule endoscopy procedures from patients referred for obscure gastrointestinal bleeding or suspected Crohn's disease. Abnormalities were found in 177 (90.7%) of the cases, including dark haemorrhage and angiodysplasia(s) in 31 cases (15.9%) and active bleeding in 8 cases (4.1%). The completely healthy cases were 18 (9.3%). All the cases were treated in five Greek national university hospitals. The videos are anonymized, and the confidentiality protocol applied complies with current European Union legal regulations.
All the videos used were taken with the PillCam™ SB endoscopic capsule (previously by Given Imaging, Yokneam, Israel), which captures frames of 250 × 250 pixels; the exported images are 576 × 576 pixels through the camera's own extrapolation algorithm (see samples in Figure 1). All the patients were diagnosed by the attending expert doctors of four university gastroenterology departments of medical schools: 1) the National and Kapodistrian University of Athens, 2) Attikon Hospital, 3) Laikon Hospital of Athens, and 4) the Aristotle University of Thessaloniki. All the video streams were reviewed manually by experts, reviewed automatically by the capsule's own diagnostic tool SBI™ (Suspected Blood Indicator) of the Rapid™ READER, and annotated by surgeon gastroenterologists with more than 35 years of experience. The team's expert doctor, in collaboration with the team's data analysis engineer, eliminated images taken from the same area by applying the capsule's own Quickview™ software. The metadata from the manually checked annotation process are: 1) lesion ID, 2) size of lesion in mm, 3) lesion diagnosis, and 4) number of lesions per image.
The training and testing datasets are balanced: each one consists of 50% video frames showing conditions of GI haemorrhaging and GI angioectasias, and 50% non-haemorrhaging frames; see Set 1 in Table 1. Frames identified as taken from the same lesion area or from the same healthy area are excluded. A number of less-than-ideal video frames are included (i.e. frames with foam and bubbles, processed food, out-of-focus imagery, strong light reflections, or out-of-bound lighting conditions).
For the training procedure, 130 videos are used. In detail: 92 videos of 92 patients showing diseases other than haemorrhage (non-haemorrhage cases), 26 videos of 26 patients showing haemorrhage cases, and 12 videos of 12 patients with completely healthy cases. Frames from the same patient are not used in both sets, so there is no overlap in content between the training set and the testing set; see Set 2 in Table 1. [Fragment of Table 1:] Non-haemorrhage (Ulcer(s)): 4 / 5; Non-haemorrhage (Other non-healthy): 3 / 3. *Videos come from the gastroenterology departments of three Greek public hospitals: Aretaieio Hospital, Attikon Hospital and Laiko Hospital. **Small: continuous surface of 800 to 1000 pixels. ***Medium: continuous surface of 1000 to 3000 pixels. Extensive bleeding areas, defined by us as more than 3000 pixels, are not considered because they are obvious.
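The patient-level separation between training and testing data can be sketched as follows (the function and record names are our own; the point is that every patient's frames land in exactly one set):

```python
def split_by_patient(frames, test_patients):
    """Assign every frame of a patient to exactly one set, so that no
    patient's frames appear in both training and testing data."""
    train, test = [], []
    for frame in frames:
        (test if frame["patient_id"] in test_patients else train).append(frame)
    return train, test

# Toy frame records: (patient_id, label)
frames = [{"patient_id": p, "label": l}
          for p, l in [(1, "haem"), (1, "haem"), (2, "non-haem"),
                       (3, "haem"), (3, "non-haem"), (4, "non-haem")]]
train, test = split_by_patient(frames, test_patients={3, 4})
# No patient appears in both sets:
assert not {f["patient_id"] for f in train} & {f["patient_id"] for f in test}
print(len(train), len(test))  # 3 3
```

Splitting on patient IDs rather than on frames is what prevents near-duplicate frames from the same lesion leaking across the train/test boundary.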
The training dataset includes 6200 video frames, 50% of which show haemorrhage conditions. In total, 3100 video frames show the non-haemorrhage state (healthy and other diseases) and are comparable in terms of location in the GI tract. All the frames were selected manually by expert doctors, so as to be inclusive of different shapes, dimensions, and active and past haemorrhage traces; see Table 1. Figure 1 illustrates a sample of our data.
Similarly, in order to control the evaluation procedure and ensure that precision does not depend on prevalence, the precision is normalized to a prevalence of 50%. Thus, the ratio between the number of cases in the haemorrhage control group and the number of cases in the non-haemorrhage group, used to establish the Negative Predictive Value (NPV) and the Positive Predictive Value (PPV, or Precision), is set equal to the prevalence of the diseases in the studied population. The testing group includes 1000 frames in total: 500 non-haemorrhage frames (healthy, or diseases other than haemorrhage) and 500 haemorrhage frames, from patients who do not belong to the training dataset. The training dataset is named Set 1 and the testing set is named Set 2; both are described in Table 1.

Figure 1. Sample of our CE captured frames. 1st row, from left to right: all cases of bleeding and angioectasia. 2nd row, from left to right: three normal cases, the last two angioectasia. 3rd row, from left to right: cases of polyps, the last is normal.

To create new artificial training data from the existing training data of Set 1, the augmentation technique was applied to a balanced subset of Set 1 (in the training dataset only, not in the validation or test dataset). Figure 2 illustrates a sample of augmented data.
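The prevalence normalization can be made concrete: PPV and NPV follow from sensitivity, specificity and an assumed prevalence via Bayes' rule, so fixing the prevalence at 50% makes precision comparable across test sets. The helper below is our own illustration:

```python
def ppv_npv(sensitivity, specificity, prevalence=0.5):
    """Positive/negative predictive value at a chosen prevalence,
    derived from Bayes' rule rather than raw test-set counts."""
    tp = sensitivity * prevalence               # true-positive rate mass
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)       # true-negative rate mass
    fn = (1.0 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)

# With the 50% prevalence used here, a sensitivity of 0.90 and
# specificity of 0.92 give a precision of about 0.918.
ppv, npv = ppv_npv(0.90, 0.92)
print(round(ppv, 3), round(npv, 3))  # 0.918 0.902
```

At 50% prevalence the PPV expression simplifies to sensitivity / (sensitivity + (1 - specificity)), which is why precision no longer depends on how common the disease is in the raw data.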

Preparation of Training Data
The pool of images that feeds the CNN model is determined by shuffling the video frames of Set 1 (see Table 1).
It is ensured that the aspect ratio of all the images used is 1:1 and that they have the same resolution, the latter ensured by cropping useless pixels around the edges. After this procedure, we end up with 6400 images of 224 × 224 pixels. The
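The square-cropping and resizing step can be sketched in pure NumPy (nearest-neighbour resampling for brevity; in practice a library resizer with interpolation would be used):

```python
import numpy as np

def center_crop_square(img):
    """Crop the largest centered square, making the aspect ratio 1:1."""
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return img[top:top + side, left:left + side]

def resize_nearest(img, size=224):
    """Nearest-neighbour resize of a square image to size x size."""
    side = img.shape[0]
    idx = np.arange(size) * side // size  # source row/column for each target pixel
    return img[idx][:, idx]

raw = np.zeros((576, 576, 3), dtype=np.uint8)  # e.g. an exported CE frame
out = resize_nearest(center_crop_square(raw))
print(out.shape)  # (224, 224, 3)
```

Cropping before resizing keeps the geometry of lesions undistorted, which matters more for diagnosis than preserving the border pixels that are discarded.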

Creation of Deep Learning Models
The CNN training ran entirely in the cloud. The Keras API Python library ran on top of the TensorFlow numerical platform in Python. Back-propagation is generated automatically by TensorFlow; as optimizer we used Adam with a learning rate of 0.0001, and as loss, binary cross-entropy. Our methodology for creating the CNN architecture was to add the blocks one at a time, in order to see the differences in the results as we go deeper. During the training phase, the dropout technique was applied (randomly and temporarily dropping out units, hidden and visible, along with their connections, in a neural network). We applied Dropout(0.5) to the classifier and Dropout(0.2) to the convolutional base, which means that we randomly dropped out 50% and 20% of the CNN's units, respectively. By implementing 22 CNN variations, we were able to choose the two best performers in terms of sensitivity and specificity. The techniques implemented in those two best performers were data augmentation, and Dropout in the classifier only. The architecture of the CNN using data augmentation (named Model 1_Ag) consists of: three Conv2D layers with filter sizes (32, 64, 128), three Max Pooling layers, one flatten layer, and two fully connected layers with 128 and 2 neurons respectively. The architecture of the CNN using Dropout only in the classifier (named Model 3_DPcl) consists of: four double Conv2D layers with filter sizes (32, 64, 128, 256), four Max Pooling layers, one flatten layer, three fully connected layers with 256, 256 and 2 neurons respectively, and two Dropout(0.5) layers.
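A minimal Keras sketch of the Model 1_Ag topology described above (the 3 × 3 kernel size and exact layer placement are our assumptions, as the text does not specify them; with a two-neuron softmax output, the cross-entropy loss is expressed over the two classes):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model_1_ag(input_shape=(224, 224, 3)):
    """Sketch of Model 1_Ag: three Conv2D blocks (32/64/128 filters),
    each followed by max pooling, then Dense(128) and a 2-neuron
    softmax head. Kernel sizes are assumed, not taken from the paper."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(2, activation="softmax"),  # haemorrhage / non-haemorrhage
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model_1_ag()
print(model.output_shape)  # (None, 2)
```

Note that a two-neuron softmax with one-hot labels and categorical cross-entropy is mathematically equivalent to the single-output sigmoid with binary cross-entropy mentioned in the text.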
Some of our CNN variations were trained using Transfer Learning (TL). The pre-trained models used were VGG16 and ResNet50, trained on the ImageNet dataset and fine-tuned on our training data Set 1. Our transfer learning was applied as follows: first we trained only the classifier, keeping the convolutional base of the transferred model frozen; finally, we fine-tuned the transferred model by unfreezing part of the convolutional base (blocks 4 and 5 for VGG16, and the corresponding part for ResNet50). ResNet50 performs the initial convolution and max pooling using 7 × 7 and 3 × 3 kernel sizes.
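The freeze-then-fine-tune scheme can be sketched with Keras (we pass `weights=None` so the sketch runs offline; in practice `weights="imagenet"` would load the pre-trained filters, and the classifier head below is illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

# Stage 1: frozen convolutional base, train only the new classifier.
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the whole convolutional base

model = keras.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),  # illustrative classifier head
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy")

# Stage 2: fine-tuning - unfreeze only blocks 4 and 5 of the base.
base.trainable = True
for layer in base.layers:
    layer.trainable = layer.name.startswith(("block4", "block5"))
model.compile(optimizer=keras.optimizers.Adam(1e-5),  # smaller LR when fine-tuning
              loss="categorical_crossentropy")

# 6 of VGG16's 13 conv layers (blocks 4-5) are now trainable.
print(sum(l.trainable for l in base.layers if "conv" in l.name))
```

Recompiling after changing `trainable` flags is required for the change to take effect, and the reduced learning rate in stage 2 protects the pre-trained filters from being destroyed early in fine-tuning.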
All the models were trained for 10 epochs, except in TL, where 5 epochs were used. The CNN variations where TL was used showed identical performance in testing.
The testing of the algorithms was carried out under the supervision of our team's experienced doctor, at the "Aretaieion" Greek University Hospital in Athens.

Evaluation Metrics
For the testing procedure, the indicators used to measure the performance of the proposed algorithms are based on the statistical metrics of accuracy, specificity, sensitivity, precision, False Positive Ratio (FPR) and False Negative Ratio (FNR). Equations (1) to (6) were used for all CNNs.
Accuracy is the probability of correctly recognizing a frame as True Positive (TP) or True Negative (TN) among the total number of cases examined. It is calculated as follows:

Accuracy = (TP + TN) / (TP + FP + TN + FN)

Sensitivity (also called the true positive rate, the recall, or probability of detection in some fields) measures the proportion of actual positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition):

Sensitivity = TP / (TP + FN)

Specificity (also called the true negative rate) measures the proportion of actual negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition):

Specificity = TN / (TN + FP)

Precision measures the proportion of all positive identifications (TP + FP) that were actually correct:

Precision = TP / (TP + FP)

FPR (False Positive Ratio), or Type I error, is the proportion of images that are actually healthy (negative) but are recognized as bleeding (positive). We need this to have very low values. It is calculated as follows:

FPR = FP / (FP + TN)

FNR (False Negative Ratio), or Type II error, is the proportion of images that are actually bleeding (positive) but are recognized as healthy (negative). We must note that in medical diagnosis it is very important to achieve a very low FNR, because people who bleed may not get proper healthcare as a result:

FNR = FN / (FN + TP)
For the above equations: TP is the case where a bleeding image was correctly predicted as bleeding; TN, where a non-bleeding image was correctly predicted as non-bleeding; FP, where a non-bleeding image was falsely predicted as bleeding; and FN, where a bleeding image was falsely predicted as non-bleeding.
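The six metrics can be computed directly from the confusion-matrix counts; the small helper below is our own illustration:

```python
def metrics(tp, tn, fp, fn):
    """All six evaluation metrics from confusion-matrix counts."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "fpr":         fp / (fp + tn),
        "fnr":         fn / (fn + tp),
    }

# Example: a balanced test set of 500 positive and 500 negative frames,
# with 450 and 460 of them classified correctly, respectively.
m = metrics(tp=450, tn=460, fp=40, fn=50)
print({k: round(v, 3) for k, v in m.items()})
```

On a balanced test set like this, sensitivity + FNR = 1 and specificity + FPR = 1, which is a quick sanity check on any reported confusion-matrix figures.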

RESULTS
The proposed binary classifier, based on a CNN architecture using data augmentation, performed the binary classification between the haemorrhage class (frames showing cases of GI haemorrhaging and GI angioectasia abnormalities) and the non-haemorrhage class (frames not showing haemorrhage traces) with FNR 10%, FPR 8%, sensitivity 90%, specificity 92%, accuracy 91% and precision 91.8%.
These performance metrics are compared to the respective values published in our previous work [22] (a CAD method based on HSV colorimetric filters, trained and tested on pixel-level annotations of the same dataset). We evaluated that our best CNN, trained on image-level annotated images, has 9% lower sensitivity and 2.6% lower precision, with differences of 1.2% in FPR and 7% in FNR.
We verified that the mean reviewing time per patient by the well-experienced doctor was 29 ± 9 min with manual reviewing and 6 ± 2 min with the proposed method.

CONCLUSIONS
We trained variations of CNN algorithms using image-level annotated datasets from endoscopic capsule videos, with the goal of binary classification: to identify the endoscopic capsule video frames showing haemorrhage traces due to GI haemorrhaging and/or angioectasias inside the GI tract. The best performer among our tested CNN variations will be used as a sub-task classifier under the umbrella of our under-development CAD tool for different diseases of the GI tract, among other binary classifiers for other GI diseases (e.g. polyps and ulcers). The results of our best CNN binary classifier are compared with our recently published CAD method aimed at the same goal, which is based on feature extraction requiring a significantly more labour-intensive approach (manually annotated pixels instead of image-level annotation) to create HSV colorimetric filters. Both methods are evaluated on the same data collection as regards the training procedure. Our previously proposed method using HSV colorimetric filters provided higher performance metrics than the currently proposed CNN CAD [22]. This can be explained by the fact that the dataset for the manually selected colorimetric feature extraction was based on pixel-level annotation and was big enough, whereas the CNN method is more demanding in terms of the absolute number of training samples. Both diagnosis methods proved less accurate than the well-experienced doctor. The proposed CAD diagnosis methods can be an important assistive tool for less experienced doctors. In any case, methods that eliminate similar images are necessary to reduce the review time.
Concerning the technical aspect of training a CNN model, our experience can testify to the following: the use of a validation dataset during the training phase is helpful for the training process to en-