Unified Analysis Specific to the Medical Field in the Interpretation of Medical Images through the Use of Deep Learning

Deep learning (DL) has seen exponential development in recent years, with major impact in many medical fields, especially in medical imaging. The purpose of this work converges on determining the importance of each component, describing the specificity and correlations of the elements involved in achieving precision in the interpretation of medical images using DL. The major contribution of this work is primarily the updated characterization of the constituent elements of the deep learning process: scientific data, methods of knowledge incorporation, DL models according to the objectives for which they were designed, and the presentation of medical applications in accordance with these tasks. Secondly, it describes the specific correlations between the quality, type and volume of data, the deep learning models used in the interpretation of diagnostic medical images, and their applications in medicine. Finally, it presents problems and directions for future research.


Introduction
The medical data most used in medical practice are medical images, and for this reason most deep learning algorithms have targeted this category of medical information for building medical applications.
This paper presents a systematic review of the literature [1] with the objective of analyzing the importance of the relationship between the types and characteristics of scientific data and the use of deep learning models in the interpretation of medical images. We defined a methodology for semi-automating the selection of relevant articles and eliminating those with low impact in the scientific community, by applying inclusion and exclusion quality criteria in the fields of medicine and information technology [2]. The major contribution of this work lies primarily in the updated characterization of the constituent elements of the deep learning process, from data to applications in medicine. Secondly, it describes the specific correlations between data, the deep learning models used in the interpretation of diagnostic medical images, and their applications in medicine. Finally, it presents problems and future research directions [3].
The uniqueness of this work is defined by the description of all the constituent elements, namely: data; identification, extraction and automatic standardization of specific medical terms; representation of medical knowledge; incorporation of medical knowledge through labeling; description of deep learning (DL) architectures in relation to the objectives for which they were created and in correlation with the other constituent elements of the DL process; and presentation of the applications for which they were built. Problems in the analysis of medical images can be classified as follows: identification, extraction and automatic standardization of specific medical terms; representation of medical knowledge; incorporation of medical knowledge. Problems in medical image analysis are related to the following aspects: medical images provided as data for deep learning models require quality, volume, specificity and labelling; data provided by doctors, descriptive data and labels are ambiguous for the same medical condition and use non-standard references; laborious data processing time remains a problem to be solved; and there is a lack of clinical trials demonstrating the benefits of DL medical applications in reducing morbidity and mortality and improving patient quality of life [4].
In this paper, we aim to achieve an updated characterization of the constituent elements of the deep learning process: scientific data, methods of knowledge incorporation, DL models according to the objectives for which they were designed, and the presentation of medical applications according to these tasks. Secondly, we describe the specific correlations between the quality, type and volume of data and their importance for the performance of the deep learning models used in the interpretation of medical diagnostic images [3]. We also give a structural and functional description of DL models and their applications in medicine.
A large number of medical images are stored in open-access databases or in the private databases of certain institutions. These medical images are filed together with imaging reports or medical video reports and, along with natural language processing, contribute greatly to image analysis [5]. Annotation and labelling of medical images, representing data from doctors integrated into deep learning models, consumes time and requires specialized knowledge [3].
A large volume of properly labeled training data determines the performance of deep learning models in the interpretation of medical images [3]. Because manual image labelling requires time and specialized training, standardized, organized labelling has been used, which carries the risk of over-labeling with unnecessary information [2].
In the absence of a large amount of data, the problem of overfitting can be mitigated by adding dropout. The deep learning model can achieve increased performance in these conditions by optimizing a large number of hyper-parameters (size and number of filters, depth, learning rate, activation function, number of hidden layers, etc.) [1] [6].
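The dropout idea mentioned above can be reduced to a minimal sketch (a generic NumPy illustration, not code from any of the reviewed models): during training, a fraction of activations is randomly zeroed and the survivors are rescaled so that the expected activation is unchanged at inference time.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a `rate` fraction of units during training
    and rescale survivors by 1/(1-rate) so inference needs no change."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((4, 8))
y = dropout(x, rate=0.5, rng=rng)
# Surviving units are scaled to 2.0, dropped units become 0.0
```

Because of the rescaling, the same forward pass can be used at test time simply by setting `training=False`.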
In medical image analysis, data types have high variability, exemplified by image captures from different regions [7], different types of data included in a single phase [8], different types of images [9], data from doctors that contain errors and require processing time [10], and small sample sizes [11].
Computer-aided diagnosis (CAD) in medical imaging and diagnostic radiology using deep learning architectures has progressed to satisfactory results, with multiple applications, namely early detection and diagnosis of breast cancer, lung cancer, glaucoma and skin cancer [3] [16] [17] [18].
The types of images used in medical image analysis are: CT, MRI, X-ray, ultrasound, PET, wave images, biopsy, mammography and spectrography [1]. In the image analysis process, the tasks of feature extraction, dimensionality reduction, augmentation, segmentation, clustering or classification are decisive for the efficiency and precision of the integration methods [5] [14] [19] [20].
Larger datasets, compared to the small size of many medical datasets, result in better deep learning models [3] [21].
There are many large-scale and well-annotated datasets, such as ImageNet (tagged in 20k categories) and COCO (with over 200,000 images annotated in 80 categories). The knowledge of experienced clinical-imaging physicians (radiologists, ophthalmologists, dermatologists, etc.) concerns certain characteristics of images, namely contrast, color, appearance, topology, shape, edges, etc., which help and are used by deep learning models to perform the main tasks of medical image analysis [3].
The type and volume of medical data, the labels, the category of field knowledge and the methods of their integration into the DL architectures implicitly determine their performance in medical applications.

State of the Art
The current performance of deep learning (DL) models and architectures depends on the nature and quality of the data used in their training. This section presents the data types, together with the description and classification of DL models according to the medical data types used, their objectives, and their performance in medical applications.

Scientific Data and Datasets
We further describe the types of images and medical data used for diagnosis: natural images, medical images, high-level medical data (diagnostic patterns), low-level medical data (image areas, disease characteristics), and manual features used for medical image analysis. Figure 1. Types of medical images and datasets. Acronyms: MRI (Magnetic Resonance Imaging), CT (Computed Tomography), SLO (Scanning Laser Ophthalmoscopy images), ADNI (the Alzheimer's Disease Neuroimaging Initiative), ACDC (Automated Cardiac Diagnosis Challenge), ABIDE (the Autism Brain Imaging Data Exchange), ChestX-ray14 (hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases), LIDC-IDRI (the Lung Image Database Consortium and Image Database Resource Initiative), LUNA16 (algorithms for automatic detection of pulmonary nodules in computed tomography images), MURA (large dataset for abnormality detection in musculoskeletal radiographs) [3], BraTS2018 (machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction) [3], STARE (locating blood vessels in retinal images), DDSM (Digital Database for Screening Mammography), DeepLesion (automated mining of large-scale lesion annotations and universal lesion detection with deep learning), Cardiac MRI (cardiac magnetic resonance images), ISIC (International Skin Imaging Collaboration). Large natural image datasets (ImageNet) are incorporated for object detection in the medical field and are used in applications for the detection of lymph nodes [22], detection of polyps and pulmonary embolism [23], detection of breast tumors [24], and detection of colorectal polyps [25] [26].
Natural images include ImageNet, the PASCAL VOC "static data" set, and the Sports-1M video dataset, the largest video classification benchmark, with 1.1 million sports videos in 487 categories [3] [27].
Medical images come from external medical datasets of the same diseases in similar modalities (e.g. SFM and DM) [28], from external medical datasets of the same diseases [3] in different modalities (DBT and MM, ultrasound) [29], or from different diseases [30]. Medical images are used in multiple applications. Among multi-modal medical images, PET images are incorporated for the detection of lesions in CT scans of the liver [31]. Multimodal medical images are also used in another model for the detection of liver tumors [32]. Multimodal medical images (mammographic data) are used to detect breast masses [33]. Medical images (CT, MRI, angio-CT, fundus eye images) and annotated retinal images are used to help segment heart vessels without annotations [3] [34]. External medical data and images of other diseases are also used, such as the union dataset (3DSeg-8), built by aggregating eight 3D medical segmentation datasets [3] [35].
Medical data from doctors comprise high-level medical data (diagnostic patterns) and low-level medical data (image areas, disease characteristics). High-level and low-level medical data, i.e. anatomical aspects of the image, shape, position and typology of lesions, are integrated into segmentation tasks; an example is the ISBI 2017 dataset used in skin lesion segmentation. The use of additional medical datasets in different modalities has also proven useful, although most applications are limited to using MRI to help segmentation tasks in CT images [3] [36]. Specific data identified by doctors (attention maps, hand-highlighted features) increase the diagnostic performance of deep learning networks (although no comparative studies have been conducted). Among medical data from doctors, handcrafted features, such as invariant LBP and H & components, are computed first from the images [3]. In applications using the BRATS2015 dataset, these features achieve performance in image segmentation through input-level fusion. However, anatomical priors are only suitable for the segmentation of fixed-shape organs [3] such as the heart or lungs [35].
Manual features used for medical image analysis are series of measurements (X-ray projections in CT or spatial frequency information in MRI). Methods based on deep learning have been widely applied in this area [37] [38]. Examples: image reconstruction with diffuse optical tomography (DOT), reconstruction of magnetic resonance imaging by compressed sensing (CS-MRI) [39], image reconstruction with limited-angle diffuse optical tomography (DOT) of breast cancer with limited sources in a strongly scattering environment [40], and recovery of brain MRI images with target contrast using GANs. Content-based image retrieval (CBIR) can be of great help to clinicians navigating these large datasets. Some deep learning methods [3] adopt transfer learning to use knowledge from natural images or external medical datasets [41] [42] [43], for example metadata such as patient age and sex, features extracted from healthy areas, decision values of binary traits, and texture traits in the process of thoracic X-ray retrieval [3].
Medical data are also used to generate medical reports: captioning of medical images, templates from radiologist reports, visual characteristics of medical images, and report generation using the IU-RR dataset.

Addressing Label Noise in the Training of Deep Learning Models in Medical Image Analysis
Label noise in the training of deep learning models affects their performance in medical image analysis. Label noise has been addressed by: cleaning and pre-processing labels, improving the network architecture with a noise layer, endowing networks with robust loss functions, data re-weighting, data and label consistency, and training procedures.
Cleaning and pre-processing labels. In chest X-ray scans, in the classification of thoracic diseases, label smoothing was used to handle noisy labels and led to improvements of up to 0.08 in the area under the receiver operating characteristic curve (AUC) [44].
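The label-smoothing idea can be sketched as follows (a generic NumPy illustration, not the exact formulation of [44]): each one-hot target is mixed with a uniform distribution over the classes, which limits how confidently the network is pushed toward possibly noisy labels.

```python
import numpy as np

def smooth_labels(one_hot, epsilon=0.1):
    """Mix one-hot targets with a uniform distribution over classes."""
    n_classes = one_hot.shape[-1]
    return one_hot * (1.0 - epsilon) + epsilon / n_classes

targets = np.eye(3)[[0, 2]]               # two samples, 3 classes
smoothed = smooth_labels(targets, epsilon=0.1)
# e.g. [1, 0, 0] becomes approximately [0.933, 0.033, 0.033]
```

The smoothed targets still sum to 1 per sample, so they can be used directly with a standard cross-entropy loss.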

Network Architectures
In the case of network architectures, the noise layer proposed by [45] improved the accuracy in detecting breast lesions in mammograms.

Loss functions
Networks have been enhanced with loss functions in which the annotations are dilated with a small and a large structuring element to generate noisy masks for the foreground and background; the parts of the image in the ring between the two were marked as unsafe regions that were ignored during training [46].
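The ignore-region construction described above can be sketched with plain NumPy (a simplified illustration of the idea in [46], using a hypothetical single-pixel annotation and a naive dilation): the annotation is dilated with a small and a large element, and pixels in the ring between the two dilations are excluded from the loss.

```python
import numpy as np

def dilate(mask, radius):
    """Naive binary dilation with a (2*radius+1)^2 square structuring element.
    np.roll wraps at the borders, which is fine for this centered toy example."""
    out = np.zeros_like(mask)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            out |= np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
    return out

annotation = np.zeros((9, 9), dtype=bool)
annotation[4, 4] = True                 # a tiny "lesion" annotation
fg = dilate(annotation, 1)              # small element -> trusted foreground
bg = ~dilate(annotation, 3)             # large element -> trusted background
ignore = ~(fg | bg)                     # ring of unsafe pixels, ignored in the loss
```

A segmentation loss would then be computed only over `fg` and `bg` pixels, skipping the uncertain ring.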

Re-weighting data
The method of re-weighting data to cope with noisy annotations in cancer detection was implemented by training models on a large set of noisy-label patches, using features computed from a small set of clean-label patches, and increased model performance by 10% [47]. This strategy was also used to classify skin lesions in noisy-label images [48], to segment the heart, clavicles and lungs in chest X-rays [10], and to segment skin lesions from highly inaccurate annotations, for which [49] proposed pixel-specific weights.
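A generic sketch of loss re-weighting (illustrative only; [47] learns the weights from clean-label features, whereas here the per-sample weights are simply given): samples suspected of carrying noisy labels contribute less to the training loss.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, weights):
    """Per-sample weighted cross-entropy for binary labels in {0, 1}."""
    eps = 1e-12
    losses = -(labels * np.log(probs + eps)
               + (1 - labels) * np.log(1 - probs + eps))
    return np.sum(weights * losses) / np.sum(weights)

probs = np.array([0.9, 0.2, 0.6])
labels = np.array([1, 1, 0])          # the second label is suspected to be noisy
weights = np.array([1.0, 0.1, 1.0])   # down-weight the suspect sample
loss = weighted_cross_entropy(probs, labels, weights)
```

Down-weighting the suspect sample yields a smaller loss than uniform weighting, so the model is penalized less for disagreeing with the noisy label.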

Consistency of data and labels
For segmentation of the left atrium in MRI from labeled and unlabeled data, it was proposed to train two separate models: a teacher model, which produced noisy labels and uncertainty maps on unlabeled images, and a student model, which was trained on the generated noisy labels while taking label uncertainty into account, making predictions on the clean dataset that agree with the teacher model for labels with uncertainty below a threshold.

Training procedures
For segmentation of the bladder, prostate and rectum in MRI, a model was trained on a clean-label dataset and used to predict segmentation masks for a separate set of unlabeled data; a second model was trained to estimate a confidence map indicating regions where the predicted labels were more likely to be accurate, and these reliable regions were used to further train the main model, with a 3% improvement in the Dice similarity coefficient (DSC) [50]. A rather similar method has been used to classify aortic valve defects in MRI [51].
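The uncertainty-filtered pseudo-labelling common to these teacher-student schemes can be reduced to a small sketch (hypothetical NumPy code, not the actual models of [50] [51]): the teacher's predictions on unlabeled data become training targets for the student only where the teacher's uncertainty falls below a threshold.

```python
import numpy as np

def select_pseudo_labels(teacher_probs, uncertainty, threshold):
    """Keep the teacher's hard labels only where uncertainty < threshold.
    Returns the selected pseudo-labels and a mask of trusted positions."""
    trusted = uncertainty < threshold
    pseudo = (teacher_probs > 0.5).astype(int)
    return pseudo[trusted], trusted

teacher_probs = np.array([0.95, 0.55, 0.10, 0.48])
uncertainty = np.array([0.05, 0.40, 0.08, 0.45])   # e.g. predictive entropy
pseudo, trusted = select_pseudo_labels(teacher_probs, uncertainty, threshold=0.2)
# Only the confident predictions (positions 0 and 2) train the student
```

In practice the uncertainty map could come from Monte-Carlo dropout or from an auxiliary confidence network, as in the works cited above.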

DL Model Description and Classification According to Medical Data Types Used, Objectives and Performances in Medical Applications
Figure 2 synthesizes the classification of DL models according to the characteristics and tasks for which they were designed. DL architectures can be divided into three categories [59].
A CNN contains convolutional layers, pooling layers, dropout layers and an output layer, hierarchically positioned, each of which learns specific characteristics in the image [14].
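The hierarchical convolution-plus-pooling structure can be illustrated with a minimal NumPy forward pass (an educational sketch with a hypothetical edge filter, far simpler than any real CNN):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.random.default_rng(0).random((8, 8))
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])   # crude vertical-edge filter
# One CNN stage: convolution -> ReLU -> pooling
features = max_pool(np.maximum(conv2d(image, edge_kernel), 0.0))
```

Stacking several such stages, each with learned kernels, is what lets deeper layers respond to increasingly abstract image characteristics.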
CNNs have low performance in image analysis when high-resolution datasets are considered [60] and when localization over large patches is required, especially in medical images [61]. Image analysis performance is enhanced by the following architectures, which we describe below: AlexNet, VGGNet, ResNet, YOLO and U-Net. AlexNet was proposed by [58] [59] for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 [4].
AlexNet consists of 8 layers: 5 convolutional layers and 3 dense, fully connected layers, with overlapping pooling, dropout, data augmentation, ReLU activations after each convolutional and fully connected layer, and SGD with momentum [1] [62]. AlexNet is used for image recognition in image analysis and is usually applied to problems involving semantic segmentation and high-resolution data classification tasks [63] [64]. LSTMs have been used in speech recognition [84], path prediction [85] and medical diagnosis [86], where the authors proposed an LSTM network, called DeepCare, combining different types of data to identify clinical diseases.
GRUs (gated recurrent units), created by [87] [88], solve the problem of the increasing time complexity of LSTMs when large amounts of data are used [4].
The GRU consists of a reset gate, which decides how much information from the past is carried forward, and an update gate, which decides how much past information can be forgotten. GRUs and LSTMs have similar applications, especially in speech recognition [89]. Bidirectional recurrent neural networks (BRNNs) [4], introduced by [90] [91], are characterized by the fact that the hidden state is updated using past information, as in a classic RNN, and also using information related to future moments [4]. They have been applied in handwriting and speech recognition, where they are used to infer missing parts of a sentence from knowledge of the other words [92] [93].
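The reset and update gates of the GRU described above can be written out as a single step (a from-scratch NumPy sketch with randomly initialized weights, shown only to make the gating explicit):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde."""
    z = sigmoid(Wz @ x + Uz @ h_prev)             # how much to update the state
    r = sigmoid(Wr @ x + Ur @ h_prev)             # how much past state to reset
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))
    return (1.0 - z) * h_prev + z * h_tilde       # blend old state and candidate

rng = np.random.default_rng(0)
dim_x, dim_h = 4, 3
weights = [rng.standard_normal(s) for s in
           [(dim_h, dim_x), (dim_h, dim_h)] * 3]  # Wz, Uz, Wr, Ur, Wh, Uh
h = gru_step(rng.standard_normal(dim_x), np.zeros(dim_h), *weights)
```

Because the new state is a convex combination of the previous state and a tanh candidate, each hidden unit stays bounded between -1 and 1.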
BM (Boltzmann machine) models, introduced by [94] [95], are a family of RNNs that are easy to implement and can reproduce many probability distributions; BMs are used in image classification [4]. BMs combined with other models are used to locate objects [96] [97]. In image classification, BMs are used to identify the presence of a tumor [98]. BM models are slow and ineffective when the data size increases exponentially, due to the complete connections between neurons [99]. Autoencoders (AE) are applied in areas including natural language processing [104] and video analysis [105].
An additional variant of the AE found in the literature is the variational AE (VAE). In a VAE, the encoder is represented by the probability density function (PDF) of the input in the feature space and, after the encoding stage, a sampling of new data using the PDF is added. Unlike the DAE and the SAE, a VAE is not a regularized AE, but belongs to the class of generative models [4].
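The sampling step that distinguishes the VAE is usually implemented with the reparameterization trick (a generic NumPy illustration with a hypothetical 2-D latent space): the encoder outputs a mean and a log-variance, and a latent code is drawn from the corresponding Gaussian.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, exp(log_var)) as z = mu + sigma * eps, which keeps
    the sample differentiable with respect to mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])           # encoder output: latent mean
log_var = np.array([-2.0, -2.0])     # encoder output: latent log-variance
z = reparameterize(mu, log_var, rng) # latent code fed to the decoder
```

Writing the sample as a deterministic function of (mu, log_var) plus external noise is what lets gradients flow through the sampling step during training.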
The GAN is used to generate synthetic training data from original data using a latent distribution [1] [106]. It consists of two networks: a generator, which produces fake data from its input, and a discriminator, which differentiates the fake data from real data, in order to increase the quality of the generated data. GANs have two problems: mode collapse, and the fact that training can become very unstable.
The DBN (Deep Belief Network), created by Hinton [107], consists of two kinds of networks stacked on each other: a belief network, represented by an acyclic graph composed of layers of stochastic binary units with weighted connections, and restricted Boltzmann machines, which are stochastic [1]. DBNs are applied in image recognition and speech recognition; in classification, to detect lesions in medical diagnosis; in video recognition, to identify the presence of persons [108]; in speech recognition, to infer missing words in a sentence [109]; and in applications on physiological signals to recognize human emotion [110].
The DTN contains a feature extraction layer, which learns a shared feature subspace in which the marginal distributions of source and target samples are drawn close, and a discrimination layer [1], which matches conditional distributions by classified transduction [111]. It is used for large-scale problems [1].
The TDSN contains two parallel hidden representations that are combined using a bilinear mapping [1] [112]. This arrangement provides better generalization compared to a single-module architecture. The biases of the generalizers with respect to the learning set are inferred. It works effectively, and better than a cross-validation strategy, when used with multiple generalizers compared to individual generalizers [1].
Deep InfoMax (DIM) maximizes the mutual information between the input and output of a highly flexible convolutional encoder [113], by training another neural network that maximizes a lower bound on a divergence between the joint distribution and the product of marginals of the encoder's input and output. The estimates obtained by this second network can be used to maximize the mutual information of the features in the encoder input. The memory requirement of DIM is lower because it requires only an encoder, not a decoder [1].

Combinations of Different DL Models Depending on the Type of Data Involved in the Problem to Be Solved

DL models can be combined in five different ways depending on the type of data involved in the problem to be solved [1] [4]. Of these, three are hybrid architectures (HA), namely the integrated model, the built-in model and the ensemble model.
In the integrated model, the output of the convolutional layer is transmitted directly as input to other architectures, such as the residual attention network, the recurrent convolutional neural network (RCNN) and the improved recurrent residual convolutional neural network (IRRCNN) [114].
In the built-in model (the improved common hybrid CNN-BiLSTM), the dimensionality reduction model and the classification model work together: the results of one are the inputs of the other. In the ensemble model (EJH-CNN-BiLSTM), several basic models are combined.

Combinations of Different DL Models That Benefit from the Characteristics of Each Model, with Medical Applications: CNN + RNN, AE + CNN and GAN + CNN
CNN + RNN combinations exploit the feature-extraction capabilities of the CNN model and the sequence-modeling capabilities of RNNs [15]. Because the output of a CNN is a 3D value and an RNN works with 2D data, a reshaping layer is placed between the CNN and the RNN to convert the output of the CNN into an array [4]. CNN + RNN has been successfully applied in text analysis to identify missing words [115] and in image analysis to increase the speed of magnetic resonance image storage [116] [117]. CNN + RNN variants are obtained by replacing the standard RNN component [4] with an LSTM component [24] [118].
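The reshaping layer between the CNN and the RNN amounts to flattening the spatial grid of CNN feature maps into a sequence of feature vectors (a shape-only NumPy sketch with hypothetical dimensions):

```python
import numpy as np

# Hypothetical CNN output: 64 feature maps of size 7 x 7 (channels, height, width)
cnn_output = np.random.default_rng(0).random((64, 7, 7))

# Reshape for the RNN: treat each of the 7*7 spatial positions as one
# timestep, with the 64 channel values as that timestep's feature vector.
channels, height, width = cnn_output.shape
rnn_input = cnn_output.reshape(channels, height * width).T  # (49 timesteps, 64 features)
```

Each row of `rnn_input` is then consumed as one step of the recurrent component, so the RNN scans the image positions in order.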
The AE + CNN architecture combines an AE, as a pre-training model when using data with high noise levels, and a CNN, as a feature-extractor model [4]. AE + CNN has applications in image analysis, to classify noisy medical images [119], and in the reconstruction of medical images [120] [121].
GAN + CNN combines a GAN, as a pre-training model to moderate the problem of overfitting, and a CNN, used as a feature extractor [4]. It has applications in image analysis [11] [122].
The DL architectures applied especially in image analysis are the CNN, AE and GAN. CNNs preserve the spatial structure of the data and are used as feature extractors (especially U-Net), AEs reduce the characteristics of complex images in the analysis process, and GANs are pre-training architectures that select input categories to control overfitting.

Applications in Medicine and the Performance of DL Models Depending on the Therapeutic Areas in Which They Were Used
We further highlight the advances in the study of deep learning and its applications in medical image analysis between 2017 and 2020 [4]. One can easily identify references to image labeling and annotation, the development of new deep learning models with increased performance, and new approaches to medical image processing: • diagnosis of cancer using CNNs with different numbers of layers [123], • study of deep learning optimization methods and their application in medical image analysis [124], • development of techniques used for endoscopic navigation [125], • highlighting the importance of data labelling and annotation and of knowledge of model performance [126] [127], • perfecting the layer-wise architecture of convolutional networks [1] to lessen the cost and computation time of processor training [128], • description of the use of AI and its applications in the analysis [1] of medical images [129], • diagnosis of degenerative disorders using deep learning techniques [130], • detection of cancer by processing medical images using the mean-shift filter technique [131], • classification of cancer using histopathological images, highlighting the rapidity of Theano, superior to TensorFlow [131], • development of two-channel computational algorithms using DL (segmentation, feature extraction, feature selection and classification, and extraction of high-level features, respectively) [132], • malaria detection using a deep neural network (MM-ResNet) [8].
We exemplify in Table 1 [2] applications in medicine and the performance of DL models depending on the types of medical images and the therapeutic areas in which they were used. We included the most relevant papers about the most used medical investigations, namely medical images.

Conclusions
Doctors interpret images descriptively (contour, contrast, appearance, localization, etc.), using data from different sources and successive stages in the analysis of medical images. These handcrafted features are time-consuming and not standardized. Data quality and volume, annotations and labels, and the identification and automatic extraction of specific medical terms can help deep learning models perform image analysis tasks [3]. Incorporating these features and labels into DL architectures increases their performance.
High-level domain knowledge is incorporated as input images [3], and low-level domain knowledge is learned using specific network structures [35]; together with direct networking, low-level domain knowledge can also be used to design training curricula when combined with an easy-to-hard training model [3] [133].
DL can support the solving of complex problems in the interpretation of medical images, providing the doctor with support in making medical decisions and more time for patient care.

Research Problems
Problems in medical image analysis can be categorized as follows: • identification and automatic extraction and standardization of specific medical terms, • representation of medical knowledge, • incorporation of medical knowledge.
Problems in medical image analysis are related to: • medical images provided as data for deep learning models require quality, volume, specificity and labelling;
• data provided by doctors, descriptive data and labels are ambiguous for the same medical condition and use non-standard references;
• laborious data processing time is a problem to be solved in the future;
• lack of clinical trials demonstrating the benefits of using DL medical applications in reducing morbidity and mortality and improving patient quality of life [4].

Future Challenges
These consist of: domain adaptation, which transfers data from one domain to another by using labels; knowledge graphs, characterized by the incorporation of multimodal medical data; generative models capable of extracting features in an unsupervised way and easily incorporated into the architecture of DL networks; and techniques to search for a particular network architecture according to defined objectives.
Domain adaptation consists of transferring information from a source domain to a target domain [3], for example through adversarial learning [134], and restricts the domain shift between the source and target [3] domains in input space [135], feature space [136] [137] and output space [138] [139]. It can be used to transfer knowledge from one set of medical data to another [3] [140], even when they have different imaging modalities [3] or belong to different diseases [141] [142]. UDA (unsupervised domain adaptation), which uses medical labels only in the source domain, has demonstrated performance in disease diagnosis and organ segmentation [3] [140] [143] [144] [145].
The knowledge graph has the specificity of incorporating multimodal medical data and achieves performance in medical image analysis [3] and in the creation of medical reports [146]. Medical knowledge graphs, describing the relationships between different types of knowledge, between different diseases, and between medical datasets and a type of medical data, help deep learning models work [147].
Among generative models, GANs and AEs are mainly used for segmentation tasks. GANs use MRI datasets to segment CT images [142] [143]. The GAN is a type of unsupervised deep learning network used in medical image analysis [3] [167]. AEs are used for extracting features and shape priors of objects such as organs or lesions, completely unsupervised, and are easily incorporated into the network training process [35] [148].
The Network Architecture Search (NAS) technique can automatically identify a specific network architecture for computer vision tasks [149] and promises similar utility and performance in the medical field [150].