Analysis of the Relationship between Image and Blood Examinations in an Artificial Intelligence System for the Molecular Diagnosis of Breast Cancer

Molecular subtype classification based on tumor genotype has recently been used for differential diagnosis of breast cancer. The shift from conventional tissue classification to molecular genetics-based classification is primarily because objective genetic information can ensure a biologically clear classification system and patient groups may be created for a given set of diagnoses and suitable treatments. Given the stressful nature of biopsy, radiomic studies are conducted to determine breast cancer subtypes using non-invasive imaging tests. Minimally invasive blood tests using microRNAs (miRNAs) contained in exosomes have been developed. We investigated the usefulness of radiomic features and miRNAs in distinguishing triple-negative breast cancer (TNBC) from other cancer types. Fat-suppressed T2-weighted magnetic resonance images and miRNAs of 60 cases (9 TNBC and 51 others) were retrieved from the Cancer Genome Atlas Breast Invasive Carcinoma database. Six radiomic features and six miRNAs were selected by the least absolute shrinkage and selection operator. Linear discriminant analysis was employed to distinguish between TNBC and others. With miRNAs, TNBC and others were completely separated, whereas with radiomic features, TNBC overlapped with other types of breast cancer. Receiver operating characteristic curve analysis results showed that the area under the curve of radiomic features and miRNAs was 0.85 and 1.0, respectively. miRNAs showed a higher discrimination performance than radiomic features. Although gene analysis is expensive and


Introduction
Medical treatment for cancer is performed in the following order: detection of the lesion, differential diagnosis, and treatment. Research on computer-aided diagnosis (CAD) has led to the development of techniques that detect lesions in medical images and distinguish between benign and malignant lesions [1] [2] [3]. In contrast, radiomics analyzes the relationship between imaging phenotype and genotype of lesions. Radiomics differs from the CAD research in that it supports the medical process after the detection of lesions. Therefore, CAD can be classified as an artificial intelligence (AI) system that supports the first half of medical care, and radiomics is an AI system that supports the second half of medical care.
With the progress in post-genome research, the molecular and genetic backgrounds of various cancers have been clarified. This knowledge not only facilitated molecular classification but also aided in the development of molecular-targeted drugs. Molecular diagnosis of cancer using genetic information enables a clear biological classification, and the resulting molecular classification is directly linked to the selection of appropriate molecular-targeted drugs. However, for the molecular diagnosis of cancer, tumor cells need to be collected via biopsy, which imposes a significant burden on the patient. Additional constraints include the limited availability of facilities for performing gene analysis and the high cost of gene analysis. Therefore, the ability to easily determine the tumor genotype from non-invasive imaging using radiomics would be advantageous.
A minimally invasive examination using liquid biopsy, such as cell-free DNA, circulating tumor cells, and exosomes, has been performed. In particular, it has been reported that microRNA (miRNA) contained in exosomes derived from cancer cells can be used to detect the presence of cancer with high accuracy [4] [5] [6] [7]. Further, information on tumor genotype can also be obtained. Given the growing demand for molecular diagnosis techniques, studies to clarify the relationship between imaging and genetic examinations are considered important.
Radiomic studies on breast cancer have estimated breast cancer subtypes from various imaging tests [8]-[19] and have predicted the prognosis or recurrence [20] [21] [22]. Of the different subtypes, triple-negative breast cancer (TNBC) accounts for approximately 20% of all breast cancers. TNBC has a very high recurrence rate within 3 years and a shorter survival time after recurrence than other breast cancer types. Furthermore, because only anticancer drugs are expected to have therapeutic effects, it is important to distinguish between TNBC and other types of breast cancer [8] [9]. The main contribution of this study is to evaluate the usefulness of radiomic features and miRNAs in distinguishing TNBC from other breast cancer types, in order to construct an AI system that considers the division of roles between genetic testing and imaging tests. If radiomic features and miRNAs have an inclusive relationship, an AI system supporting the second half of medical care can be realized using either imaging or blood tests.

Imaging and Clinical Data
In this study, we used the Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) database in the Cancer Imaging Archive [23]. TCGA-BRCA contains data from 139 patients with breast cancer. However, magnetic resonance (MR) images and genetic information were not available for all cases.
Therefore, we selected 60 cases for which fat-suppressed contrast-enhanced T1-weighted images and miRNAs were available. The public database also includes information on whether the hormone receptor was positive or negative, human epidermal growth factor receptor 2 was positive or negative, and Ki67 was high or low. Based on this information, the 60 cases were classified into two groups: TNBC (9 cases) and others (51 cases) (Table 1). The study protocol was approved by the Ethics Review Committee.

Gene Data
Each miRNA comprises a sequence of 10-100 bases. Because miRNAs contained in exosomes are used in liquid biopsy, miRNA was adopted as the genetic information for this study. From the TCGA-BRCA database, we obtained miRNAs that had been taken from tumor cells. The read counts of the obtained miRNAs were corrected to reads per million. This correction, which is applied when comparing samples, divides each read count by the total number of reads and multiplies the result by a constant. There were 1325 miRNAs, but most of the data contained zero elements. Therefore, the 255 miRNAs whose elements were non-zero in all 60 selected cases were used for this study.
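The correction and filtering described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy count matrix, and the choice of one million as the constant (implied by "reads per million") are assumptions, as is computing the per-sample total over all miRNAs before filtering.

```python
import numpy as np

def reads_per_million(counts):
    """Scale raw read counts so that each sample (row) sums to one million."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # total reads per sample
    return counts / totals * 1e6

def keep_all_nonzero(values):
    """Return column indices of miRNAs that are non-zero in every sample."""
    return np.flatnonzero((np.asarray(values) > 0).all(axis=0))

# Toy data (made up): 3 samples (rows) x 4 miRNAs (columns).
raw = np.array([[10, 0, 30, 60],
                [ 5, 5, 20, 70],
                [ 0, 2, 48, 50]])
rpm = reads_per_million(raw)
kept = keep_all_nonzero(rpm)   # columns 2 and 3 survive the filter
filtered = rpm[:, kept]
```

In the toy matrix, the first two miRNAs each contain a zero in at least one sample and are therefore dropped, mirroring the reduction from 1325 to 255 miRNAs.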

Radiomic Features
The slice with the largest tumor diameter was selected from the MR images. The MR image was converted to 512 × 512 pixels using linear interpolation. As a rule for tumor region marking, when there were multiple tumors in an MR image, the one with the largest tumor area was selected. When there were spicules or irregular edges, they were included in the marked tumor region so that the radiomic features related to shape could be quantified accurately. An example of tumor region marking is shown in Figure 1.
To normalize the pixel values, we performed a linear density transformation on all MR images. Because the MR images had noise with extremely high pixel values, the maximum pixel value after a plain linear density transformation was affected by noise. To solve this problem, we calculated the pixel value at the upper 0.05% of the density histogram, set all pixel values above it to 1023, and then performed the linear density transformation so that the minimum and maximum pixel values were 0 and 1023, respectively. Here, we assumed that noise occupied 0.05% of the entire image; this value was determined empirically.
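A minimal sketch of this normalization, assuming the upper 0.05% cut-off is taken as a quantile of all pixel values in the image. The function name, parameter names, and toy image are hypothetical; clipping to the cut-off and then rescaling is equivalent to assigning 1023 to every pixel above it.

```python
import numpy as np

def normalize_intensity(image, upper_fraction=0.0005, out_max=1023):
    """Clip the brightest pixels, then rescale linearly to 0..out_max."""
    image = np.asarray(image, dtype=float)
    # Pixel value below which 99.95% of the pixels fall.
    cap = np.quantile(image, 1.0 - upper_fraction)
    clipped = np.minimum(image, cap)      # pixels above the cap become the cap
    low = clipped.min()
    span = cap - low
    if span == 0:                         # flat image: nothing to rescale
        return np.zeros_like(clipped)
    return (clipped - low) / span * out_max

# Toy image (made up): values 0..99 plus one extreme noise pixel.
img = np.tile(np.arange(100.0), 100).reshape(100, 100)
img[0, 0] = 1e5
out = normalize_intensity(img)
```

Without the clipping step, the single noise pixel would map to 1023 and compress every other pixel into the lowest gray level.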
We calculated 298 radiomic features from the tumor region of the MR image after linear density transformation. The free software MaZda [24] [25] [26] was used to calculate the radiomic features. These features comprise 1 size feature, 9 histogram features, 272 texture features, and 16 resolution features. The default values of MaZda were adopted as parameters for calculating these radiomic features. For example, the parameters for calculating the density co-occurrence matrix of texture features were 16 density gradations; distances of 1-5 between pixels; and directions of 0°, 45°, 90°, and 135°.
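To illustrate what the co-occurrence parameters mean, the following sketch builds one co-occurrence matrix for a given number of gray levels, pixel distance, and direction, and derives a contrast value from it. It is not MaZda's implementation: the function names are hypothetical, the matrix is computed over a whole rectangular image rather than a marked tumor region, and the symmetric normalization is an assumption.

```python
import numpy as np

# Row/column steps for the four directions named in the text.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def cooccurrence_matrix(image, levels=16, distance=1, direction=(0, 1), in_max=1023):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    img = np.asarray(image, dtype=float)
    # Quantize 0..in_max into `levels` gray levels.
    q = np.minimum((img / (in_max + 1) * levels).astype(int), levels - 1)
    dr, dc = direction[0] * distance, direction[1] * distance
    rows, cols = q.shape
    glcm = np.zeros((levels, levels))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                glcm[q[r, c], q[r2, c2]] += 1
    glcm = glcm + glcm.T                  # count each pair in both orders
    total = glcm.sum()
    return glcm / total if total else glcm

def contrast(glcm):
    """Contrast: sum of (i - j)^2 weighted by the co-occurrence probability."""
    i, j = np.indices(glcm.shape)
    return float(((i - j) ** 2 * glcm).sum())

# Toy image (made up) with two gray levels for readability.
image = np.array([[0, 0, 1023, 1023],
                  [0, 0, 1023, 1023]])
glcm_0 = cooccurrence_matrix(image, levels=2, direction=DIRECTIONS[0])
glcm_90 = cooccurrence_matrix(image, levels=2, direction=DIRECTIONS[90])
```

Repeating the call over distances 1 to 5 and the four directions gives one matrix per parameter combination, from which texture statistics such as contrast can be derived.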

Selection of Radiomic Features and miRNAs
The numbers of radiomic features (298) and miRNAs (255) were greater than the number of cases (60). Hence, useful radiomic features and miRNAs must be selected to distinguish between TNBC and other cancer types. In this study, radiomic features and miRNAs were selected using the least absolute shrinkage and selection operator (LASSO) [27], which is obtained by the following equation:
$$\hat{\beta} = \underset{\beta_0,\,\beta}{\arg\min} \left\{ \frac{1}{2} \sum_{i} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\} \tag{1}$$
By switching the input data to radiomic features or miRNAs, radiomic features and miRNAs were selected separately. Here, $y_i$ indicates whether the $i$th patient is TNBC or others, and $x_{ij}$ is the $j$th radiomic feature or miRNA of that patient. $\beta_j$ are coefficients and $\beta_0$ is a constant term. $\lambda \geq 0$ is a complexity parameter that controls the degree of reduction. $p$ represents the total number of radiomic features or miRNAs. The parameters $\beta_j$ can be obtained by solving the quadratic programming problem in Equation (1). Three-fold cross validation was performed to determine the value of $\lambda$ that minimizes the average deviation. The values of $\lambda$ obtained in the process of this calculation were then examined in order, and the value for which the number of radiomic features or miRNAs with a non-zero coefficient $\beta_j$ was 6 was adopted. Depending on the input data, exactly six features could not always be selected; in such cases, five or seven features were selected.
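The idea of scanning λ until a target number of coefficients remains non-zero can be sketched with a small coordinate-descent LASSO. This is an illustration under stated assumptions, not the authors' procedure: it uses a squared-error loss with an L1 penalty, omits the three-fold cross validation, and runs on synthetic data with a continuous target; `lasso_fit`, `pick_lambda`, and the grid of λ values are hypothetical.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (zero if it does not exceed it)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_fit(X, y, lam, n_sweeps=100):
    """Coordinate descent for 0.5*sum((y - b0 - X@b)**2) + lam*sum(|b|)."""
    n, p = X.shape
    beta = np.zeros(p)
    b0 = y.mean()
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            partial = y - b0 - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ partial, lam) / col_sq[j]
        b0 = (y - X @ beta).mean()
    return b0, beta

def pick_lambda(X, y, lambdas, target):
    """Return (lam, beta) whose non-zero count is closest to target."""
    best = None
    for lam in lambdas:
        _, beta = lasso_fit(X, y, lam)
        gap = abs(np.count_nonzero(beta) - target)
        if best is None or gap < best[0]:
            best = (gap, lam, beta)
    return best[1], best[2]

# Synthetic data (made up): 60 cases, 10 features, 3 of them informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize columns
y = 3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2] + 0.1 * rng.normal(size=60)
lam_max = np.abs(X.T @ (y - y.mean())).max()      # at or above this, nothing is selected
lambdas = lam_max * np.geomspace(1.0, 0.01, 30)   # decreasing grid of candidates
lam, beta = pick_lambda(X, y, lambdas, target=3)
selected = np.flatnonzero(beta)                   # indices of the retained features
```

Because the non-zero count changes in steps as λ decreases, a grid may skip the exact target, which is consistent with the text's remark that five or seven features were sometimes obtained instead of six.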

Visualization by Multidimensional Scaling (MDS)
Although LASSO can reduce the dimension of radiomic features or miRNAs, they are still multidimensional data. Thus, it is not easy to understand the relationship between these multidimensional data and breast cancer subtypes. If the number of dimensions can be reduced to two, the relationship can be visualized because it can be displayed as a scatter plot. Therefore, in this study, we used MDS [27] to reduce the radiomic features or miRNAs to two dimensions. MDS is also called principal coordinate analysis, and a new axis is constructed using the following procedure. First, the distance matrix $d_{ij}$ comprising the Euclidean distance between input $i$ and input $j$ is calculated, and the transformation matrix $z_{ij}$ is then obtained, which is defined by
$$z_{ij} = -\frac{1}{2} \left( d_{ij}^2 - \frac{1}{n} \sum_{k=1}^{n} d_{kj}^2 - \frac{1}{n} \sum_{l=1}^{n} d_{il}^2 + \frac{1}{n^2} \sum_{k=1}^{n} \sum_{l=1}^{n} d_{kl}^2 \right)$$
The transformation matrix is used to move the origin to the center of gravity of the $n$ input data. Finally, the new coordinate points are determined as the coordinate values on the axes given by the eigenvectors of matrix $z_{ij}$. Because MDS is a linear transformation that maintains the Euclidean distance between data, the relative positional relationship of multidimensional data can be interpreted by reproducing it in a low-dimensional space.
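The procedure above (squared Euclidean distances, centering on the center of gravity, then an eigendecomposition) can be sketched as classical MDS. The function name and toy data are hypothetical, and scaling each eigenvector by the square root of its eigenvalue is the usual convention rather than something stated in the text.

```python
import numpy as np

def classical_mds(X, n_components=2):
    """Embed the rows of X in n_components dimensions by classical MDS."""
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distances between all pairs of inputs.
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    n = d2.shape[0]
    # Double centering moves the origin to the center of gravity.
    J = np.eye(n) - np.ones((n, n)) / n
    Z = -0.5 * J @ d2 @ J
    eigvals, eigvecs = np.linalg.eigh(Z)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale              # one row of coordinates per input

# Toy data (made up): 10 cases described by 6 selected features.
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 6))
coords = classical_mds(features)                  # shape (10, 2), ready for a scatter plot
```

When the inputs genuinely lie in a two-dimensional subspace, the embedding reproduces all pairwise distances exactly; otherwise the two leading axes give the closest planar approximation.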

Differentiation between TNBC and Others
We employed linear discriminant analysis (LDA) [28] to distinguish between TNBC and other cancer types. Six radiomic features or six miRNAs selected by LASSO were used as input data for LDA. LDA determines the hyperplane that best discriminates the two groups of TNBC and others, assuming the variances of each group of TNBC and others are the same in the feature space. The hyperplane is defined as follows:
$$z = a_0 + \sum_{i} a_i x_i$$
Here, $z$ is the discrimination score, $x_i$ are the radiomic features or miRNAs, $a_i$ are the coefficients, and $a_0$ is a constant value. Because the input/output of LDA is a simple relational expression, a high discrimination performance with LDA indicates that radiomic features or miRNAs can be used as biomarkers to discriminate TNBC and others. The leave-one-out method was used to train and test LDA [28]. The discrimination performance was evaluated by the area under the curve (AUC) of receiver operating characteristic (ROC) curve analysis. The LABROC4 algorithm [29], developed at the University of Chicago, was used for ROC curve analysis.
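The evaluation loop described above can be sketched as a two-class LDA with a pooled covariance, scored by leave-one-out and summarized by an AUC. Several points are assumptions: the function names and synthetic data are hypothetical, equal priors are assumed for the constant term, and the AUC here is the empirical (rank-based) estimate rather than the LABROC4 fit used in the study.

```python
import numpy as np

def fit_lda(X, y):
    """Return (a0, a) so that z = a0 + a @ x, using a pooled covariance."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance (same variance assumed in both groups).
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    a = np.linalg.solve(S, m1 - m0)
    a0 = -a @ (m0 + m1) / 2        # boundary midway between the class means
    return a0, a

def leave_one_out_scores(X, y):
    """Score each case with an LDA trained on all the other cases."""
    scores = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        a0, a = fit_lda(X[mask], y[mask])
        scores[i] = a0 + a @ X[i]
    return scores

def auc(scores, y):
    """Empirical AUC: chance that a positive outscores a negative (ties 0.5)."""
    pos, neg = scores[y == 1], scores[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Synthetic data (made up): 51 "others" and 9 well-separated "TNBC" cases.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(51, 2)), rng.normal(size=(9, 2)) + 8.0])
y = np.array([0] * 51 + [1] * 9)
scores = leave_one_out_scores(X, y)
loo_auc = auc(scores, y)
```

With only nine positive cases, each left-out TNBC case removes a sizeable share of its class from training, so leave-one-out scores can vary noticeably for that group.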

Experimental Results
The six miRNAs and six radiomic features selected by LASSO are listed in Table 2. A scatter plot projecting these features into two dimensions using MDS is shown in Figure 2. When miRNAs were used, TNBC and others were completely separated. However, when radiomic features were used, TNBC overlapped with other types of breast cancer. The results of LDA when the number of radiomic features or miRNAs was varied are shown in Figure 3. In this figure, LASSO did not yield a set of exactly three radiomic features, so the number following 2 is 4. With three miRNAs, the highest AUC value was 1.0, whereas with six radiomic features, the highest AUC value was 0.881. These results indicate that miRNAs have a higher discrimination performance than radiomic features. Figure 4 plots the output values of LDA with three miRNAs and six radiomic features on the horizontal and vertical axes, respectively. When the miRNA output on the horizontal axis was used, TNBC and others could be completely separated. However, when the radiomic feature output on the vertical axis was used, a significant overlap was observed. When the output values of miRNAs and radiomic features were integrated, the discrimination boundary could be drawn in the diagonal direction; thus, the separation between TNBC and others tended to be larger.

Discussion
In this study, miRNA was identified to be a more potent factor than radiomic features in distinguishing TNBC from other cancer types. Therefore, if exosomes derived from breast cancer cells are isolated and the miRNAs contained in the exosomes are analyzed, TNBC can be detected with high accuracy via liquid biopsy. If data on the genetic properties of breast cancer can be obtained by a minimally invasive blood test, the superiority of radiomics, which can easily estimate the genotype of cancer by a non-invasive imaging test, would be compromised. However, it is difficult to obtain information on the anatomical location and extent of the lesion using genetic testing. Hence, in addition to genetic testing, it is important to study the radiomic features and search for measures to integrate them to improve accuracy. Herein, the integrated analysis of miRNA and radiomic features improved the discrimination performance (Figure 4). This study aimed to discriminate between TNBC and other cancers, which are classified based on the genetic nature of breast cancer. Notably, this places genetic testing under conditions that are more favorable than those for imaging tests. Studies have reported the prediction of PCR using radiomic features after classifying breast cancer into subtypes by genetic testing [30] [31] [32] [33] [34]. These studies established the division of roles between genetic testing and imaging. To realize an AI system that supports personalized medicine, it must be clarified, one step at a time, to which parts of medical care the concepts of radiomics and liquid biopsy can be applied.
The present study has certain limitations owing to the small number of cases included as the experiment was conducted using a public database. Another limitation is that we could not compare blood tests and imaging tests directly as miRNAs obtained from exosomes derived from breast cancer cells in the blood were not used. Future studies are warranted to address these concerns.

Conclusion
The study identified miRNA as a more potent factor than radiomic features in distinguishing TNBC from other cancers. However, because it is difficult to obtain information on the anatomical location and extent of the lesion by genetic testing, it is important to clarify the radiomic features that are complementary to the genetic data. Research in this regard is believed to be important for constructing an AI system that considers the division of roles between genetic testing and imaging tests in the near future.