

A number of clustering algorithms have been used to analyze many databases in the field of image clustering. The main objective of this research work was to perform a comparative analysis of two existing partition-based clustering algorithms and a hybrid clustering algorithm. The results were verified by measuring their accuracy with classification algorithms. The clustering and classification algorithms were evaluated on tumor identification, cluster quality and other parameters such as run time and volume complexity. Several well-known classification algorithms were used to measure the accuracy of the results produced by the clustering algorithms. Clustering algorithms, particularly k-Means and FCM, have proved meaningful in many domains. In addition, the proposed multifarious clustering technique has shown its efficiency in predicting tumor-affected regions in mammogram images. The color images are converted into grayscale images and then processed. Finally, the best method for finding tumors in breast images is identified. This research would be immensely useful to physicians and radiologists in identifying the cancer-affected area of the breast.

Breast cancer segmentation remains a challenging problem in medicine. Breast malignant neoplastic disease is one of the most hazardous cancers for women around the world. It has been rated as the second most common disease causing death in adult females. The incidence of breast cancer in women has increased significantly in the last few years. Breast cancer is a malignant tumor that grows in or around the breast tissue, mainly in the milk ducts and glands. A tumor usually starts as a lump or calcium deposit that develops as a result of abnormal cell growth. Most breast lumps are benign, but some can be premalignant (may become cancer). Breast cancer is classified as either primary or metastatic. The initial malignant tumor that develops within the breast tissue is known as primary breast cancer. Sometimes primary breast cancer is found only after it has spread to nearby lymph nodes in the armpit. Metastatic breast cancer, or advanced cancer, forms when cancer cells in the breast break away and travel to another organ or part of the body [

Mammography is a method that helps in early detection of breast cancer [

The purpose of this research work is to identify tumors in and around the breast and to locate the affected region by partitioning the images into clusters based on intensity and color contrast. The tumor area is identified at a suitable stage of the clustering process [

A number of clustering algorithms have been used to analyze many databases in the field of image clustering. The main objective of this research work was to perform a comparative analysis of three partition-based clustering algorithms and to verify their accuracy with classification algorithms. The clustering and classification algorithms were evaluated on tumor detection, cluster quality and other parameters such as running time and volume complexity. The classification algorithms were used to measure the accuracy of the results produced by the clustering algorithms. The k-Means, fuzzy C-Means (FCM) and multifarious clustering algorithms have proved meaningful in many domains, and all three have shown their efficiency in predicting tumor-affected regions in mammogram images.

This research work uses mammogram images of three types: normal, benign and malignant. The mammogram images were collected from the Swamy Vivekananda Diagnostic Centre (SVDC) Hospital, Chennai, on the D.G. Vaishnav College campus. Symptoms of abnormality in some of the mammogram images were marked by the head of SVDC. The mammogram breast cancer images in DICOM (Digital Imaging and Communications in Medicine) format were taken for analysis. The DICOM file format supports the encapsulation of object-type data. In this research work, patient attributes stored with the mammogram images, such as age, gender, modality, study description, date the image was taken, image size and type, were considered for analysis. An example of the experimental data is shown in

Many methods have been used by various researchers to analyze and detect breast cancer in mammogram images. This research uses 310 images of three types (10 normal, 116 benign and 184 malignant) to distinguish affected from unaffected mammograms. The proposed method has three stages: pre-processing, image segmentation and classification. Image pre-processing is necessary to find the orientation of the mammogram, remove noise and enhance image quality. Clustering is then used to extract the tumor area from each image, and classification is used to measure the accuracy of the results. MATLAB was used to write the source code for the entire work in this research. The methods used are discussed below. The steps involved in the proposed architecture in

Step 1: Convert the mammogram images from DICOM format into JPG format.

Step 2: Input the images for preprocessing.

Step 3: Preprocess the images using the median filter, Gaussian filter, region of interest, inverse method and boundary detection methods to remove noise.

Step 4: Apply the k-Means, FCM and multifarious algorithm to find the affected region based on the intensity of the images.

Step 5: Enter the number of clusters.

Step 6: Display the tumor-affected area found by the k-Means, FCM and multifarious algorithms in their output images.

Step 7: Count the pixels in every output of the k-Means, FCM and multifarious algorithms.

Step 8: Compare the run time, volume complexity and clustering quality of the k-Means, FCM and multifarious algorithms to find the best algorithm.

Step 9: Input the pixel counts to the classification algorithms.

Step 10: Find the accuracy using the classification algorithms J48, JRIP, SVM, Naive Bayes and CART.

Step 11: Find the performance of classification algorithms based on its accuracy.
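Steps 7 and 9 do not say exactly how the pixel counts are formed. One reading, assuming that the W and B columns of the cluster tables below denote white and black pixels, is that each cluster's output image shows that cluster's pixels in white and all other pixels in black. A minimal NumPy sketch under that assumption follows; `cluster_pixel_counts` and `feature_row` are hypothetical names, not the authors' MATLAB code.

```python
import numpy as np


def cluster_pixel_counts(labels, n_clusters):
    """Return [(white, black), ...], one pair per cluster.

    Assumption: the output image of cluster c shows its own pixels in white
    and every other pixel in black, so white = number of pixels labelled c.
    """
    labels = np.asarray(labels)
    total = labels.size
    counts = []
    for c in range(n_clusters):
        white = int(np.count_nonzero(labels == c))  # pixels assigned to cluster c
        counts.append((white, total - white))         # every other pixel is black
    return counts


def feature_row(labels, n_clusters):
    """Flatten counts to [C1 W, C1 B, ..., Ck W, Ck B, total] as one classifier input row."""
    row = [v for pair in cluster_pixel_counts(labels, n_clusters) for v in pair]
    row.append(int(np.asarray(labels).size))
    return row
```

For a label map `[[0, 0, 1], [2, 1, 0]]` with three clusters, `feature_row` gives `[3, 3, 2, 4, 1, 5, 6]`.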

The main objective of preprocessing is to improve image quality and make the image ready for further processing by removing or reducing unrelated and surplus parts in the background of the medical images. These methods fall into the following categories: data cleaning, data integration, data transformation and data reduction. The steps involved are Region of Interest (ROI) detection, the inverse method and the boundary detection method [
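The text names the inverse and boundary detection methods without defining them. A minimal NumPy sketch, assuming 8-bit grayscale input, that "inverse" means the image negative, and that boundary detection is applied to a binary mask (for example a thresholded breast region), might look like this; `invert` and `boundary` are hypothetical names.

```python
import numpy as np


def invert(img):
    """Inverse (negative) of an 8-bit grayscale image: 0 -> 255, 255 -> 0."""
    return 255 - np.asarray(img, dtype=np.uint8)


def boundary(mask):
    """Foreground pixels that have at least one 4-connected background neighbour."""
    m = np.asarray(mask, dtype=bool)
    p = np.pad(m, 1, mode="constant", constant_values=False)
    # 4-neighbour erosion: a pixel survives only if it and its
    # up/down/left/right neighbours are all foreground
    eroded = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
              & p[1:-1, :-2] & p[1:-1, 2:])
    # boundary = foreground minus its erosion
    return m & ~eroded
```

Subtracting the erosion from the mask is one common way to get a one-pixel-wide outline; other edge detectors would serve equally well here.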

The main purpose of clustering is to divide a set of objects into meaningful groups. Objects are clustered by measuring the similarity between pairs of objects with a distance function. The result of clustering is thus a set of clusters in which objects within one cluster are more similar to each other than to objects in other clusters. Cluster analysis has been broadly applied in numerous applications, including medical image segmentation, information analysis and image processing. In some image applications, clustering is also called segmentation. The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. By clustering, one can identify dense and sparse regions and therefore discover overall distribution patterns and interesting correlations between data attributes. Thus, clustering in measurement space may be an indicator of the similarity of image regions and may be used for segmentation purposes [

k-Means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. k-Means clustering groups a set of data points $\{x_1, x_2, \ldots, x_n\}$ into $k$ clusters. It has high computational efficiency and supports multidimensional vectors. It reduces distortion by minimizing the cost function

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2$$

where $\left\| x_i^{(j)} - c_j \right\|^2$ is a chosen distance measure between a data point $x_i^{(j)}$ and the cluster center $c_j$, and $J$ is an indicator of the distance of the $n$ data points from their respective cluster centers. The algorithm is composed of the following steps:

Step 1: Place k points into the space represented by the objects that are being clustered. These points represent initial group centroids.

Step 2: Assign each object to the group that has the closest centroid.

Step 3: When all objects have been assigned, recalculate the positions of the k centroids.

Step 4: Repeat steps 2 and 3 until the centroids no longer move.

This produces a separation of the objects into groups from which the metric to be minimized can be calculated. k-Means is a simple clustering algorithm that has been adapted to many problem domains [
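The paper's MATLAB code is not given. A minimal NumPy sketch of the four steps above, applied to grayscale pixel intensities (one plausible reading of "based on the intensity of the images"), might look like this. The function name, the random choice of initial centroids and the iteration cap are assumptions.

```python
import numpy as np


def kmeans_intensity(img, k, max_iter=100, seed=0):
    """Lloyd's k-means on grayscale pixel intensities.

    Returns (labels shaped like img, centroids sorted from darkest to brightest).
    Requires at least k distinct intensity values in img.
    """
    x = np.asarray(img, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    # Step 1: k distinct pixel intensities chosen at random as initial centroids
    centroids = rng.choice(np.unique(x), size=k, replace=False)
    for _ in range(max_iter):
        # Step 2: assign every pixel to its nearest centroid (squared distance)
        labels = np.argmin((x[:, None] - centroids[None, :]) ** 2, axis=1)
        # Step 3: move each centroid to the mean of its pixels (empty clusters stay put)
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        # Step 4: stop when the centroids no longer move
        if np.allclose(new, centroids):
            break
        centroids = new
    # Relabel so that cluster 0 is darkest and cluster k-1 is brightest
    centroids = np.sort(centroids)
    labels = np.argmin((x[:, None] - centroids[None, :]) ** 2, axis=1)
    return labels.reshape(np.shape(img)), centroids
```

Sorting the centroids at the end makes the brightest cluster, where white tumor-like pixels would fall, always the last label.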

This method is widely used in pattern recognition. It is based on minimization of the following objective function:

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \left\| x_i - c_j \right\|^2, \qquad 1 < m < \infty$$

where $N$ is the number of data points, $C$ is the number of clusters, $m$ is any real number greater than 1, $x_i$ is the $i$-th $d$-dimensional measured data point, $c_j$ is the $d$-dimensional center of cluster $j$, $u_{ij}$ is the degree of membership of $x_i$ in cluster $j$, and $\|\cdot\|$ is any norm expressing the similarity between measured data and a center. Fuzzy partitioning is carried out through an iterative optimization of this objective function, updating the memberships $u_{ij}$ and the cluster centers $c_j$. The iteration stops when $\max_{ij}\{|u_{ij}^{(k+1)} - u_{ij}^{(k)}|\} \le \xi$, where $\xi$ is a termination criterion between 0 and 1 and $k$ is the iteration step. This procedure converges to a local minimum or a saddle point of $J_m$. The algorithm is composed of the following steps:

Step 1: Initialize the matrix $U = [u_{ij}]$ as $U^{(0)}$.

Step 2: At step $k$, calculate the center vectors $C^{(k)} = [c_j]$ from $U^{(k)}$.

Step 3: Update $U^{(k)}$ to $U^{(k+1)}$.

Step 4: If $\|U^{(k+1)} - U^{(k)}\| < \xi$, then STOP; otherwise, return to Step 2 [
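The four FCM steps can be sketched in NumPy on pixel intensities. The sketch uses the standard FCM update rules, $c_j = \sum_i u_{ij}^m x_i / \sum_i u_{ij}^m$ and $u_{ij} = 1 / \sum_{k} (\|x_i - c_j\| / \|x_i - c_k\|)^{2/(m-1)}$, which the text refers to but does not spell out. The defaults $m = 2$, the tolerance and the random initialization are assumptions, and this is not the authors' code.

```python
import numpy as np


def fcm_intensity(img, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Standard fuzzy c-means on grayscale pixel intensities.

    Returns (U of shape (n_pixels, c), centers sorted ascending, hard labels shaped like img).
    """
    x = np.asarray(img, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    # Step 1: random membership matrix U(0) whose rows sum to 1
    u = rng.random((x.size, c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Step 2: centers c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        w = u ** m
        centers = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
        # Step 3: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        d = np.fmax(np.abs(x[:, None] - centers[None, :]), 1e-12)  # avoid division by zero
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        u_new = 1.0 / ratio.sum(axis=2)
        # Step 4: stop when the largest membership change is below eps
        converged = np.max(np.abs(u_new - u)) < eps
        u = u_new
        if converged:
            break
    # Recompute centers from the final memberships and order clusters dark -> bright
    w = u ** m
    centers = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    order = np.argsort(centers)
    u, centers = u[:, order], centers[order]
    labels = np.argmax(u, axis=1).reshape(np.shape(img))
    return u, centers, labels
```

Unlike k-Means, every pixel keeps a graded membership in every cluster; the hard labels are only taken at the end by picking the largest membership.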

A hybrid method called the “Multifarious Clustering Algorithm (MCA)” is proposed in this research work; it incorporates the advantages of both the k-Means and FCM algorithms. MCA is a clustering method that allows one piece of data to belong to one or more clusters. The algorithm was newly developed for this work by combining the k-Means and FCM algorithms. It is based on minimization of an objective function:

where $m$ is any real number greater than 1, $w_{ij}$ is the degree of membership of $x_i$ in cluster $j$, $x_i$ is the $i$-th $z$-dimensional measured data point, $c_j$ is the $z$-dimensional center of the cluster, and $\|\cdot\|$ is any norm expressing the similarity between measured data and a center. Fuzzy partitioning is carried out through an iterative optimization of the objective function shown above, with the memberships $w_{ij}$ and the cluster centers $c_j$ updated by:

The iteration stops when $\max_{ij}\{|w_{ij}^{(k+1)} - w_{ij}^{(k)}|\} < \xi$, where $\xi$ is a termination criterion between 0 and 1 and $k$ is the iteration step. This procedure converges to a local minimum or a saddle point of $V_M$. The algorithm is composed of the following steps:

Input: $x_j$ is the cluster $j$, $c_j$ is the $z$-dimensional center, $w_{ij}$ is the degree of membership. Let $c_j, x_j \in w_{ij}$.

Output: the membership matrix $[w_{ij}]$

While $k = x_i$ do

$C^{(k)} = [c_j]$, $U^{(k)} \le 1$, $M_i = x_i - c_j$, $M_k = x_i - c_k$, $w_{ij} = M_j / M_k$ (12)

$\max_{ij}\,[\,w_{ij}^{(k+1)} - w_{ij}^{(k)}\,]$

Return $C^{(k)}$;

End;

In this algorithm, data are bound to each cluster by means of a membership function, which represents the fuzzy behavior of the algorithm. To do so, the algorithm constructs a matrix $W = [w_{ij}]$ whose entries are numbers between 0 and 1 and represent the degree of membership between data points and cluster centers. The operation of the three clustering algorithms was examined on the basis of clustering quality and efficiency.

The experiments carried out in this work are divided into three parts, preprocessing, clustering and classification, as discussed above. The experimental results are analysed accordingly.

The main objective of preprocessing is to improve image quality and make the image ready for further processing by removing or reducing unrelated and surplus parts in the background of the mammogram image's pixels. This work analyses three types of input mammogram images. Common characteristics of breast cancer images, such as unknown noise, poor contrast, weak boundaries and unrelated parts, affect their content. In preprocessing, noise is first removed using the median filter method in
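The window size of the median filter is not stated. A minimal NumPy sketch, assuming a 3 × 3 window and reflected borders, shows why the filter suits impulse (salt-and-pepper) noise: an isolated bright pixel is replaced by the median of its neighbourhood. The function name `median_filter` is hypothetical here.

```python
import numpy as np


def median_filter(img, size=3):
    """Replace each pixel with the median of its size x size neighbourhood.

    Borders are handled by reflecting the image; size is assumed odd.
    """
    a = np.asarray(img)
    r = size // 2
    p = np.pad(a, r, mode="reflect")
    h, w = a.shape
    # One shifted copy of the image per neighbourhood offset, stacked on axis 0
    windows = np.stack([p[dy:dy + h, dx:dx + w]
                        for dy in range(size) for dx in range(size)])
    return np.median(windows, axis=0).astype(a.dtype)
```

Because the median ignores a single outlier in each window, edges are preserved better than with an averaging filter.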

The results of the Gaussian filter are shown in
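The Gaussian filter's parameters are not given either. A minimal separable sketch in NumPy, assuming a standard deviation `sigma` of 1 pixel, a kernel radius of $\lceil 3\sigma \rceil$ and reflected borders, might look like this; `gaussian_kernel` and `gaussian_filter` are hypothetical names.

```python
import numpy as np


def gaussian_kernel(sigma=1.0, radius=None):
    """1-D normalised Gaussian kernel; radius defaults to ceil(3 * sigma)."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(t ** 2) / (2 * sigma ** 2))
    return k / k.sum()  # weights sum to 1, so overall brightness is preserved


def gaussian_filter(img, sigma=1.0):
    """Separable Gaussian blur of a 2-D image: filter along rows, then columns."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    out = np.asarray(img, dtype=float)
    for axis in (0, 1):
        # Pad only along the current axis, reflecting the image at the border
        p = np.pad(out, [(r, r) if a == axis else (0, 0) for a in (0, 1)], mode="reflect")
        # Weighted sum of shifted copies along this axis (kernel is symmetric)
        out = sum(k[i] * np.take(p, np.arange(i, i + out.shape[axis]), axis=axis)
                  for i in range(len(k)))
    return out
```

Splitting the 2-D Gaussian into two 1-D passes gives the same result as a full 2-D convolution at lower cost.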

The main objective of this procedure is to improve image quality and make the image ready for further processing. It uses the ROI, inverse and boundary detection methods, in that order. The ROI method finds areas of the image based on their intensity. Detecting the ROI consists of finding a region of the image that differs from the background in low-level features such as contrast, color, region size and shape, distribution of contours or texture pattern. Different methods have been proposed to detect regions of interest in an image. Here, the pixel with the highest intensity value in the digital image is chosen and compared with its neighboring pixels. The comparison continues until the pixel value changes [
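The ROI description (start at the brightest pixel, compare with neighbours, stop where the value changes) resembles seeded region growing. A minimal sketch follows, assuming 4-connectivity and a hypothetical tolerance `tol` that decides when a neighbour's value counts as "changed"; neither is specified in the text.

```python
from collections import deque

import numpy as np


def grow_roi(img, tol=10):
    """Grow a region from the brightest pixel.

    A 4-connected neighbour joins the region while its intensity differs from
    the seed by at most tol; growth stops where the value changes more than that.
    """
    a = np.asarray(img, dtype=float)
    seed = np.unravel_index(np.argmax(a), a.shape)  # highest-intensity pixel
    mask = np.zeros(a.shape, dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            inside = 0 <= ny < a.shape[0] and 0 <= nx < a.shape[1]
            if inside and not mask[ny, nx] and abs(a[ny, nx] - a[seed]) <= tol:
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask
```

Comparing each neighbour with the seed value, rather than with the pixel it was reached from, keeps the region from drifting slowly across a gradual intensity ramp.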

| Image type | Image No | BP | AP | Difference | Average (BP, AP) | BM (KB) | AM (KB) | Difference (KB) | Average (KB) |
|---|---|---|---|---|---|---|---|---|---|
| Normal | 1 | 4200 | 4086 | 114 | 4143 | 15.8 | 14.8 | 1.0 | 15.3 |
| Normal | 2 | 7130 | 6973 | 157 | 7051.5 | 9.27 | 8.2 | 1.25 | 8.7 |
| Normal | 3 | 9745 | 9619 | 126 | 9682 | 9.13 | 8.12 | 1.01 | 8.6 |
| Benign | 4 | 13,800 | 13,694 | 106 | 13,747 | 28.0 | 22.01 | 6.04 | 14.025 |
| Benign | 5 | 11,330 | 11,179 | 151 | 11,254.5 | 24.04 | 18.02 | 6.02 | 12.02 |
| Benign | 6 | 13,208 | 13,060 | 148 | 13,134 | 20.06 | 18.02 | 2.04 | 10.03 |
| Malignant | 7 | 13,748 | 13,548 | 200 | 13,648 | 33.06 | 30.1 | 3.05 | 31.58 |
| Malignant | 8 | 24,720 | 24,620 | 100 | 24,670 | 38.0 | 34.1 | 4.0 | 36.05 |
| Malignant | 9 | 14,700 | 14,505 | 195 | 14,602.5 | 32.07 | 30.1 | 2.06 | 31.085 |

The k-Means, FCM and MCA clustering algorithms were used to segment the images in order to find tumor-affected regions in the mammograms. During clustering, the normal images in Figures 6-8 were found to contain no abnormal portions: only a single color (black) appears in the normal images after clustering. Figures 6-8 also show the results for the benign and malignant images, in which white pixels were identified by both k-Means and FCM in the 5^{th} cluster and by MCA already in the 4^{th} cluster. The performance of the three algorithms was measured using run time, volume complexity and clustering quality (Tables 2-5).

It is therefore evident that MCA took less time to analyze the images than the k-Means and FCM algorithms.

The proposed MCA produced better clustering results, with higher performance, shorter execution time and lower memory use. To verify the results, this work used the classification algorithms CART, J48, JRIP, SVM and Naive Bayes.

Results of k-Means Algorithm (W = white pixels, B = black pixels in each cluster C1-C5)

| Image type | Image No | C1 W | C1 B | C2 W | C2 B | C3 W | C3 B | C4 W | C4 B | C5 W | C5 B | Total Pixels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | 1 | 1234 | 2852 | 1400 | 2686 | 1200 | 2886 | 244 | 3842 | 8 | 4078 | 4086 |
| Benign | 2 | 2345 | 11,349 | 1345 | 12,349 | 8456 | 5238 | 1300 | 12,394 | 248 | 13,446 | 13,694 |
| Malignant | 3 | 891 | 12,657 | 2449 | 11,099 | 8762 | 4786 | 1219 | 12,529 | 327 | 13,221 | 13,548 |

Results of FCM Algorithm (W = white pixels, B = black pixels in each cluster C1-C5)

| Image type | Image No | C1 W | C1 B | C2 W | C2 B | C3 W | C3 B | C4 W | C4 B | C5 W | C5 B | Total Pixels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | 1 | 1334 | 2752 | 1500 | 2586 | 1200 | 2886 | 244 | 3842 | 8 | 4078 | 4086 |
| Benign | 2 | 2145 | 11,449 | 1245 | 12,449 | 8156 | 5338 | 1200 | 1394 | 248 | 13,446 | 13,694 |
| Malignant | 3 | 991 | 12,557 | 2349 | 11,199 | 8862 | 4686 | 1119 | 12,429 | 227 | 13,321 | 13,548 |

Results of MCA Algorithm (W = white pixels, B = black pixels in each cluster C1-C5)

| Image type | Image No | C1 W | C1 B | C2 W | C2 B | C3 W | C3 B | C4 W | C4 B | C5 W | C5 B | Total Pixels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | 1 | 1534 | 2552 | 1500 | 2586 | 1200 | 2886 | 244 | 3842 | 8 | 4078 | 4086 |
| Benign | 2 | 3694 | 10,000 | 1694 | 12,000 | 8694 | 5000 | 200 | 13,494 | 148 | 13,546 | 13,694 |
| Malignant | 3 | 2991 | 10,557 | 1349 | 12,199 | 7862 | 5686 | 2119 | 11,429 | 327 | 13,221 | 13,548 |

| Algorithm | Normal: Run Time (ms) | Normal: Memory (KB) | Benign: Run Time (ms) | Benign: Memory (KB) | Malignant: Run Time (ms) | Malignant: Memory (KB) |
|---|---|---|---|---|---|---|
| k-Means | 1544 | 7.2 | 2540 | 16.1 | 2068 | 24.1 |
| FCM | 1244 | 2.2 | 2040 | 10.1 | 2028 | 14.1 |
| MCA | 1044 | 1.2 | 1010 | 4.1 | 1168 | 7.1 |

| Predictive Model | Accuracy (%) | Sensitivity (%) | Specificity (%) | Time (ms) |
|---|---|---|---|---|
| J48 | 75.58 | 75.5853 | 24.4147 | 135 |
| CART | 92.30 | 92.3077 | 7.6923 | 186 |
| JRIP | 82.60 | 82.6087 | 17.3913 | 201 |
| SVM | 87.95 | 87.9599 | 12.0401 | 225 |
| Naïve Bayes | 84.28 | 84.2809 | 15.7191 | 308 |

data by varying the statistic rate. It is observed that the kappa statistic, mean absolute error and root mean squared error are almost negligible for all five classification algorithms. The relative absolute error is negligible for the SVM algorithm alone; for the other four algorithms it exceeds 96%, with Naive Bayes having the highest relative absolute error of 152%. The root relative squared error exceeds 99% for all five classification algorithms, with Naive Bayes having the highest value of 121%.

The performance of all five classification algorithms was analysed based on sensitivity, specificity and accuracy. The CART classifier has the highest sensitivity (92%) and J48 the lowest (75%), whereas JRIP, SVM and Naïve Bayes have sensitivities of 82%, 87% and 84%, respectively. J48 has the highest specificity (24%) and CART the lowest (7%), whereas JRIP, SVM and Naïve Bayes have specificities of 17%, 12% and 15%, respectively. In terms of accuracy, CART is highest at 92.30%, followed by SVM (87.95%), Naive Bayes (84.28%) and JRIP (82.60%), while J48 has the lowest accuracy at 75.58%.
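For reference, the standard definitions of the three measures for a two-class (for example tumor / non-tumor) confusion matrix are sketched below. The counts in the usage line are made up for illustration and are not the paper's data; `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity in percent from confusion-matrix counts.

    sensitivity = true-positive rate, specificity = true-negative rate.
    """
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return accuracy, sensitivity, specificity
```

With hypothetical counts `tp=40, fn=10, fp=5, tn=45`, the function returns `(85.0, 80.0, 90.0)`.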

^{th} cluster by the k-Means and FCM algorithms. The MCA algorithm, however, identified the region effectively in the 4^{th} cluster itself. The time and volume complexity of MCA were also much lower than those of the other two algorithms.

In general, no single algorithm can be called the best for prediction on every kind of real-world data set or application. It is, however, possible to compare the performance of algorithms on the chosen data. On this basis, the performance of two partition-based clustering algorithms, k-Means and FCM, was compared with that of the proposed MCA. The results were analyzed over several executions of the programs. After extensive comparison of all three algorithms, it is concluded that MCA performed better than the other two. The novel multifarious clustering algorithm identifies cancer-affected regions effectively and efficiently. The accuracy was verified with the classification algorithms J48, JRIP, SVM, Naive Bayes and CART using various performance measures; among them, CART had the best accuracy. The images analyzed by k-Means, FCM and MCA helped to locate the breast cancer affected area by detecting tumors in the images. The method of analysis and the results of this research work were accepted after due consultation with a physician. Future work in this area involves other image segmentation methods and other types of clustering algorithms.

The authors declare no conflicts of interest regarding the publication of this paper.

Velmurugan, T. and Venkatesan, E. (2019) A Hybrid Multifarious Clustering Algorithm for the Analysis of Memmogram Images. Journal of Computer and Communications, 7, 136-151. https://doi.org/10.4236/jcc.2019.712013