TITLE:
Observer Variability in BI-RADS Ultrasound Features and Its Influence on Computer-Aided Diagnosis of Breast Masses
AUTHORS:
Laith R. Sultan, Ghizlane Bouzghar, Benjamin J. Levenback, Nauroze A. Faizi, Santosh S. Venkatesh, Emily F. Conant, Chandra M. Sehgal
KEYWORDS:
Breast Imaging, Breast Cancer, Observer Variability, Computer-Aided Diagnosis
JOURNAL NAME:
Advances in Breast Cancer Research, Vol.4 No.1, January 9, 2015
ABSTRACT: Objective: Computer classification of sonographic BI-RADS features can aid differentiation of malignant and benign masses. However, the variability in diagnosis caused by differences in the features recorded between observations is not known. The goal of this study is to measure the variation in sonographic features between multiple observations and to determine the effect of feature variation on computer-aided diagnosis of breast masses. Materials and Methods: Ultrasound images of biopsy-proven solid breast masses were analyzed in three independent observations for BI-RADS sonographic features. The BI-RADS features from each observation were used with a Bayes classifier to estimate the probability of malignancy. Observer agreement in the sonographic features was measured by the kappa coefficient, and the difference in diagnostic performance between observations was determined by the area under the ROC curve, Az, and the interclass correlation coefficient. Results: While some features were observed consistently (κ = 0.95), others showed significant variation (κ = 0.16). For all features combined, intra-observer agreement was substantial (κ = 0.77). Agreement, however, decreased steadily to 0.66 and 0.56 as the time between observations increased from 1 to 2 and 3 months, respectively. Despite the variation in features between observations, the probability-of-malignancy estimates from the Bayes classifier were robust and consistently yielded the same level of diagnostic performance: Az was 0.772-0.817 for sonographic features alone and 0.828-0.849 for sonographic features and age combined. The difference in performance, ΔAz, between the observations for the two groups was small (0.003-0.044) and was not statistically significant (p