Impact of a Standardized Scheme on the Detection of Chest X-Ray Abnormalities and Radiographic Diagnosis of Pulmonary Tuberculosis in Adults

M. L. Gharingam et al.

Purpose: The complexity of chest radiography (CXR) is a source of variability in its interpretation. We assessed the effect of an interpretation grid on the detection of CXR anomalies and the radiographic diagnosis of tuberculosis in an area where tuberculosis is endemic. Methods: The study was conducted in Yaounde (Cameroon). Six observers (2 pulmonologists, 2 radiologists and 2 senior residents in medical imaging) interpreted 47 frontal CXR twice, two months apart, first without (R1) and then with (R2) the aid of an interpretation grid. We focused on the detection of micronodules (n = 16), cavitations (n = 12), pleural effusion (n = 6) and adenomegaly (n = 6), and on the diagnosis of tuberculosis (n = 23) and cancer (n = 7). Results: The average score for accurate detection of elementary lesions was 40.4% [95% CI: 25% to 58.3%] in R1 and 52.1% [36.9% to 65.3%] in R2. The highest improvement was observed for micronodules (19.8%). Cavitations had the highest proportions of accurate detection (58.3% in R1 and 65.3% in R2). The average score of accurate diagnosis was 46.1% in R1 and 57.4% in R2. Accurate diagnosis improved by 3.6% for tuberculosis and 19% for cancer between R1 and R2. Intra-observer agreement was higher for the diagnosis of cancer (0.22 ≤ κ ≤ 1) than for the diagnosis of tuberculosis (0.21 ≤ κ ≤ 0.68). Inter-observer agreement was highly variable, with a modest improvement for the diagnosis of tuberculosis in R2. Conclusion: The standardized interpretation scheme improved the detection of CXR anomalies and the diagnosis of tuberculosis. It significantly improved inter-observer agreement in diagnosing tuberculosis but not in detecting most lesions.


Introduction
In spite of numerous advances in cross-sectional thoracic imaging, chest radiography (CXR) remains the leading imaging modality for the exploration, diagnosis and monitoring of many chest diseases [1] [2]. In most circumstances, it is the first-line imaging modality and frequently the only diagnostic imaging test used in patients with confirmed or suspected chest disease [1]-[4].
The role of CXR in the screening and diagnosis of pulmonary tuberculosis (TB) is well established [5]-[8]. However, the complexity of the CXR image is a source of variability in the diagnosis of TB and of lung diseases in general [5] [6] [9] [10]. In most sub-Saharan African countries where TB is endemic, CXR is very often the only available or accessible chest imaging test [11] [12]. Many CXR interpretation schemes have been developed in some countries in order to reduce interpretation discrepancies [5] [13]-[15]. However, we are not aware of studies on the benefit of a standardized CXR interpretation grid in sub-Saharan African countries.
Inspired by the Chest Radiography Reading and Recording System (CRRS) and the Japan-Vietnam Chest X-ray Coding System (JVCS) [6] [9], we developed a new CXR interpretation scheme and assessed its effect on the detection of CXR anomalies and the radiographic diagnosis of pulmonary tuberculosis in adult Cameroonians.

Materials and Methods
This was an intervention study carried out in Yaoundé (the capital city of Cameroon) between December 2012 and February 2013. The study was approved by the Ethics Committee of the Faculty of Medicine and Biomedical Sciences and the administrative authorities of the Yaounde Jamot Hospital.

Development of the Interpretation Grid
A group comprising one experienced radiologist, one experienced pulmonologist, one final-year specialist radiologist in training and one final-year medical student developed the CXR interpretation grid as an adaptation of the "Chest radiograph reading and recording system" (CRRS) [9] and the "Japan-Vietnam CXR coding system" (JVCS) [6] (see appendix). The new grid was pre-tested before being applied in this study. It included one section for parenchymal lesions, one for pleural lesions, one for mediastinal lesions, one for other lesions and a last section for the radiographic diagnosis.

Selection of Radiographs
CXR were selected from the department of pulmonology of the Yaounde Jamot Hospital (YJH), which is the largest referral and treatment center for chest diseases in Yaounde and its surroundings [16]. Selected CXR were all posteroanterior views of good photographic and technical quality, in digital format, performed in patients older than 15 years. A total of 47 CXR were selected for this study: 23 of pulmonary tuberculosis, seven of lung cancer, seven of bacterial pneumonia, six normal CXR and four of diffuse infiltrative lung disease. All abnormal CXR had a diagnosis confirmed by appropriate investigations.

Selection of Observers
Six readers, chosen by convenience, participated in this study: two pulmonologists with five and 13 years of experience, two radiologists with one and five years of experience, and two final-year residents in medical imaging. These readers are identified as "radiologist 1 and 2", "pulmonologist 1 and 2" and "resident 1 and 2".

Interpretation Procedure
A consensus interpretation of each CXR was obtained through review of all the CXR images by a group consisting of one radiologist and one pulmonologist (8 years of experience each) and one final-year resident in medical imaging. For each CXR, the consensus interpretation determined the elementary radiographic lesions and the radiological diagnosis. The first interpretation session (R1) by the six observers took place under the usual reading conditions, using a report form with one part focusing on the detection of elementary lesions and the other on the radiological diagnosis. During the second reading session (R2), two months later, interpretation was recorded on the interpretation grid (see appendix). Each participant was instructed on the use of the grid before the reading session but was not aware that the CXR were the same as in the first reading session. Images were arranged in a different order from that of the first reading session. Interpretations were performed under the same conditions for all observers, without limitation of reading time. The day and time of reading were chosen at the convenience of the observer.

Data Collection and Analysis
The sample size was calculated using the "Kappa Size" package of the R statistical software, version 2.13.0 [17]-[20]. Based on an expected kappa of 0.47 ± 0.13 [9] and a type I error of 0.05, the minimum sample size was 46 radiographs for six observers. A total of 47 radiographs were selected for this study. The following elementary lesions and diagnoses were retained for analysis: pulmonary tuberculosis (n = 23), micronodules (n = 16), caverns (n = 12), lung cancer (n = 7), pleural effusion (n = 6) and hilar or mediastinal adenomegaly (n = 6). The analysis focused on the accuracy of the detection of elementary lesions and of the diagnosis of pulmonary tuberculosis and lung cancer, and on the intra-observer and inter-observer agreement at the first and second readings. The data were entered and analyzed using SPSS 17 software (SPSS Inc., Chicago, USA). The kappa coefficient (κ) was used to assess the agreement between the reading without the grid and the reading with the standardized grid. The following kappa intervals and thresholds [18] were used to characterize the level of agreement: discordance (<0.0), low (0.0 - 0.20), poor (0.21 - 0.40), moderate (0.41 - 0.60), good (0.61 - 0.80), excellent (≥0.81).
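The agreement measure and the interpretation bands described above can be illustrated with a short sketch. This is not the SPSS procedure used in the study: it is a minimal Python example that assumes each reading is coded per radiograph as 1 (lesion reported) or 0 (lesion not reported). The function names `cohen_kappa` and `agreement_level` and the toy data are hypothetical. As a further assumption, each band's upper limit is treated as inclusive, so that values falling between two listed bands (for example 0.205) go to the next band up.

```python
from collections import Counter


def cohen_kappa(reading_1, reading_2):
    """Cohen's kappa for two equally long sequences of categorical calls."""
    if len(reading_1) != len(reading_2) or len(reading_1) == 0:
        raise ValueError("readings must be non-empty and of equal length")
    n = len(reading_1)
    # Observed agreement: share of radiographs given the same call in both readings.
    observed = sum(a == b for a, b in zip(reading_1, reading_2)) / n
    # Chance agreement, computed from the marginal frequency of each category.
    counts_1, counts_2 = Counter(reading_1), Counter(reading_2)
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)
    if expected == 1.0:
        # Assumption: both readings use one identical category; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def agreement_level(kappa):
    """Map a kappa value to the agreement bands listed in the text."""
    if kappa < 0.0:
        return "discordance"
    bands = ((0.20, "low"), (0.40, "poor"), (0.60, "moderate"), (0.80, "good"))
    for upper_limit, label in bands:
        if kappa <= upper_limit:
            return label
    return "excellent"


# Hypothetical calls by one reader for the same eight radiographs in R1 and R2.
r1 = [1, 1, 0, 0, 1, 0, 1, 0]
r2 = [1, 0, 0, 0, 1, 1, 1, 0]
kappa = cohen_kappa(r1, r2)
print(round(kappa, 2), agreement_level(kappa))  # prints: 0.5 moderate
```

In this toy example the reader gives the same call for six of eight radiographs (observed agreement 0.75) while chance agreement is 0.5, hence κ = 0.5.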

Results
The performance of our observers in detecting elementary lesions and making CXR diagnoses during the first reading session without the standardized scheme (R1) and during the second session with the standardized grid (R2) is shown in Table 1 for elementary lesions and in Table 2 for radiological diagnoses. Figure 1 shows four examples of CXR from this study. The average score of accurate detection of elementary lesions for all observers was 40.4% in R1 and 52.1% in R2.
Table 1. Proportion of anomalies accurately detected by each observer at each reading session and intra-observer agreement between the two sessions.
Based on the kappa statistics for intra-observer agreement, the detection of micronodules significantly improved for all observers, with kappa statistics always lower than 1. Values ranged from 0.32 (95% confidence interval: 0.01 to 0.63) for radiologist 2 to 0.70 (0.42 to 0.97) for pulmonologist 1 (Table 1). With the exception of the two residents, the kappa statistics were also in favor of a significant improvement in the detection of cavitations at R2, with values ranging from 0.06 (−0.24 to 0.36) for radiologist 1 to 0.45 (0.10 to 0.79) for pulmonologist 2. For adenomegaly and pleural effusion, significant improvement occurred in half and two-thirds of the observers, respectively (Table 1).
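The kappa values above are reported with 95% confidence intervals. The text does not state how SPSS derived them, so the sketch below is only an illustration. It assumes the simple large-sample approximation SE ≈ sqrt(p_o(1 − p_o) / (n(1 − p_e)²)), where p_o is the observed and p_e the chance agreement. The function name `kappa_with_ci`, the clipping of the interval to [−1, 1] and the toy data are likewise assumptions.

```python
import math


def kappa_with_ci(reading_1, reading_2, z=1.96):
    """Return (kappa, lower, upper) using an approximate large-sample interval."""
    n = len(reading_1)
    if n == 0 or n != len(reading_2):
        raise ValueError("readings must be non-empty and of equal length")
    categories = set(reading_1) | set(reading_2)
    # Observed agreement between the two readings.
    p_o = sum(a == b for a, b in zip(reading_1, reading_2)) / n
    # Chance agreement from the marginal proportions of each category.
    p_e = sum((reading_1.count(c) / n) * (reading_2.count(c) / n) for c in categories)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both readings use a single identical category")
    kappa = (p_o - p_e) / (1.0 - p_e)
    # Approximate standard error (assumption: simple large-sample formula).
    se = math.sqrt(p_o * (1.0 - p_o) / (n * (1.0 - p_e) ** 2))
    # Clip the interval to the range of values kappa can take.
    lower = max(-1.0, kappa - z * se)
    upper = min(1.0, kappa + z * se)
    return kappa, lower, upper


# Hypothetical calls by one reader for the same eight radiographs in R1 and R2.
r1 = [1, 1, 0, 0, 1, 0, 1, 0]
r2 = [1, 0, 0, 0, 1, 1, 1, 0]
kappa, lower, upper = kappa_with_ci(r1, r2)
print(f"kappa = {kappa:.2f} ({lower:.2f} to {upper:.2f})")
```

With only eight radiographs the interval is very wide. This is one reason why lesions represented by few films, such as pleural effusion (n = 6), yield imprecise agreement estimates.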
The average score of accurate diagnosis for all observers was 46.1% in R1 and 57.4% in R2, an improvement of 11.3 percentage points when using the standardized grid. The overall improvement in the score of accurate diagnosis between R1 and R2 was 3.6% for tuberculosis and 19% for lung cancer. Pulmonologist 1 had the best overall score of accurate diagnosis (70.2%). With the exception of resident 2, significant improvement in the diagnosis of tuberculosis occurred in R2, with kappa statistics ranging from 0.21 (0.03 to 0.38) for resident 1 to 0.47 (0.28 to 0.65) for radiologist 1 (Table 2). Improved diagnosis of cancer based on the kappa statistic was significant only for radiologist 1 [kappa 0.22 (−0.22 to 0.63)] (Table 2).
Inter-observer agreement in the detection of lesions and in the diagnosis of tuberculosis and cancer was variable at both reading time-points. Inter-observer agreement was poor-to-good for the detection of adenomegaly and micronodules, low-to-moderate for cavitations and the diagnosis of tuberculosis, discordant-to-poor for pleural effusion, and poor-to-excellent for the diagnosis of lung cancer, with the exception of one pair for which discordance was noted (Table 3).
The direction of change in inter-observer agreement between R1 and R2 was also variable, with improvement, deterioration and no change all being observed. No consistent pattern of change was apparent across pairs of observers for any particular lesion, nor across all lesions and diagnoses within a given pair of observers (Table 3).
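The pairwise comparison summarized above can be sketched as follows. With six observers there are 15 possible pairs, and for each pair the kappa at R1 can be compared with the kappa at R2. This Python example is an illustration under stated assumptions, not the study's actual analysis. The function names `cohen_kappa`, `pairwise_kappa` and `change_summary`, the binary coding of calls, the `tolerance` parameter and the toy data for three readers are all hypothetical.

```python
from itertools import combinations


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two equally long sequences of categorical calls."""
    n = len(calls_a)
    if n == 0 or n != len(calls_b):
        raise ValueError("calls must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    categories = set(calls_a) | set(calls_b)
    expected = sum(calls_a.count(c) * calls_b.count(c) for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # assumption: one shared category counts as full agreement
    return (observed - expected) / (1.0 - expected)


def pairwise_kappa(readings):
    """readings maps observer name -> list of calls; returns kappa per observer pair."""
    names = sorted(readings)
    return {
        (first, second): cohen_kappa(readings[first], readings[second])
        for first, second in combinations(names, 2)
    }


def change_summary(kappa_r1, kappa_r2, tolerance=0.0):
    """Count pairs whose kappa improved, deteriorated or stayed unchanged from R1 to R2."""
    summary = {"improved": 0, "deteriorated": 0, "unchanged": 0}
    for pair, value_r1 in kappa_r1.items():
        delta = kappa_r2[pair] - value_r1
        if delta > tolerance:
            summary["improved"] += 1
        elif delta < -tolerance:
            summary["deteriorated"] += 1
        else:
            summary["unchanged"] += 1
    return summary


# Hypothetical calls for one lesion on eight radiographs, three readers, two sessions.
session_r1 = {
    "radiologist 1": [1, 1, 0, 0, 1, 0, 1, 0],
    "pulmonologist 1": [1, 0, 0, 0, 1, 1, 1, 0],
    "resident 1": [0, 1, 0, 1, 1, 0, 0, 0],
}
session_r2 = {
    "radiologist 1": [1, 1, 0, 0, 1, 0, 1, 0],
    "pulmonologist 1": [1, 1, 0, 0, 1, 1, 1, 0],
    "resident 1": [0, 1, 0, 0, 1, 0, 1, 0],
}
print(change_summary(pairwise_kappa(session_r1), pairwise_kappa(session_r2)))
```

Repeating this for each lesion and diagnosis gives, per pair of observers, the kind of R1 versus R2 comparison reported in Table 3.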

Discussion
The interpretation grid developed and used in our study had a broad positive impact on the detection of common lesions and the accuracy of diagnoses on chest X-rays in this setting. The observed improvement appeared to be more consistent across observers for micronodules, cavitations, the diagnosis of tuberculosis and, to a lesser extent, the detection of adenomegaly.
The spectrum of inter-observer agreement, both before and after implementation of the interpretation grid, was very broad, with inconsistent effects of the grid on agreement both within pairs of observers across all lesions and diagnoses, and across pairs of observers for any specific lesion or diagnosis. Other studies that have used a standardized interpretation scheme have shown a significant improvement in the interpretation of CXR [5] [9] [10] [21] [22]. Indeed, the different sections of the grid are expected to impose on observers a more systematic approach to the analysis of each anomaly, thereby improving its detection. The same applies when a list of diagnoses is suggested at the conclusion of an interpretation. The intra-observer agreement was excellent in more than half of the observers for the detection of adenomegaly and the diagnosis of cancer. This is consistent with poor-to-no impact of the grid on the performance of the observers. The kappa statistics for inter-observer agreement in the detection of elementary lesions were highly variable across observers, lesions/diagnoses and reading sessions. For accurate detection of pleural effusion and accurate diagnosis of TB, the kappa statistics for inter-observer agreement were higher in R2 than in R1, indicating a positive impact of the grid in our study. This is in line with many studies in which a standardized interpretation form significantly improved the concordance of reading, for example the CRRS in South Africa [9], the five-category reading system in Canada [10], the three-category reading system in Switzerland [21] and the Russian classification [5]. The lack of initial training prior to the use of our interpretation grid could explain its limited impact on inter-observer agreement in the accurate detection of some anomalies during the R2 session. Den Boon et al. submitted readers to a three-day training on the use of the CRRS, with pre-tests, prior to its application in their study [9]. The absence of clinical information may also have been a handicap for our observers. In fact, Schreiber et al. [23] have demonstrated that the clinical history improves the interpretation of radiographs. Our readers were blinded to the clinical information to limit its influence on the detection of radiographic lesions and on diagnosis. Understanding of the clinical scenario could take primacy over the ability of the observer to detect elementary lesions and establish a radiological diagnosis [24].
The intra-reader agreement for accurate detection of cavitations ranged from poor to moderate for the pulmonologists, and from poor to mediocre for the radiologists. However, cavitations had the highest accurate detection in R1 and R2, with a significant impact of the grid on intra-reader agreement and no significant impact of the grid on inter-reader agreement. It is therefore possible that observers did not detect the same caverns on the same image in the two reading sessions. While the grid improved the overall accuracy of detection, it did not significantly improve the concordance of detection. Balabanova et al. [5] found a moderate intra-reader agreement for both the pulmonologist and the radiologist. The intra-reader agreement for accurate detection of adenomegaly was poor to moderate for the radiologists and poor to excellent for the pulmonologists. Other authors such as Shinsaku et al. [6] and Graham et al. [10] obtained the best within-reader agreement for the detection of adenomegaly.
Our study has some limitations, such as the small number of some lesions, which precluded the use of more advanced statistical methods for assessing the improvement in diagnostic capability, such as the net reclassification improvement. However, the distribution of anomalies and diagnoses, as well as of readers, reflects the routine practice scenario in this setting. Unlike other studies, where only a sample of readers and radiographs was selected to study intra-reader variability, in our study all six observers participated in both reading sessions and interpreted the same number of radiographs in each session. The interval between the two reading sessions was long enough (2 months) to avoid an image-memory effect on the second interpretation.

Conclusion
A standardized interpretation grid has the potential to improve the detection of common lesions and the diagnosis of the most prevalent pulmonary diseases on chest X-ray in this setting. However, further validation by independent investigators is needed to confirm our findings. Furthermore, implementation studies are needed to confirm the acceptability of interpretation grids to healthcare practitioners in routine settings, and to identify the best strategies for promoting the uptake of the grids.

Figure 1. Examples of CXR included in this study. A: pulmonary TB with left apical cavity (cavern), mild consolidation of the lingula and ill-defined micronodules in the right upper lobe. B: bacterial pneumonia with middle lobe consolidation. C: excavated pulmonary carcinoma in the right lower lobe associated with mild pleural effusion. D: miliary tuberculosis with diffuse micronodules in both lung fields.

Table 2. Proportion of accurate diagnoses given by each observer at each reading session and intra-observer agreement between the two readings.
R1: first reading session without scheme, R2: second session with scheme, NA: not applicable.

Table 3. Kappa coefficient (95% confidence interval) for the inter-observer agreement in the detection of lesions and the diagnosis of tuberculosis and cancer at the first and second readings.