Smear-Negative Multidrug-Resistant Tuberculosis, a Significant Hidden Problem for MDR-TB Control: An Analysis of Real World Data

Purpose: The drug resistance pattern in tuberculosis (TB) is still under-investigated. We analyzed clinical data from patients with smear-positive TB and applied the resulting model to predict multidrug-resistant TB (MDR-TB) among patients with smear-negative TB. Materials and Methods: Medical record information for 6977 cases was drawn from 11,950 inpatients admitted from January 2009 to November 2013. The case data were divided into a training set, a test set and a prediction set. Logistic regression analysis was applied to the training set to establish a prediction classification model, whose performance was then evaluated on the test set by receiver operating characteristic (ROC) analysis. The model was then applied to the prediction set to identify the incidence of smear-negative MDR-TB (snMDR-TB). Results: Sixteen factors correlated with MDR-TB, including frequency of hospitalization, province of origin, anti-TB drugs, and complications, were identified from the comparison between smear-positive TB (SP-TB) and smear-positive MDR-TB (spMDR-TB). The area under the ROC curve (AUC) of the prediction model was 0.752 (sensitivity = 61.3%, specificity = 83.3%). The percentage of all inpatients with snMDR-TB (snMDR-TB/Total) was 28.7% ± 0.02%, while that of all SN-PTB inpatients with snMDR-TB (snMDR-TB/SN-PTB) was 26.5% ± 0.03%. The ratio of snMDR-TB to MDR-TB (snMDR-TB/MDR-TB) was 2.09 ± 0.33. Conclusion: snMDR-TB, as an important source of MDR-TB, is a significant hidden problem for MDR-TB control and can be identified by the prediction model. A kind of vicious circle with a certain delay effect exists between snMDR-TB and MDR-TB. To better control MDR-TB, it is necessary to pay greater attention to snMDR-TB, conduct further research and develop targeted therapeutic strategies.


Introduction
In 2011, there were an estimated 8.7 million new cases of tuberculosis (TB) and 1.4 million deaths from TB [1]. The number of cases of multidrug-resistant TB (MDR-TB) notified in the 27 high MDR-TB burden countries is increasing and has reached almost 60,000 worldwide, and it is estimated that this is only about one in five (19%) of the notified TB patients who have MDR-TB. In the two countries with the largest number of cases, India and China, the figure is less than one in ten [1]. The majority of new TB cases (80%) occurred in 22 countries, and a substantial proportion (35%) were identified as smear-negative pulmonary TB (SN-PTB) [2]. It seems very likely that some of these SN-PTB cases are smear-negative MDR-TB (snMDR-TB). We hypothesize that snMDR-TB is an important source of MDR-TB and that the number of snMDR-TB cases is likely greater than that of MDR-TB cases. If this is the case, the situation of MDR-TB in the real world may be more serious than that reported by the WHO [1], especially in India and China.
To our knowledge, the concept of snMDR-TB has not been proposed in previous reports. The reason is not that snMDR-TB does not exist, but rather that the concept has been difficult to define and predict effectively. However, we consider this to be a significant hidden problem for MDR-TB control. It is recognized that if we want to effectively control the TB epidemic, we need to focus not only on the treatment of active TB (ATB), but also on the prevention and treatment of latent tuberculosis infection (LTBI). Similarly, if we want to control the MDR-TB epidemic, we must focus on both MDR-TB and snMDR-TB simultaneously. In this article we define snMDR-TB patients as SN-PTB patients whose clinical profiles are similar to those of MDR-TB patients and who have the potential to become MDR-TB patients.
The basic issues concerning snMDR-TB are how to determine whether a patient has snMDR-TB and how many snMDR-TB cases there are in the real world. In this article we apply statistical learning methods to explore these problems.

Study Design
A data mining study using real world data from TB inpatients was conducted in order to develop a method for estimating snMDR-TB and determining the proportion of snMDR-TB cases among hospitalized patients. The data from all selected cases were divided into a training set, a test set and a prediction set. Using logistic regression analysis of the training set data, we established a prediction classification model. The performance of the model was validated on the test set by receiver operating characteristic (ROC) analysis. The model was then applied to the prediction set to identify the incidence of snMDR-TB (Figure 1).

Data and Setting
All data used in this study were extracted from inpatient medical records from January 2009 to November 2013 at the Beijing Chest Hospital, Capital Medical University, Beijing, China. This hospital is also known as the Beijing Tuberculosis and Thoracic Tumor Research Institute, the only institute in China which specializes in the research of TB, and has been designated as a specialized thoracic Grade 3 level A hospital. It has several affiliated departments: the "Clinical Center on Tuberculosis, CDC China"; the "Beijing Key Laboratory of Drug Resistance Tuberculosis Research"; and the "WHO Collaborating Center for Research and Training on Tuberculosis in China". More than 50,000 tuberculosis patients have been treated in this hospital, and more than 60% of its inpatients come from different provinces in China. The treatment regimens for all TB inpatients in the hospital are determined through a three-level consultation of TB specialists. SN-PTB was diagnosed by expert panels with reference to Chinese and WHO guidelines [3] [4]; the main criteria were three negative sputum smear results, chest imaging showing lesions of active PTB, exclusion of other pulmonary diseases by a 2-week antibiotic trial, and evidence of TB in serial tests such as the interferon-gamma release assay (IGRA) or tuberculin skin test (TST).

Criteria for Grouping
The initial data set included all cases for which the discharge diagnosis was TB. A subset of data was then selected according to the following inclusion criteria: (a) the main discharge diagnosis was pulmonary tuberculosis (PTB); (b) the patient was hospitalized in an internal medicine ward. The exclusion criteria were: (a) the main discharge diagnosis was extrapulmonary tuberculosis; (b) the patient was admitted to the surgical ward for surgery. All selected data were subsequently divided into a smear-positive PTB (SP-PTB) group and a SN-PTB group based on results of drug sensitivity testing (DST), and the SP-PTB group was divided into a training set and a test set in a 1:1 ratio using computer-generated random numbers. The SN-PTB group was used as the prediction set (Figure 1, Table 1).
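The 1:1 random division of the SP-PTB group can be illustrated with a minimal Python sketch. It is not the R code used in the study; the function name `split_half`, the fixed seed and the case identifiers are hypothetical and serve only to show the idea.

```python
import random

def split_half(case_ids, seed=0):
    """Randomly split case IDs into a training set and a test set (1:1)."""
    rng = random.Random(seed)          # fixed seed: an assumption, for reproducibility
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2    # training set takes the extra case if the count is odd
    return shuffled[:half], shuffled[half:]

# Hypothetical identifiers, not records from the study.
sp_ptb_ids = list(range(10))
train_ids, test_ids = split_half(sp_ptb_ids)
print(len(train_ids), len(test_ids))
```

Every case lands in exactly one of the two sets, so the test set stays unseen while the model is fitted.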

Data Extraction
Raw data included all items in the inpatient medical records. After consultation with a number of TB experts, we selected the following information as parameters for the classification model: age, gender, retreatment, province of origin, frequency of hospitalization, days of hospitalization, anti-TB drugs, and complications. After these parameters were subdivided, 522 items were included. Finally, items that took only one value in both the MDR-TB and non-MDR-TB cases of the training set, and were therefore unsuitable for data mining, were excluded, leaving a total of 300 items.
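The final screening step can be sketched as follows, reading the exclusion rule as "drop any item that takes a single value across the whole training set". This reading, the function name `varying_items` and the toy records (including the item `drug_x`) are assumptions made for illustration only.

```python
def varying_items(training_rows):
    """Return the names of items that take more than one value in the training set."""
    item_names = list(training_rows[0].keys())
    return [name for name in item_names
            if len({row[name] for row in training_rows}) > 1]

# Hypothetical 0/1-coded items, not data from the study.
training_rows = [
    {"retreatment": 1, "levofloxacin": 0, "drug_x": 0},
    {"retreatment": 1, "levofloxacin": 1, "drug_x": 0},
    {"retreatment": 0, "levofloxacin": 1, "drug_x": 0},
]
kept = varying_items(training_rows)
print(kept)  # drug_x is constant, so it carries no information and is dropped
```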

Statistical Analysis
For the training set, stepwise logistic regression was used to build the classification model [5]. ROC curves were constructed separately for the training and test sets, using the data from the included items as parameters. The area under the ROC curve (AUC), sensitivity and specificity were used as indices of accuracy to evaluate the performance of the model. Statistical significance was defined as P < 0.05. R version 3.0.1 software was used for all statistical analyses [6].
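The three accuracy indices can be made concrete with a short Python sketch (the study itself used R). The AUC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case; the function names, the convention that a score at or above the cut-point counts as positive, and the toy data are assumptions.

```python
def auc(scores, labels):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, cut):
    """Classify score >= cut as positive (assumed convention)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cut)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical model scores and true labels (1 = MDR-TB), not study data.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
print(auc(scores, labels))
print(sensitivity_specificity(scores, labels, cut=0.5))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a sketch; a rank-based formula gives the same value more efficiently.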

Ethics Statement
As this was a retrospective study using inpatient information, permission was obtained from the ethical committee of our hospital. To protect patients' personal privacy, names and admission numbers (AD) were deleted before data analysis.

Logistic Regression Analysis
A stepwise logistic regression model was applied to the training data set (1021 cases) to estimate the prediction factors (300 items) for snMDR-TB. Sixteen factors turned out to be significant predictors, and a further 9 factors had P values less than 0.1 (Table 2). Of the significant predictors, 12 were positively correlated with snMDR-TB and 4 were negatively correlated with snMDR-TB.
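For reference, the general form of a logistic regression model is given below. The notation is ours and is not taken from the source: $p_i$ is the modelled probability that case $i$ is MDR-TB, $x_{ij}$ is the value of item $j$ for case $i$, $k$ is the number of items in the model, and $\beta_j$ are the fitted coefficients.

```latex
\operatorname{logit}(p_i) = \ln\frac{p_i}{1 - p_i} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
\qquad
p_i = \frac{1}{1 + \exp\left(-\beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\right)}
```

Under this form, an item with $\beta_j > 0$ raises the logit and is positively correlated with the outcome, while an item with $\beta_j < 0$ lowers it.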

ROC Analysis
The prediction performance of the established model (300 items) was evaluated separately on the training and test sets by ROC analysis (Figure 2). The AUC of the training set for the prediction model was 0.959 (sensitivity = 90.6%, specificity = 90.0%; Figure 2).

Model Application
The logit value of each case in the prediction set was calculated with the prediction model (300 items), and the cut-point value was obtained from the ROC analysis of the test set. Results for snMDR-TB and MDR-TB are shown in Table 3.
The percentage of snMDR-TB cases among all inpatients (snMDR-TB/Total) was 28.7% ± 0.02%, the percentage of snMDR-TB cases among all SN-PTB inpatients (snMDR-TB/SN-PTB) was 26.5% ± 0.03%, and the ratio of snMDR-TB to MDR-TB (snMDR-TB/MDR-TB) was 2.09 ± 0.33. The change in MDR-TB/Total and snMDR-TB/Total over time is shown in Figure 3.
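The source does not state which rule was used to read the cut-point off the test-set ROC curve. The Python sketch below assumes one common choice, maximizing Youden's index (sensitivity + specificity − 1), and then applies the resulting cut-point to the prediction set. The function names and all numbers are hypothetical.

```python
def youden_cut_point(logits, labels):
    """Pick the cut-point maximizing sensitivity + specificity - 1 (assumed rule)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(logits)):
        tp = sum(1 for z, y in zip(logits, labels) if y == 1 and z >= cut)
        tn = sum(1 for z, y in zip(logits, labels) if y == 0 and z < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:                 # keep the first cut-point reaching the maximum
            best_cut, best_j = cut, j
    return best_cut

def flag_snmdr(logits, cut):
    """Flag prediction-set cases whose logit is at or above the cut-point."""
    return [z >= cut for z in logits]

# Hypothetical test-set logits with known labels (1 = MDR-TB).
test_logits = [-2.0, -1.0, -0.5, 0.3, 0.6, 1.2, 2.0]
test_labels = [0, 0, 0, 1, 0, 1, 1]
cut = youden_cut_point(test_logits, test_labels)

# Hypothetical prediction-set (SN-PTB) logits, for which no labels exist.
print(cut, flag_snmdr([-1.0, 0.3, 0.9], cut))
```

Because the prediction set has no DST-confirmed labels, the cut-point has to come from labelled data, which is why it is taken from the test set rather than from the prediction set.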

Discussion
MDR-TB, defined as resistance to the two most effective anti-tuberculosis drugs, isoniazid and rifampicin, poses an enormous challenge to global TB control because of the long and complex treatment regimen required to cure it [7]. In 2010 there were 650,000 reported cases of MDR-TB globally [8], and in 2011 the number is reported to have increased by at least 60,000 worldwide [1]. In China, approximately 1 in 10 of all patients with TB have MDR-TB [9]. In response to the prevalence of MDR-TB, significant emphasis is being placed on developing a rapid and low-cost detection method [10], finding new anti-TB drugs [11], developing more effective and safer anti-TB regimens [12] and building a more effective management system [13]. All these measures concentrate on current MDR-TB patients. This may not be enough, however; we also need to pay attention to potential sources of MDR-TB.
What are the likely potential sources of MDR-TB? Contacts of MDR-TB patients are one possible source; the prevalence of MDR-TB among contacts ranges between 1.8% and 11.2% [7]. Can preventive treatment lower the risk of developing MDR-TB among these contacts? In a recent systematic review that included three studies of the effectiveness of chemotherapy in preventing the development of MDR-TB [14]-[16], the authors concluded that there is not sufficient evidence to support or reject preventive treatment for the prevention of MDR-TB [17]. Do we need to change our strategy to prevent MDR-TB? Similar to the argument that ATB can be prevented by treating LTBI, we argue that preventive measures should not focus mainly on the contacts of MDR-TB patients, but rather on those in the prophase of MDR-TB. However, traditionally there has been no way to identify this stage in SP-PTB. For this reason, we are proposing the concept of snMDR-TB. We believe snMDR-TB to be the main source of MDR-TB and that effective therapy of snMDR-TB would improve the control of MDR-TB; in other words, snMDR-TB is the "Missing Link" in the prevention of MDR-TB (Figure 4).
While we believe that snMDR-TB is a significant hidden problem for MDR-TB control, the bottleneck is how to distinguish snMDR-TB from SN-PTB. There are no existing methods for solving this problem. Statistical learning methods have been used widely in different areas of medical prediction [18] [19]. We first applied several different algorithms to this problem, including C4.5, classification and regression trees (CART), and k nearest neighbors (KNN) [20]. Of these, KNN had the greatest sensitivity, while C4.5 and CART provided clear decision trees which can be used to solve the problem step by step. However, none of these three methods could estimate the degree of correlation between each factor of interest and MDR-TB. For this reason, we chose to use logistic regression. We found 16 factors correlated with MDR-TB. Since the first aim of this research was to discriminate between snMDR-TB and SN-PTB, we did not restrict the prediction model to only the 16 correlated factors, but retained all 300 factors as parameters in the prediction model. ROC analysis gave AUC values of 0.959 and 0.757 in the training and test sets respectively. The lower AUC in the test set is in agreement with the rules of statistical learning, as a model generally fits its training data better than unseen data [5]. In the test set, when the cut-point of the logit value was equal to −0.0329, the sensitivity was 61.3% and the specificity was 83.3%; these results are acceptable, because they did not overestimate the number of snMDR-TB cases.
Many risk factors, such as non-standard or irregular therapy, adverse effects of anti-TB medication [21], the convergence of the epidemics of MDR-TB and HIV infection [22], and alcohol abuse [23], have been reported to be associated with MDR-TB. However, owing to the limited information in our database, we could not include all these factors in the analysis. Previous treatment for TB has been considered to be the strongest risk factor for MDR-TB [24]. In our research two items, retreatment and frequency of hospitalization, were associated with previous treatment. Frequency of hospitalization was significantly correlated with MDR-TB, but retreatment was not. This may be because our data were derived from inpatient information and most inpatients were retreatment patients; this would increase the background noise in data mining, and as a result significant differences in some of the included items were not detected.
In 2011 Otero reported a high rate of primary MDR-TB in a general population with no identifiable risk factors for MDR-TB [25]. In China, the prevalence of MDR-TB is not the same in different regions; here, we found that two provinces had a positive correlation with MDR-TB. This may be because MDR-TB is endemic in these provinces, and/or because they have a high referral rate for MDR-TB patients. Without epidemiological data, it is not possible to conclude whether MDR-TB is endemic in these provinces.
Vadwai reported in 2012 that a history of previous treatment with a fluoroquinolone and an injectable other than streptomycin is a risk factor for MDR-TB [26]. Although our dataset only included information on anti-TB drugs administered during the current hospitalization period, and not the history of previous treatment, the current regimens would have been adjusted according to the history of previous treatment and the condition of the patients. Here, eight drugs were significantly correlated with MDR-TB: levofloxacin, moxifloxacin, capreomycin, amikacin, para-aminosalicylic acid and amoxicillin/clavulanic acid were positively correlated with MDR-TB, in line with previous reports, while isoniazid and rifampicin were negatively correlated with MDR-TB. These findings are consistent with the definition of MDR-TB and the WHO guidelines [27].
With the exception of HIV infection, the relationship between disease complications and MDR-TB is rarely reported. In fact, whether HIV infection is an independent risk factor for MDR-TB is also a controversial issue [22] [28]. In our results HIV infection was not significantly correlated with MDR-TB, but this may be because HIV infection has been contained at a low level in China [29]. Factors correlated with MDR-TB are not necessarily risk factors. We found that coronary heart disease and pleurisy were negatively correlated with MDR-TB. This may be because patients with these conditions need to be hospitalized for more careful treatment; that is, they are hospitalized for a reason other than MDR-TB. In addition, leucopenia, pulmonary heart disease and urinary tract infections were positively correlated with MDR-TB; these are likely not risk factors but rather the results of prolonged illness or adverse drug reactions.
The prevalence of MDR-TB differs between countries and regions [9] [30] [31]. However, looking at global trends, the number of MDR-TB cases is increasing every year [1] [8]. This research only included real world data on TB from one hospital in Beijing, China. As the TB patients in our hospital come from every province in China, our data reflect, at least indirectly, the prevalence profile of TB in China. Consistent with global trends [1], we also observed an increase in MDR-TB patient numbers in our data (Table 3).
Figure 4 shows our model for the dynamic transmission of MDR-TB. We believe that MDR-TB transmission dynamics can be represented as a loop, with the different stages of MDR-TB as links and snMDR-TB as the most important "Missing Link". According to this model, a kind of vicious circle with a certain delay effect exists between snMDR-TB and MDR-TB. In Figure 3, the trends in MDR-TB and snMDR-TB run in one direction. In 2009 and 2010, MDR-TB and snMDR-TB were both at a low and parallel level; in 2011, however, the level of MDR-TB increased while snMDR-TB remained at a low level, increasing only in the following year. In 2012 the level of MDR-TB decreased, but not to baseline. In 2013, snMDR-TB and MDR-TB returned to a parallel pattern. These results are consistent with our dynamic transmission model of MDR-TB. Our initial hypothesis was that snMDR-TB is an important source of MDR-TB and that the number of snMDR-TB cases is greater than the number of MDR-TB cases. Our results confirm this hypothesis, suggesting that there are about two snMDR-TB cases for each case of MDR-TB and that this relationship is relatively stable (Figure 3(a)).
In this article we first proposed the concept of snMDR-TB and then used real world data to calculate the number of snMDR-TB cases with statistical learning methods. However, there is currently no objective test to evaluate snMDR-TB. We found 16 factors that correlate with MDR-TB, and the AUC of our prediction model was acceptable. As the risk factors for MDR-TB differ between regions and over time [22] [23] [32], and we could not include all factors, the range of application of our prediction model is very limited. However, this should not reduce the significance of snMDR-TB. Our work suggests that greater attention should be paid to snMDR-TB, and that more extensive and in-depth research should be carried out to find a better method of assessment and to develop targeted therapeutic strategies. This would facilitate the prevention and control of MDR-TB.

Conclusion
SnMDR-TB, as an important source of MDR-TB, is a significant hidden problem for MDR-TB control and can be identified by the prediction model. A kind of vicious circle with a certain delay effect exists between snMDR-TB and MDR-TB. To better control MDR-TB, it is necessary to pay greater attention to snMDR-TB, conduct further research and develop targeted therapeutic strategies.
Figure 3(a) shows the change in MDR-TB/Total and snMDR-TB/Total over time, while Figure 3(b) shows the change in MDR-TB/SP-PTB and snMDR-TB/SN-PTB over time.

Figure 4. Dynamic transmission model of normal TB and MDR-TB.

Table 1. Basic demography among groups.

Table 3. The number of MDR-TB and snMDR-TB in 5 years.