Meta-Analysis: 18F-FDG PET or PET/CT for the Evaluation of Neoadjuvant Chemotherapy in Locally Advanced Breast Cancer

Purpose: To evaluate the accuracy and predictive value of 18F-FDG PET or PET/CT in the assessment of neoadjuvant chemotherapy (NAC) in locally advanced breast cancer by meta-analysis. Materials and Methods: Relevant studies published in English were identified by systematic searches of the PUBMED and COCHRANE databases. To ensure homogeneity of the included studies, selection criteria were established and all studies were scored according to the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) criteria. Meta-analysis was performed on the diagnostic performance data from eligible studies. Funnel plots were drawn to explore publication bias, and forest plots were drawn to identify and exclude abnormal data. Spearman correlation coefficients ρ, the likelihood ratio χ² test and the I² index were used to assess heterogeneity. The weighted summary sensitivities (SEs), specificities (SPs), diagnostic odds ratios (DORs) and summary receiver operating characteristic (SROC) curves of PET and of other examinations (which measure tumor size) were estimated and compared. Subgroup analyses were performed to identify potential sources of heterogeneity. The Z test was used to test for significant differences between results. Results: A total of 27 groups of data from 19 eligible studies were included, with 1164 subjects evaluated by 18F-FDG PET or PET/CT and 291 evaluated by other examinations. Funnel plots indicated the existence of publication bias. Heterogeneity was explored with Spearman correlation coefficients ρ, the likelihood ratio χ² test and the I² index. In the weighted summary results, SEPET was significantly higher than SED [83.7% (329/393) vs. 59.0% (98/166), p < 0.001], SPPET was significantly higher than SPD [66.8% (512/766) vs. 40.8% (51/125), p < 0.001], and DORPET was significantly higher than DORD (14.02 vs. 1.29, p < 0.05). These results show that FDG-PET was more accurate in assessing NAC efficacy. SROC curves drawn with Meta-Disc 14.0 showed that AUCPET and Q*PET were both significantly higher than AUCD and Q*D (AUCs 0.8838 vs. 0.6046; Q* values 0.8143 vs. 0.5788, p < 0.001), which confirmed the advantage of FDG-PET. Subgroup analysis showed that performing FDG-PET after the 1st or 2nd cycle of NAC was a little better than performing it later, with a higher SE (p = 0.083). A standardized uptake value (SUV) reduction rate between 40% and 45% as the FDG-PET response threshold gave the highest SP (p = 0.01), while no significant difference was found when comparing SEs and DORs (p > 0.05). A trend toward higher SE and lower SP was found in estrogen receptor (ER) negative breast cancers compared with ER-positive ones (SEs 93.94% vs. 83.33%; SPs 35.76% vs. 62.24%), although the Z test did not find a significant difference (p > 0.05). Conclusion: This meta-analysis showed that FDG-PET or PET/CT has a higher global accuracy in assessing the response to NAC in breast cancer. Compared with clinical response, metabolic response plays a potential role in directing therapy for breast cancer. Factors that affected the accuracy of FDG-PET assessment included the PET timing point, the SUV reduction rate used as the threshold value and ER expression.


Introduction
Breast carcinoma is the most common cancer in women in Western Europe and the United States, with the highest incidence in the 40-55 age range, and its prevalence is still on the rise [1,2]. It accounts for 40,000 and 14,000 deaths yearly in the US and UK, respectively, which makes it the second leading cause of cancer death in women in those countries [1,3].
Neoadjuvant chemotherapy (NAC), initially used only for locally advanced breast cancer, is now commonly used in patients with operable but large breast cancer. This strategy allows patients to undergo breast-conserving surgery and gives information on the efficacy of chemotherapy [4]. Long-term outcomes are significantly correlated with pathological tumour response rates [5]. Patients who achieve pathological complete response (pCR) have longer disease-free and overall survival than nonresponders [5][6][7]. Therefore, a noninvasive method for early evaluation of the response to NAC in patients with operable breast cancer is necessary.
According to the American Society of Clinical Oncology (ASCO) 2006 update of the breast cancer follow-up and management guidelines in the adjuvant setting, physical examination and mammography should be used routinely in breast cancer surveillance. Additional imaging methods, such as ultrasound (US), computed tomography (CT), breast magnetic resonance imaging (MRI) and positron emission tomography (PET) with 18F-fluoro-deoxy-glucose (FDG), are not recommended [8]. However, physical examination and mammography have their limitations, especially for evaluating changes in breast tissue. US, CT and MRI mainly provide information about tumor size to assess the response to NAC, which is called the clinical response. PET, a whole-body imaging modality, provides information about the metabolic activity of tumors to assess the response to NAC, which is called the metabolic response. Previous meta-analyses [9][10][11][12][13] assessed FDG-PET for the evaluation of breast cancer recurrences and metastases. Our study therefore aims to perform a comprehensive systematic review to determine the role of early evaluation with FDG-PET of the response to NAC before surgery. We also focus on the comparison between clinical response and metabolic response, which, to our knowledge, has not previously been studied.

Data Sources and Eligibility
Published studies of NAC evaluation in breast cancer with FDG-PET or PET/CT were identified by systematic searches of the PUBMED and COCHRANE databases. The following keywords were used: ("PET" OR "positron emission tomography" OR "FDG" OR "fluorodeoxyglucose") AND ("breast carcinoma" OR "breast cancer" OR "carcinoma of breast") AND ("neoadjuvant" OR "chemotherapy"). Searches were limited to articles published between 1966 and 2012 and were performed with the assistance of a Jiao Tong University librarian.
The inclusion criteria were as follows: 1) full reports published in English; 2) articles dealing with the performance of PET (alone or in combination, but not in sequence); 3) use of 18F-FDG as the imaging radiotracer; 4) pathological results as the gold standard; 5) changes in a semi-quantitative value used as the evaluation criterion, with a threshold value set to distinguish metabolic responders from metabolic non-responders; 6) sufficient data to calculate the true positive (TP), false positive (FP), false negative (FN) and true negative (TN) values; 7) a sample size of at least 10 subjects; 8) assessment both before and after chemotherapy in locally advanced breast cancer.
Since the validity of the individual studies may affect the interpretation of a diagnostic meta-analysis, the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) criteria [14] were adapted to assess the quality of each article. After removing unsuitable items (questions 3, 7 and 9 were not suitable for our gold standard, the pathological test; question 12 was not suitable for a reference test that sets a threshold value for evaluation), ten items remained (all listed in Table 1). Each question was answered as yes, no or unclear. All included studies were scored on all 10 items to provide an overall score. For the purpose of this analysis, "yes" was scored as "1", while "no" and "unclear" were both scored as "0". Articles with a score of 6 or above were eligible for analysis.
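The scoring rule described above can be illustrated with a short sketch. The function names, the answer encoding and the inclusive reading of the eligibility cutoff are assumptions made for illustration; they are not part of the original protocol.

```python
def quadas_score(answers):
    """Sum a list of QUADAS answers: 'yes' counts 1, 'no' and 'unclear' count 0."""
    allowed = {"yes", "no", "unclear"}
    for answer in answers:
        if answer not in allowed:
            raise ValueError(f"unexpected answer: {answer!r}")
    return sum(1 for answer in answers if answer == "yes")


def is_eligible(answers, cutoff=6):
    # Assumption: the cutoff is inclusive (score >= 6).
    return quadas_score(answers) >= cutoff


# Hypothetical answers for the ten retained items of one study
example = ["yes"] * 7 + ["no", "unclear", "yes"]
print(quadas_score(example), is_eligible(example))
```

Treating "unclear" the same as "no" makes the score conservative: a study is credited only for items it reports explicitly.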
Six reviewers, three of whom had at least 3 years of work experience in nuclear medicine, independently checked the retrieved articles. In case of disagreement, a consensus re-review among all reviewers was performed.

Data Synthesis and Statistical Analysis
Data from individual studies were summarized in a 2 × 2 table classifying patients or lesions as TP, FN, FP and TN. If an article included several assessment time points, they were entered into the analysis as different groups of data. If an article reported two or more threshold values for the decrease rate of the semi-quantitative value, the data with the highest accuracy were selected.
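As an illustration of how SE, SP and DOR follow from a single 2 × 2 table, a minimal sketch is given below. The class name and the counts are hypothetical, and the 0.5 continuity correction for zero cells is a common convention assumed here; the text does not state which correction, if any, was applied.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # responders correctly identified
    fn: int  # responders missed
    fp: int  # non-responders classified as responders
    tn: int  # non-responders correctly identified

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)

    def dor(self):
        cells = [self.tp, self.fn, self.fp, self.tn]
        if 0 in cells:
            # Assumed convention: add 0.5 to every cell when any cell is zero
            cells = [c + 0.5 for c in cells]
        tp, fn, fp, tn = cells
        return (tp * tn) / (fp * fn)


# Hypothetical counts, not taken from any included study
table = TwoByTwo(tp=40, fn=10, fp=15, tn=35)
print(table.sensitivity(), table.specificity(), round(table.dor(), 2))
```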
Publication bias was tested by drawing funnel plots. Forest plots were used to find abnormal data to exclude. Heterogeneity was tested as follows: threshold effects between studies were assessed using Spearman correlation coefficients ρ (a cutoff effect was considered present for a ρ value > 0.4); heterogeneity was assessed using the likelihood ratio χ² test (p < 0.05 was considered to indicate apparent heterogeneity) and the I² index, which measures the percentage of total variation across studies that is due to heterogeneity beyond chance and takes values between 0 and 100%; values over 50% indicate heterogeneity. If all tests confirmed publication bias and heterogeneity, a random effect model was used for the primary meta-analysis to obtain the weighted mean sensitivity (SE), specificity (SP) and diagnostic odds ratio (DOR) with 95% confidence intervals (CIs) of FDG-PET and other examinations. Otherwise, a fixed effect model was used. DOR is the best single global measure of diagnostic test performance that encompasses both SE and SP.
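The I² index can be sketched from per-study effect sizes and their variances. The example below uses Cochran's Q with inverse-variance weights, which is the usual basis for I²; it is not the likelihood ratio χ² test named in the text, and the function name and input values are hypothetical.

```python
def cochran_q_i2(effects, variances):
    """Return the fixed-effect pooled estimate, Cochran's Q and I² (in percent)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I² is truncated at zero when Q does not exceed its degrees of freedom
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical log-DORs and their variances for three groups of data
pooled, q, i2 = cochran_q_i2([1.0, 2.0, 3.0], [0.25, 0.25, 0.25])
print(pooled, q, i2)
```

With these inputs I² exceeds the 50% level that the text takes to indicate heterogeneity.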
5) The gold standard was not pathological results (n = 4); 6) changes in a semi-quantitative value were not used as evaluation criteria (n = 16); 7) data were insufficient for calculating SE and SP (n = 9); 8) sample size was under 10 (n = 5); 9) studies did not compare changes in values between pre-therapy and post-therapy (n = 19). All 19 studies scored 6 or above according to the QUADAS criteria. Table 1 pools the distribution of study design characteristics and Figure 1 summarizes the QUADAS criteria results of the 19 studies. Information on all included studies and the main characteristics of the data for evaluating metabolic response and clinical response are listed in Tables 2-4. Because the other examinations assessed the effect of NAC by tumor size, we denote them with the subscript "D".
Asymmetric summary receiver operating characteristic (SROC) curves were fitted using weighted regression or the inverse variance method (Moses' model [15]), and their area under the curve (AUC) and Q* index were calculated. AUC summarizes diagnostic performance as a single number, while the Q* index is the point on the curve where SE and SP are equal.
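The Moses model regresses D = logit(TPR) - logit(FPR) on S = logit(TPR) + logit(FPR) and back-transforms the fitted line D = a + bS into an SROC curve. The sketch below is a simplified illustration: it uses an unweighted least-squares fit rather than the weighted fit described in the text, the (TPR, FPR) pairs are hypothetical, and all function names are assumptions.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_moses(pairs):
    """Unweighted least-squares fit of D = a + b*S over (TPR, FPR) pairs."""
    d = [logit(t) - logit(f) for t, f in pairs]  # log diagnostic odds ratio
    s = [logit(t) + logit(f) for t, f in pairs]  # proxy for the threshold
    n = len(pairs)
    s_mean, d_mean = sum(s) / n, sum(d) / n
    sxx = sum((x - s_mean) ** 2 for x in s)
    if sxx == 0:
        raise ValueError("all studies share the same S; slope is undefined")
    b = sum((x - s_mean) * (y - d_mean) for x, y in zip(s, d)) / sxx
    a = d_mean - b * s_mean
    return a, b


def q_star(a):
    """Point on the SROC curve where SE equals SP (S = 0, so D = a)."""
    return expit(a / 2.0)


def sroc_auc(a, b, steps=10000):
    """Midpoint-rule area under TPR(FPR) for the fitted curve (requires b != 1)."""
    total = 0.0
    for i in range(steps):
        fpr = (i + 0.5) / steps
        total += expit(a / (1.0 - b) + (1.0 + b) / (1.0 - b) * logit(fpr))
    return total / steps


# Hypothetical (TPR, FPR) pairs, not taken from the included studies
studies = [(0.85, 0.30), (0.80, 0.25), (0.90, 0.45), (0.75, 0.20)]
a, b = fit_moses(studies)
print(round(q_star(a), 3), round(sroc_auc(a, b), 3))
```

Q* depends only on the intercept a, because SE equals SP exactly where S = 0, whereas the AUC also depends on the slope b, which captures the asymmetry of the curve.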
When statistical heterogeneity was identified, subgroup analysis was performed to identify its potential sources (e.g., different PET timing points, response criteria and molecular phenotype of primary breast cancer).
The Z test was used to identify whether significant differences existed between the two modalities of examination and between subgroups in SE, SP, DOR, AUC and Q* index. A p value < 0.05 was considered statistically significant.
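The text does not state which form of the Z test was applied. As one plausible reading, the sketch below runs a pooled two-proportion Z test on the pooled sensitivity counts reported in the abstract (329/393 for FDG-PET and 98/166 for other examinations); the function name and the choice of test are assumptions.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z statistic and two-sided p value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value


z, p = two_proportion_z(329, 393, 98, 166)
print(f"z = {z:.2f}, significant at 0.05: {p < 0.05}")
```

Under this assumed test, the difference in pooled sensitivity is significant at the 0.05 level, in line with the reported p < 0.001.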
All statistical analyses were undertaken using RevMan 5.1, STATA 11.0 and Meta-Disc 14.0.

Publication Bias, Heterogeneity and Cutoff Effect
After extracting information from the 19 articles, 27 groups of data for FDG-PET and 8 groups of data for other examinations were included. To assess possible publication bias, scatter plots were drawn using the log-DORs of individual data against their sample size. The funnel plots of FDG-PET and other examinations are given in Figure 2. In detail, the plot for FDG-PET was nearly symmetric, but one point lay beyond the 95% CIs. The plot for other examinations showed marked asymmetry, with fewer studies above the horizontal line, and 2 points lay beyond the 95% CIs. Based on these plots, we considered publication bias possible for both FDG-PET and other examinations. The forest plots of DORs (Figure 3) showed one abnormal value compared with the others (the other-examination data of the study by Park JS [31]). The probable reason was that this study added a criterion based on the significance of enhancement on the post-chemotherapy MRI scan. Therefore, the other-examination data of this study were excluded. The heterogeneity test results were as follows: there was no heterogeneity for FDG-PET except in the test of SP, and there was heterogeneity for other examinations except in the test of DOR, as confirmed by either the likelihood ratio χ² test or the I² index (

SROC Curves, AUC and the Q * Index
Summary receiver operating characteristic analysis was used for an overall comparison of FDG-PET and other examinations (Figure 4). The AUCs of FDG-PET and other examinations were 0.8838 ± 0.0190 and 0.6046 ± 0.1003, respectively. AUCPET was significantly higher than AUCD (p < 0.001). The Q* indices of FDG-PET and other examinations were 0.8143 ± 0.0194 and 0.5788 ± 0.0764, respectively. Q*PET was significantly higher than Q*D (p < 0.001).

Subgroup Analysis for Primary Breast Cancer Response
The heterogeneity of SP for FDG-PET among the 27 groups of data justified several subgroup analyses to identify its possible sources. The studies employed different FDG-PET protocols, including different PET timing points and different cutoff values of the semi-quantitative value used as metabolic response criteria. The influence of different molecular phenotypes on the accuracy of FDG-PET evaluation was also analysed.

Evaluation of PET Timing Points
One study [28] used letrozole for NAC, so the chemotherapy cycle could not serve as the unit for measuring PET timing points, and one study [18] did not describe the time of the second FDG-PET after completion of NAC. The remaining 25 groups of data were divided into 2 subgroups: FDG-PET evaluation after 1-2 cycles of NAC (subgroup A) and after 3 or more cycles (subgroup B). Results are shown in Figure 5.

Cutoff Value as PET Response Criteria
Two studies [17,20] were excluded for not using SUVmax or SUVp as the FDG-PET response criterion. Groups using a threshold value under 40% or of 70% or above were too few to pool and were also excluded. The remaining 22 groups of data were divided into subgroup I (cutoff value 40%-45%), subgroup II (cutoff value 50%-55%) and subgroup III (cutoff value 60%-65%). Results are shown in Figure 6. The Z test showed a highly significant difference in SP between subgroup I and subgroup II (p = 0.01).
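The subgroup assignment can be written as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, and returning None for cutoffs that fall outside the three stated ranges (including values between ranges, which the text does not discuss) is an assumption.

```python
def cutoff_subgroup(cutoff_percent):
    """Map an SUV reduction cutoff (in percent) to subgroup I, II or III."""
    if 40 <= cutoff_percent <= 45:
        return "I"
    if 50 <= cutoff_percent <= 55:
        return "II"
    if 60 <= cutoff_percent <= 65:
        return "III"
    return None  # assumed: any other cutoff is left out of the subgroup analysis


# Hypothetical cutoffs for five groups of data
for cutoff in (40, 55, 60, 30, 70):
    print(cutoff, cutoff_subgroup(cutoff))
```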

Influence of Different Molecular Phenotypes
Of the 19 studies, 4 reported the correlation between SUVmax and estrogen receptor (ER) expression. Data are shown in Table 6. In the ER-positive group, the pathological response rate and metabolic response rate were 12.39% and 48.42%, respectively, and the reduction rate of SUV (△SUV%) was 45.00%; the pooled SE and SP of metabolic response were 83.33% and 62.24%, respectively. In the ER-negative group, the pathological response rate and metabolic response rate were 49.05% and 80.26%, respectively, and △SUV% was 62.95%; the pooled SE and SP of metabolic response were 93.94% and 35.76%, respectively. No significant difference was found in any of the above results (p > 0.05).

Discussion
Preclinical models have demonstrated that the administration of chemotherapy prior to tumor removal is biologically more favorable than postoperative administration [35]. Effective preoperative chemotherapy can reduce the size of the primary tumor, thus allowing breast-conserving surgery, and, compared with primary tumor resection followed by adjuvant chemotherapy, also provides prognostic information in patients with a pCR [36]. The early identification of non-responders can also avoid an unnecessary delay in instituting alternative therapy. Conventional breast imaging procedures, including mammography, US and MRI, have been used to measure tumor size in order to derive the response to therapy. However, the clinical response does not necessarily reflect the histopathologic response because of the limited accuracy and reproducibility in determining tumor size and the delay between initiation of therapy and tumor shrinkage [26].
Currently, histopathologic analysis is necessary to accurately assess the response to NAC. PET imaging has been proposed to improve diagnostic strategies in cancer patients by identifying primary tumors and distant metastases [37]. As a metabolic imaging modality, PET has been shown to be potentially valuable for staging various tumor types, including breast cancer. FDG-PET has been shown to be a more sensitive technique for the assessment of chemotherapy responses because it is better at distinguishing cancerous tissue from necrotic and fibrotic tissue and because it reflects therapy-induced metabolic changes, which are known to precede volumetric changes in a tumor. The therapy-induced changes in tumor metabolism may be helpful in making decisions about continuation, modification or cessation of chemotherapy [17,21].
Across all 19 studies, the pooled SEPET was significantly higher than the pooled SED (83.7% vs. 59.0%, p < 0.001), which corresponds to a higher detection rate of effective treatment. The pooled SPPET was higher than the pooled SPD (66.8% vs. 40.8%, p < 0.001), which corresponds to a higher rate of identifying ineffective treatment. The pooled DORPET, AUCPET and Q*PET were all significantly higher than the pooled DORD, AUCD and Q*D (DOR: 14.017 vs. 1.288, p < 0.05; AUC: 0.8824 vs. 0.6046, p < 0.001; Q*: 0.8129 vs. 0.5788, p < 0.001). All results suggested that the reduction rate of glucose metabolism in tumor tissue assesses the efficacy of NAC in breast cancer more accurately than the reduction rate of tumor size.
There are several limitations to our study. 1) The funnel plots showed possible publication bias in both FDG-PET and other examinations. To find the source of this publication bias, we summarized the QUADAS criteria results of the 19 studies. The total proportion of the quality score was 81.58%, which suggested high quality. However, the representative spectrum item had the lowest proportion of the quality score and was the only item scoring below 50%. Among the 19 studies, six [16,17,19,21,23,24] included only invasive ductal carcinoma (IDC) and invasive lobular carcinoma (ILC), two [28,31] included only IDC and mucinous carcinoma, and one [32] included 108 IDC and 2 other kinds of carcinoma. In all articles that described the representative spectrum, the proportion of IDC was far greater than that of other types, which may be the main reason for the publication bias. An important limitation is that the pretreatment SUV must be high in order to detect a meaningful reduction during treatment. Low-contrast tumours are more difficult to distinguish from background tissues and are more affected by imaging imprecision. This requirement limits the use of PET in patients whose tumours have low initial FDG uptake, which is more often the case for ILC [23,38]. ILC is the second most common histological type of breast cancer (almost 15%) after IDC (almost 80%). ILC is a well-established source of weak FDG uptake [38], and PET might not be suitable for early evaluation in this subtype. The chemosensitivity of lobular carcinoma is low [39,40]. Well-differentiated steroid receptor-positive tumours can sometimes also be a source of low FDG uptake. However, there was not sufficient information to carry out a subgroup analysis of different carcinoma subtypes or receptor expression. 2) The heterogeneity test results were as follows: there was no heterogeneity for FDG-PET except in the test of SP, while there was heterogeneity for other examinations except in the test of DOR.
There was no conclusive evidence of a cutoff effect for FDG-PET according to the Spearman correlation coefficients (ρ value < 0.4), but a cutoff effect was present for other examinations (ρ value > 0.4). The existence of heterogeneity suggests the need for higher-quality prospective studies and multicenter trials. In this meta-analysis, subgroup analyses were performed to identify potential sources of heterogeneity, including PET timing points and cutoff values used as metabolic response criteria. Figure 5 suggested that SE rose and SP decreased gradually over time. The best time point was after the second cycle of NAC, when DOR was the highest. Figure 6 showed that 40%-45% was the best cutoff value of the SUVmax reduction rate for metabolic response, with the highest DOR and, in particular, the highest SP, which would avoid over-treatment during NAC. These results differed somewhat from those of Yuting Wang et al. [41].
Since breast cancer is a heterogeneous disease with demonstrated differences in prognosis based on molecular phenotypes, many researchers have attempted risk stratification and individualized treatment according to molecular phenotype, and a few studies have tried to establish which molecular phenotypes of breast cancer influence the accuracy of FDG-PET evaluation [28,30,31,33,34]. In this meta-analysis, we found a trend that ER-negative breast cancers had higher SE but lower SP than ER-positive ones. This is probably because a higher baseline SUVmax level [42], together with the lower metabolic response rate in ER-positive cancers (48.42% vs. 80.26%), was accompanied by greater SUVmax changes in ER-negative cancers (62.95% vs. 45.00%), which made it difficult to set a threshold value for the metabolic response criteria. Furthermore, the lower pathological response rate (12.39% vs. 49.05%) and lower metabolic response rate in ER-positive than in ER-negative cancers suggested that the effect of NAC is less evident in ER-positive cancers, which also reduced the quality of the PET images. Therefore, further studies will focus on establishing different criteria according to molecular phenotypes for more accurate evaluation.

Conclusion
Based on the studies reviewed, FDG-PET has a higher global accuracy in assessing the NAC response in breast cancer. It appears to be a useful supplement to current surveillance techniques for reflecting histopathologic results. Compared with clinical response, metabolic response plays a potential role in directing therapy for breast cancer. To obtain a better correlation with pathological response, we suggest performing FDG-PET after the second cycle of NAC and using a cutoff value between 40% and 45% as the FDG-PET criterion for metabolic response. Furthermore, different criteria should be drawn up according to molecular phenotypes in breast cancer in order to individualize examinations. With the development of medical equipment and the improvement of PET technology, it is important to collect more randomized studies, which can provide more useful information to guide clinical work.

Acknowledgements
This work was supported by Shanghai Leading Academic Discipline Project S30203.