Multiparametric PET-MR Assessment of Response to Neoadjuvant Chemoradiotherapy in Locally Advanced Rectal Cancer: PET, MR, PET-MR and Tumor Texture Analysis: A Pilot Study

Purpose: Patients with locally advanced rectal cancer (LARC) achieving pathologic complete response (pCR) to neoadjuvant chemoradiotherapy (CRT) have significantly improved long term survival. Preoperative detection of pCR may enable a conservative therapeutic approach in some patients. The purpose of the current prospective pilot study was to assess multiparametric qualitative and quantitative MR, PET, PET-MR and tumor texture features in predicting pCR to CRT in patients with LARC. Material and Methods: Eighteen LARC patients underwent staging with FDG-PET and MR of the rectum, and 15 had post-CRT restaging. Response was assessed qualitatively and quantitatively. SUV (tumor/background), SUV/ADC, and tumor texture parameters derived via machine learning algorithms (MLA) from PET and multiple MR sequences were correlated with histopathology. Results: A third of patients had pCR. Sensitivity, specificity and accuracy were 90%, 60% and 80% for PET; 90%, 20% and 66.7% for MR; and 90%, 80% and 86.7% for combined PET-MR. Differences did not reach statistical significance. Quantitatively, only the tumor-to-muscle (SUV/ADC) ratio improved prediction of pCR. Of all texture features assessed using MLA, only the classifier trained on pretreatment PET was significant (p = 0.034; accuracy, 92.8%). Combined PET and MR texture features did not improve performance. Conclusion: Combined PET-MR may improve specificity compared with PET or MR alone, although this needs to be validated in a larger cohort. Tumor to mus


Introduction
Pre-operative chemoradiation therapy (CRT) has become the standard of treatment for locally advanced rectal cancer (LARC). Following pre-operative radiation, 10%-30% of patients have no residual tumor in the pathologic specimen following surgery [1]. This is referred to as a complete pathologic response. Recently, several cohort studies have shown that patients achieving complete pathologic response have significantly improved long term survival compared with those who do not [2] [3]. Unfortunately, no single imaging modality has reliably identified, before surgery, the patients who have had a complete pathologic response [4]. If this were possible, some patients might be spared a radical resection and permanent colostomy. This would have a significant impact on the management of rectal cancer and on patients' quality of life [5].
The current study was an exploratory prospective pilot study in patients with LARC in which qualitative and quantitative MR, PET, PET-MR and pre- and post-CRT PET and MR tumor texture features were correlated with pathology specimens to determine the accuracy of these imaging modalities and data analysis strategies in predicting complete pathologic response.

Patient Population
This is an institutional research ethics board approved prospective trial, and informed consent was obtained from all participants. Eligible subjects were patients referred to our institution with locally advanced rectal cancer (T3-T4, N0-N1), based on clinical and imaging studies, who were deemed fit to have preoperative chemoradiation and surgery. Patients were excluded if they were younger than 18 years of age, had known distant metastasis, or could not provide informed consent. Patient accrual was conducted over 2 years and completed in December 2013. The study schema is described in Figure 1. FDG PET/CT and pelvic MR were performed twice, at baseline and after CRT. PET and MR were obtained on the same day in 19 of 30 scans (63.3%). The median time between PET and MR was 0 days (range: 0-24 d; mean: 3.4 d). All patients received pre-operative chemoradiotherapy, which consisted of 50 Gy in 25 fractions with either infusional 5-FU (225 mg/m²/day continuously during radiotherapy) or capecitabine (825 mg/m²/day on days of radiotherapy) as per institutional protocol. Surgery was at the discretion of the treating surgeon and included a partial total mesorectal excision for upper rectal cancers and a total mesorectal excision for mid and low rectal cancers.

PET Data Acquisition
All whole body PET scans were performed in 3D mode with a dedicated in-line PET/CT scanner (Siemens, Biograph). Patients were asked to fast for at least 6 hours before undergoing the examination. Data were acquired 60-75 minutes after an intravenous injection of approximately 5 MBq/kg body weight of FDG (up to 550 MBq). First, a spiral CT scan from the neck to the pelvis was obtained using the following parameters: 130-kV peak; 105 mAs; scan width, 5 mm; and feed/rotation, 8.4 mm. Immediately on completion of the CT, PET scans of the same area were acquired for 3 min per bed position, with 5-7 bed positions per patient.

MR Data Acquisition
Patient preparation included fasting and a self-administered Fleet enema. MR was performed on a Siemens Avanto Fit 1.5T using an 18-channel body array coil with the patient supine; coil location was confirmed with a 3-plane localizer. Patients were instructed to empty the bladder prior to the examination.
An intravenous antiperistaltic agent (20 mg hyoscine N-butylbromide; Buscopan) was injected prior to the exam. The standard contrast protocol included pre-contrast axial VIBE fat-suppressed T1WI, administration of 0.1 mmol/kg body weight of gadobutrol (Gadovist®), and dynamic and delayed post-contrast images. The MR acquisition protocol is described in detail in Table 1.

Qualitative PET, MR, PET-MR Response Assessment
Qualitative assessment of PET and MR was performed by two separate experienced readers (UM, KJ). Review of each imaging modality was performed separately and response assessment scores were recorded. Both baseline and end of therapy scans were available for review for each modality. To minimize recall bias, a combined review of the PET and MR datasets (PET-MR) was performed at least 8 weeks after the initial review by the 2 readers, with side by side images of baseline and end of therapy scans for both modalities, and the PET-MR response assessment score was recorded.
1) MR: On post-CRT T2 weighted MR, low signal similar to that of the muscularis propria was interpreted as indicating fibrosis, whereas intermediate signal intensity indicated residual tumor. A 5-point MR tumor regression grade developed in the MERCURY trial was used [6] [7]. The score was assigned according to the proportion of low signal intensity fibrosis and remaining intermediate signal intensity tumor: predominance of fibrosis with no or minimal residual intermediate tumor signal (grade 1 or 2, respectively); substantial tumor signal intensity but fibrosis dominates (grade 3); predominance of tumor signal intensity with minimal fibrosis (grade 4); no tumor regression from baseline (grade 5). Comparison to baseline imaging was performed to avoid misinterpretation of pseudotumor as residual tumor [6]. Pseudotumor refers to inflammatory changes within normal rectal wall adjacent to regressed tumor. Submucosal edema of the rectal wall results in an edematous, thickened wall with intermediate T2 signal, simulating tumor, whereas the adjacent tumor may have regressed, appearing less bulky with low signal intensity.
2) PET: A 3-point scale was used for grading PET tumor response after CRT. Post-therapy scans were compared to baseline to identify the precise location of the tumor. The degree of regression was scored as grade 1 if there was complete or near complete regression of focal FDG uptake at the tumor site with no discrete focal uptake; grade 2 if there was a partial response but discrete, focal residual FDG uptake was identified at the tumor site; and grade 3 if there was no regression, or progression, of tumor.
3) PET-MR: A 3-point scale was used for PET-MR assessment. When PET and MR were concordant, the score remained unchanged. When MR and PET were discordant but one modality was definitive for complete response (e.g. MR showing a low intensity crescent at the site of tumor, or no uptake identified at the corresponding location on PET), a TRG score of 1 was assigned (Figure 2).

4) Pathology regression grade:
Several tumor regression grading systems have been proposed, yet none is universally accepted. The Mandard classification scheme, originally introduced for grading regression of esophageal cancer after CRT, includes 5 scores: TRG 1, complete tumor regression with fibrosis and absence of residual cancer; TRG 2, fibrosis with scattered tumor cells; TRG 3, fibrosis and tumor cells with preponderance of fibrosis; TRG 4, fibrosis and tumor cells with preponderance of tumor cells; TRG 5, tumor without changes of regression [8]. The Ryan tumor regression grading system is based on 3 scores indicating good, moderate and poor response to therapy: Mandard TRG 1 or 2 is scored as 1, Mandard TRG 3 is scored as 2, and Mandard TRG 4 or 5 is scored as 3 [9].
As Mandard TRG 1 and 2 have been regarded as complete pathological response (ypT0), and to enable comparison of the various tumor regression grades on the different imaging modalities, all regression grading systems (PET, MR, PET-MR and pathology) were converted to a 3-point scale analogous to the Ryan scale: TRG 1, complete or near complete response (only microscopic foci of carcinoma on pathology with marked fibrosis); TRG 2, partial response (marked fibrosis but macroscopic disease present); TRG 3, no response or progression. On pathology, the modified 3-point grade has been shown to have similar prognostic significance to the 5-point scale [6]. For MR, MR-TRG 1 was converted to TRG 1; MR-TRG 2 or 3 was converted to TRG 2; and MR-TRG 4 or 5 was converted to TRG 3.
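The grade conversions described above can be written as simple lookup tables. The following is a minimal sketch rather than the study's software; the names `MANDARD_TO_3PT`, `MR_TRG_TO_3PT`, `to_three_point` and `is_complete_response` are illustrative and do not appear in the original analysis.

```python
# Lookup tables transcribed from the text; all names here are illustrative.
MANDARD_TO_3PT = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}  # pathology (Ryan-style grouping)
MR_TRG_TO_3PT = {1: 1, 2: 2, 3: 2, 4: 3, 5: 3}   # MR-TRG grouping used for MR


def to_three_point(grade: int, system: str) -> int:
    """Convert a 5-point grade ("mandard" or "mr") to the common 3-point TRG."""
    tables = {"mandard": MANDARD_TO_3PT, "mr": MR_TRG_TO_3PT}
    if system not in tables:
        raise ValueError(f"unknown grading system: {system!r}")
    if grade not in tables[system]:
        raise ValueError(f"grade must be an integer from 1 to 5, got {grade!r}")
    return tables[system][grade]


def is_complete_response(trg3: int) -> bool:
    """TRG 1 on the 3-point scale denotes complete or near complete response."""
    return trg3 == 1
```

Note that grade 2 maps differently in the two systems: Mandard TRG 2 becomes TRG 1, whereas MR-TRG 2 becomes TRG 2.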

Quantitative PET, PET-MR Analysis
Detection of residual tumor after CRT relies on accurate identification of tumor and may improve if the tumor to background ratio increases. To determine whether correcting for background on PET or DWI-MR may improve tumor detection, one reader (GM) obtained measurements of SUV (from attenuation corrected PET) and ADC (from ADC maps on MR) in all primary tumors before and after CRT. Three measurements, from the upper, mid and inferior portions of the primary tumor, were obtained on PET and MR. Two background tissues were chosen: skeletal muscle (gluteal muscle) and normal colon wall distant from the tumor site. First, we calculated SUV ratios of tumor (average of the 3 measurements) to the 2 backgrounds (SUV of tumor to muscle [=SUV t/m]; SUV of tumor to colon [=SUV t/c]). Similarly, ADC ratios of tumor to the 2 backgrounds were computed. To test the hypothesis that combining PET and DWI-MR may improve tumor detection, we assessed DWI parameters and combinations of PET and DWI parameters to determine whether they perform better than SUV alone. Initially, we calculated SUV/ADC in primary tumors on pre- and post-CRT scans (Figure 3). As there are no known normal values for SUV/ADC, we corrected the SUV and ADC to the selected background reference tissues. The ratios of SUV/ADC in tumor to SUV/ADC in the selected background tissue (termed SUV/ADCm and SUV/ADCc for skeletal muscle and colon, respectively) were recorded. We then compared the calculated ratios to SUV alone. The potential impact of a calculated ratio was assessed in relation to the standard of reference. Post-therapy, detection of residual tumor was considered improved if a calculated SUV/ADC ratio had a larger value than the measured SUV tumor to background ratio in patients with TRG 2 or 3. Conversely, for patients with complete pathologic response, a smaller calculated SUV/ADC ratio compared to the measured SUV tumor to background ratio was considered to positively impact lesion characterization.
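The ratio definitions and the post-therapy impact rule above can be summarized in a short sketch. This is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the input values in the example are invented for demonstration, and `percent_change` is one plausible way of expressing the relative difference between the two ratios.

```python
from statistics import mean


def tumor_to_background(tumor_values, background_value):
    """Mean of the tumor measurements divided by one background value."""
    return mean(tumor_values) / background_value


def fused_ratio(suv_tumor, adc_tumor, suv_bg, adc_bg):
    """(SUV/ADC in tumor) divided by (SUV/ADC in the background tissue)."""
    tumor = mean(suv_tumor) / mean(adc_tumor)
    background = suv_bg / adc_bg
    return tumor / background


def percent_change(suv_ratio, fused):
    """Relative change of the fused ratio with respect to the SUV ratio, in %."""
    return 100.0 * (fused - suv_ratio) / suv_ratio


def positive_impact(suv_ratio, fused, trg3):
    """Post-therapy rule from the text.

    Residual tumor (TRG 2 or 3): the fused ratio should exceed the SUV ratio.
    Complete response (TRG 1): the fused ratio should be below the SUV ratio.
    """
    if trg3 == 1:
        return fused < suv_ratio
    return fused > suv_ratio


# Invented example values (not patient data): three tumor samples per modality,
# with skeletal muscle as the background reference tissue.
suv_tm = tumor_to_background([8.0, 10.0, 12.0], 1.0)
suv_adc_tm = fused_ratio([8.0, 10.0, 12.0], [1.0, 1.0, 1.0], 1.0, 1.5)
improved = positive_impact(suv_tm, suv_adc_tm, trg3=2)
```

Because tumor typically has lower ADC than the reference tissue, dividing SUV by ADC tends to raise the tumor to background ratio, which is the effect the fused parameter is intended to exploit.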

Tumor Texture Analysis
After manual tumor segmentation on pre- and post-therapy PET and MR, we used machine learning algorithms to assess tumor texture features for prediction of complete response to CRT. For pre-therapy PET analysis, one dataset was omitted for technical reasons. A bank of 64 texture features was computed for each tumor based on the image voxels lying inside the tumor VOI, after quantizing the VOI's continuous voxel values into 32 equally spaced bins. Repeated experiments with 16 and 64 bins revealed that the results were not sensitive to this parameter. Feature formulae originally published in 2D were adapted to the 3D VOIs by extending the set of four 2D nearest neighbor pixels to a corresponding set of 13 voxels on the 3D image lattice [10]. A set of 11 first order features was extracted from the tumor intensity distribution in the form of percentiles spaced at 10% intervals. Four classes of second order texture features were computed from multidimensional histograms: (i) the mean and range of the 14 Haralick features computed from the grayscale co-occurrence matrix [11] taken over all 13 neighbor orientations [12]; (ii) five features based on the neighborhood gray tone difference matrix [8]; (iii) ten features from the gray level run-length matrix [13]; and (iv) the same ten features from the gray level size zone matrix [14]. The 64 features were employed to train a radial basis function (RBF) support vector machine (SVM) classifier [15] to discriminate between the fully recovered patients and those with continuing disease. Accuracy was assessed under a leave-one-out cross-validation (CV) paradigm. Hyper-parameters representing the fraction of highest performing features to retain, the scale of the radial basis function kernel, and the SVM training algorithm cost parameter were optimized via 3D grid search over the held-in data only for each CV iteration. The null hypothesis of no relation between the subject labels (recovered vs non-recovered) and the texture features was tested by randomly permuting the labels a total of 3,000 times and computing the fraction of permuted runs achieving an area under the ROC curve (AUC) equal to or better than the AUC from the correctly labeled data.
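Three of the steps above (equal-width quantization, the 13 unique 3D neighbor directions, and the label permutation test) can be sketched with the standard library alone. This is a simplified illustration, not the study pipeline: all function names are hypothetical, and `score_fn` is an assumed stand-in for the full procedure (feature selection, RBF-SVM training and leave-one-out cross-validation) that returns an AUC for a given labeling.

```python
import random
from itertools import product


def quantize(values, n_bins=32):
    """Map continuous voxel values to integer bins 0..n_bins-1 of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def neighbor_offsets_3d():
    """One offset per +/- pair of the 26 nearest neighbors: 13 directions."""
    return [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(labels, score_fn, n_perm=3000, seed=0):
    """Fraction of label permutations scoring at least as well as the true labels.

    score_fn(labels) is assumed to rerun the whole analysis for that labeling
    and return its AUC; here it is supplied by the caller.
    """
    rng = random.Random(seed)
    observed = score_fn(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= observed:
            hits += 1
    return hits / n_perm


# Toy usage with fixed, invented scores in place of a retrained classifier.
toy_scores = [0.9, 0.8, 0.2, 0.1]
p = permutation_p_value([1, 1, 0, 0], lambda y: auc(y, toy_scores))
```

In the toy usage the scores are held fixed while the labels are permuted, which is cheaper than retraining but is only an approximation of the procedure described in the text.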

Statistical Analysis
The ability of each of the methods to predict complete response to neoadjuvant chemoradiotherapy was

Results
Eighteen patients with LARC were recruited for this trial, including 12 men and 6 women, with a mean age of 59.8 years (range: 41-82). Fourteen patients had surgical resection of the primary tumor, and surgical pathology served as the standard of reference. One patient did not undergo surgery due to local and distant disease progression after neoadjuvant CRT. For the purpose of final analysis, the latter patient was considered as having disease progression (TRG 3). These 15 patients formed the cohort for response assessment analysis. Two patients had distant metastases on baseline PET and did not undergo surgical resection. A further patient did not complete the second set of imaging studies.

Qualitative PET, MR and PET-MR
A summary of the performance measures of PET, MR and PET-MR is presented in Table 2, with overall accuracies of 80%, 66.7% and 86.7%, respectively. These differences did not reach statistical significance (p = 0.625 for PET vs MR; p = 0.25 for PET-MR vs MR; p = 1 for PET vs PET-MR), likely due to the small sample size. Assuming a power of 80% (alpha, 0.05), the sample size required to detect a statistically significant difference between the accuracies of MR and PET-MR would be at least 45 patients, and the sample size required to detect a statistically significant difference between PET and MR would be 127 patients.
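As a worked check of the reported performance measures, the sketch below recomputes sensitivity, specificity and accuracy from 2 × 2 counts. The counts are not taken from Table 2: they are back-calculated from the reported percentages under the assumptions that residual tumor is the positive class and that the 15 patients comprise 10 with residual tumor and 5 with pCR. The names `diagnostic_metrics` and `COUNTS` are illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from 2 x 2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts, inferred from the reported percentages (not from Table 2):
# positive = residual tumor (n = 10), negative = pCR (n = 5).
COUNTS = {
    "PET": dict(tp=9, fn=1, tn=3, fp=2),
    "MR": dict(tp=9, fn=1, tn=1, fp=4),
    "PET-MR": dict(tp=9, fn=1, tn=4, fp=1),
}

for name, counts in COUNTS.items():
    sens, spec, acc = diagnostic_metrics(**counts)
    print(f"{name}: sensitivity {sens:.1%}, specificity {spec:.1%}, accuracy {acc:.1%}")
```

Under these assumed counts the three modalities differ only in how many of the 5 complete responders were correctly classified, which illustrates why a cohort of this size has little power to separate them statistically.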

Quantitative PET-MR
Tumor to background ratios were assessed on PET and DWI using skeletal muscle (m) and normal colon (c) as reference. The results for preoperative imaging are summarized in Table 3. Tumor to background ratios improved using (SUV/ADC)t/m for 14 of 18 patients (77.8%) and using (SUV/ADC)t/c for 13 of 18 patients (72.2%). The greatest improvement in tumor to background ratio was achieved using (SUV/ADC)t/m, with a median increase of 32.25% over SUVt/m alone.
Post-therapy ADC measurements and tumor to background ADC ratios as a function of therapy response are presented in Table 4, showing overlap between the 2 groups for all 3 parameters assessed (Figure 4).
Combined PET and MR post-therapy datasets are presented in Table 5. Compared to the standard of reference, the change in tumor to background ratio on (SUV/ADC)t/m had a positive impact in 9 of 14 patients. For the 5 patients with complete pathologic response, (SUV/ADC)t/m showed decreased values (better prediction of complete response) in 4 of 5 patients. (SUV/ADC)t/c had a positive impact in only 7 of 14 patients and showed no significant improvement in any of the complete responders.

Texture Analysis
Out of the seven classifiers trained on single modality data (PET, ADC, pre-contrast and 4 × post-contrast MRI) both before and after treatment, only the classifier trained on the pre-treatment PET scans was able to reject the null hypothesis (p = 0.034) of no association between labels and texture features (Figure 5). This classifier had an AUC of 0.9394 and an accuracy of 92.8% for response assessment using these texture features prior to therapy, as compared to 86.7% for combined post-therapy PET-MR interpretation (p = 0.25). There were two possible operating points along the ROC curve, one with a sensitivity/specificity of 81%/100%, and the other 100%/67%. While the classifier itself was statistically significant, no single texture feature was significant according to a Wilcoxon rank sum test adjusted for multiple comparisons using a false discovery rate of 0.05. Combining the PET texture features with any of the MRI features in either pre-treatment or post-treatment scans consistently produced inferior classifier performance.

Discussion
Accurate preoperative prediction of complete response to neoadjuvant CRT in patients with locally advanced rectal cancer may have clinical implications, as these patients may be spared the morbidity of radical resection and permanent colostomy, improving quality of life and optimizing utilization of health care resources. This may be especially important in patients with comorbidities that may put them at high risk of mortality from surgery.
A recently published meta-analysis comparing FDG-PET to DW-MRI in the prediction of pathological response to preoperative neoadjuvant therapy in patients with rectal cancer has suggested that MR may have higher sensitivity than FDG PET (85% and 81%, respectively [p < 0.05]), but specificity was similar for both (77% for PET and 73% for DW-MRI) [16].Although this data pertains to prediction of response in general, it may not accurately reflect stratification of complete responders from others, which is the clinically relevant question in the current study.
In our study cohort, sensitivity of all modalities was similar but specificity of both PET and MR for detection of complete response was low (60% and 20%, respectively). Interestingly, combining MR and PET data in individual patients improved the specificity (80% on combined PET-MR) over either modality alone. Spatial coregistration of PET and MR enabled better characterization of abnormal FDG uptake on PET, as metabolic activity could be better localized to a residual mass (indicating possible residual tumor) versus an adjacent segment of rectum (indicating radiation-induced inflammation). Furthermore, abnormal uptake of FDG on PET, or abnormal signal intensity or persistent restricted diffusion on MR, could be further characterized as a probable false positive in patients with findings of complete response on MR or PET, respectively (Figure 3). Assessment of PET and MR data will likely be further improved on hybrid PET/MR scanners, recently introduced to clinical practice [17]. As PET data and at least one MR sequence can be acquired simultaneously, lesion coregistration may be significantly improved, especially for organs such as bowel that may change their shape and location over short periods of time.
Given the inherent limitations of PET and MR in predicting complete response, we chose to test different imaging models to improve therapy response stratification, and especially the identification of patients with a complete pathologic response. One imaging algorithm was to combine PET and DWI data, previously termed parametric fusion PET/MR. In a previous study, Park et al. showed that, for prostate cancer, parametric fusion maps obtained from 11C-choline PET/MR may improve the tumor to background ratio [18]. As SUV and ADC are reported using different scales, and in order to standardize the impact of combining information from different scans and different patients, we corrected each dataset (PET, DWI) with reference tissue. This strategy enabled assessment of the impact of combining PET and DWI data on the overall tumor to background ratio. In our pilot, we found that parametric fusion data improved tumor to background ratios for the majority of patients, with a median increase of over 30% over SUV ratios alone. Furthermore, after CRT, parametric fusion PET/MR using skeletal muscle as reference improved assessment of complete responders to therapy in 4 out of 5 cases by decreasing signal in the residual mass (and the tumor to background ratio) as compared to SUV alone.
Another strategy tested in this pilot was evaluating texture features on MR and PET before and after therapy to determine whether these may improve prediction of response to therapy. Radiomics is a rapidly evolving research field which refers to extraction and analysis of large volumes of quantitative imaging data from medical images in a minable form to build predictive models relating texture features to phenotypes, or to genetic and proteomic signatures [19] [20]. The purpose of this is to obtain valuable diagnostic, prognostic and/or predictive information regarding a disease from existing imaging studies [21]. Prior studies have shown that texture features obtained from baseline FDG PET in patients with esophageal cancer undergoing neoadjuvant CRT were the best predictors of pCR [10]. In the current study we used machine learning technology, which seeks to produce classifiers capable of making accurate predictions from input data directly, without any reference to underlying mechanisms or prior knowledge of the domains to which they are applied. It was hypothesized that tumor texture features would be able to predict response to therapy in patients with LARC. Although our cohort is small, classifiers trained on pretherapy PET were the only ones that showed significance in predicting complete response to CRT, with an overall accuracy of 92.8%.
The limitations of this prospective pilot study include its relatively small sample size, with only 15 patients having complete pre- and post-therapy PET and MR, and the fact that PET and MR data were not obtained with a hybrid, in-line scanner. However, same day scanning was achieved in the majority of studies (63.3%), increasing the likelihood of appropriate data registration. Nonetheless, we have shown improved specificity for combined PET/MR interpretation as compared to either modality alone, and have provided estimates of sample size for future prospective trials. We have also tested two innovative strategies to better stratify patients' response to therapy. Parametric fusion PET/MR, previously introduced for prostate cancer, appears to also improve tumor to background ratios before and after CRT and decreases the signal to background ratio in most patients with complete pathologic response, potentially resulting in improved prediction of complete response to CRT.

Conclusion
In conclusion, PET-MR may perform better than PET or MR alone in stratifying patients with LARC into complete response versus incomplete response to CRT. Advanced imaging parameters, including parametric fusion PET-MR and texture features on pretherapy PET, may further improve prediction of therapy response, potentially impacting patient management and outcome. These data need to be validated prospectively in a larger cohort of patients, preferably acquired on a hybrid PET-MR scanner.

Figure 2. (a) 71-year-old man with T4 rectal tumor. Pretherapy PET/CT shows metabolically active tumor in rectum; (b) Corresponding axial T2 weighted MR shows intermediate signal intensity mass; (c) Post-therapy PET/CT shows residual uptake suspicious for residual tumor (arrow); (d) Corresponding axial T2 weighted MR image shows complete response with a crescent of mucosal low signal intensity indicating fibrosis (dotted arrow). PET-MR findings were interpreted as negative for residual tumor. Surgical pathology confirmed pCR.

Figure 3. 48-year-old man with low rectal tumor, pretherapy PET (left), inverted ADC map (middle), and SUV/ADC map (right) show anterior wall rectal tumor.Tumor to background ratio is increased on SUV/ADC map compared to PET alone.

Figure 4. DWI-based parameters as predictors of pCR. (a) Scatter plot of ADC as a function of response assessment; (b) Scatter plot of ADCt-m as a function of response assessment; (c) Scatter plot of ADCt-c as a function of response assessment.

Figure 5. Performance of texture analysis for PET. Dark blue: Pre-treatment PET quantiles only (p = 0.034).

Table 1. MR imaging protocol for rectal cancer staging/restaging.

Table 2. Performance measures of PET, MR and PET-MR.

Table 3. Tumor to background ratios for measurements obtained at baseline imaging (SUV, ADC, ratios and difference between SUV alone and SUV/ADC ratios).
t = tumor; m = skeletal muscle; c = normal colon; SUV = standardized uptake value; ADC = apparent diffusion coefficient. Δ (t/m) or (t/c) denotes the difference between SUV (t/m) or (t/c) and SUV/ADC (t/m) or (t/c), respectively. A negative value means a decrease in the tumor to background ratio, whereas a positive value means an improvement in the tumor to background ratio.

Table 4. ADC measurements and ADC tumor to background ratios in patients with complete response to CRT (TRG 1) and all others (TRG 2 & 3).

Table 5. Tumor to background ratios for measurements obtained after neoadjuvant chemoradiotherapy (SUV, ADC, ratios and difference between SUV alone and SUV/ADC ratios).
t = tumor; m = skeletal muscle; c = normal colon; SUV = standardized uptake value; ADC = apparent diffusion coefficient. Δ (t/m) or (t/c) denotes the difference between SUV (t/m) or (t/c) and SUV/ADC (t/m) or (t/c), respectively. A negative value means a decrease in the tumor to background ratio, whereas a positive value means an improvement in the tumor to background ratio.