Dosimetric Effects Due to Inter-Observer Variability of Organ Contouring When Utilizing a Knowledge-Based Planning System for Prostate Cancer
1. Introduction
Prostate cancer is the single most common cancer site for men, with an estimated 174,650 new cases and 31,620 deaths yearly, which represents over 20% of the cancer burden for males [1]. Treatment for prostate cancer typically consists of active monitoring, surgery, radiotherapy, or some combination of these approaches [2]. Volumetric modulated arc therapy (VMAT), a new delivery modality, has achieved quick and widespread clinical acceptance as the standard of practice for the radiation treatment of prostate cancer due to its improved organ-at-risk (OAR) sparing and high dose conformality to the target, which creates the opportunity for an increased daily prescription dose and an improved therapeutic ratio [3] [4]. Studies have shown a correlation between plan quality and patient outcomes, with a definitive benefit to those techniques that support dose escalation while simultaneously limiting high rectal dose to avoid toxicities [5] [6] [7] [8].
Accurate delineation of targets and OAR structures is widely believed to be a foundation of successful radiation treatment on the basis that quality organ segmentation affects patient outcomes [9]. Studies have used different geometric metrics to evaluate contour variability, and have used these as indicators of plan quality [10] [11]. Popular geometric metrics for contour uncertainty evaluation include the overlap index (OI), the Dice similarity coefficient (DSC), and the centroid-to-centroid distance (DC) [12] [13]. While these metrics are simple to compute, they lack any true clinical meaning, as they do not account for target and OAR interaction, and no correlation with dosimetric effects or patient outcome has been established [14]. One method to better understand the effect contour variation could have on dosimetry and outcome is the dosimetric comparison of calculated radiotherapy treatment plans based upon a user-defined test structure set (TSS) and an expert gold-standard structure set (ESS) [14] [15].
Until recently, the unique nature of creating individual inverse radiation treatment plans made the production of plans onerous and inconsistent, as the plan optimizer could not be systematically and dynamically adjusted without creating a unique, unreproducible result for a given structure set and optimization. Knowledge-based planning (KBP) is a relatively novel development designed to leverage dosimetric and geometric information from previous clinical plans to produce a model which is capable of estimating dose-volume-histogram (DVH) projections for a given set of contours [16] [17] [18]. This technique can reduce plan variation and optimization time to a degree typically associated with advanced treatment planner expertise, while producing consistent, high-quality, and clinically acceptable treatment plans [19] [20]. Recently, new features of the Eclipse Scripting Application Programming Interface (ESAPI) have made it possible to generate a complete treatment plan without any user interaction [21]. Normally, the decision-making process in treatment planning is highly subjective and remains dependent on the knowledge, experience, and capability of the planner [22] [23]. A previous study showed that the combination of KBP and automated scripting has the capability to produce exceptional treatment plans with minimal planner interaction for prostate cancer treatment [21]. The purposes of this study are to 1) investigate the geometric uncertainties of the target and OAR contours due to inter-observer variability for prostate treatment; 2) obtain dosimetric endpoints from treatment plans automatically generated by a novel KBP-based framework for plans that use the gold-standard structure set (ESS) and test structure sets (TSS), to evaluate the dosimetric impact of inter-observer contour variability on target coverage; 3) evaluate the correlation of contour uncertainty and planning target volume (PTV) dose coverage using the R² statistic.
2. Methods and Materials
2.1. Patient Data
Twenty prostate cancer patients were selected for this retrospective study. All patients underwent computed tomography (CT) simulation in the supine position with Vac-Lok immobilization for the lower legs and a scan thickness of 2 mm from mid-abdomen to mid-thigh. The CT scans were imported into the Eclipse treatment planning system (TPS) for planning. A group of 5 experienced clinicians (2 medical dosimetrists, 3 medical physicists) was tasked with the segmentation of the following structures on the planning CT image: the prostate gland, seminal vesicles (SV), bladder, and rectum, utilizing best practices with full access to guidelines and supplemental resources. Each CT structure set was uniquely identified as TSS-1 to TSS-5 for each of the 20 patients. ESS was created by a contour imaging specialist who is responsible for all normal tissue segmentation in our clinic and an experienced radiation oncologist who specializes in prostate cancer. The imaging specialist generated the bladder and rectum, while the radiation oncologist delineated the prostate gland and SV. The final structure sets were then reviewed by the radiation oncologist and a clinical physicist for accuracy.
2.2. Treatment Planning
Both low-risk patients (LRP) and intermediate-risk patients (IRP) were included in this study. The clinical target volume (CTV) was the prostate for LRP, and the prostate + SV for IRP. Margins for the PTV were non-isotropic, with a 6 mm posterior expansion of the CTV and an 8 mm expansion in all other directions. To investigate the effect of dose distribution changes due to inter-observer contour differences, the following combinations of structure sets were used to generate treatment plans: 1) ESStarget-ESSOAR; 2) ESStarget-TSSOAR; 3) TSStarget-ESSOAR; 4) TSStarget-TSSOAR. A previously discussed ESAPI-based automated planning (AP) routine was used to automatically create treatment plans based on a given combination of structure sets [21]. The workflow of the AP routine is as follows: after a valid CT image and associated structure set are imported into the Eclipse TPS, a script is initiated to create a course, plan, and two VMAT fields (isocenter automatically assigned to the geometric center of the PTV) utilizing 358 degree arcs with collimator rotations of 355 and 85 degrees, respectively. The script generates or utilizes all appropriate names, reference points, prescription information, beam calculation model, and dose calculation settings. The AP routine utilizes a site-specific KBP model to generate initial DVH estimates for plan optimization. The KBP model was trained with a set of 80 prostate patients. Outliers were determined through separate data analysis and removed to ensure a robust model. Once optimized, the plan undergoes final dose calculation and is normalized such that 100% of the prescribed dose (78 Gy) is delivered to 98% of the PTV volume. The generated plan quality is then automatically evaluated by the AP script against our departmental dosimetric guidelines to ensure sufficient OAR sparing.
Additional targeted optimization is conditionally performed and applied by the AP script if any constraint fails, and a final plan is produced once all criteria are met. In this manner, the AP routine was able to generate unbiased treatment plans for the different structure sets without any additional user input, interactive intervention, or post calculation adjustments that would occur with manual planning.
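The normalization step described above can be illustrated with a minimal sketch. It is not the AP script or any ESAPI call; it assumes the PTV dose is available as a plain list of per-voxel doses in Gy with equal voxel volumes, and the function names `normalization_factor` and `volume_at_dose` are hypothetical.

```python
import math


def normalization_factor(ptv_doses_gy, prescription_gy=78.0, coverage=0.98):
    """Scale factor so that `coverage` of the PTV voxels reach the prescription.

    Assumes equal-volume voxels, so a volume fraction equals a voxel fraction.
    """
    if not ptv_doses_gy:
        raise ValueError("PTV dose list is empty")
    ranked = sorted(ptv_doses_gy, reverse=True)  # hottest voxel first
    # Number of voxels that must reach the prescription; the small offset
    # guards against floating-point overshoot in coverage * n.
    n_needed = max(1, math.ceil(coverage * len(ranked) - 1e-9))
    dose_at_coverage = ranked[n_needed - 1]  # e.g. D98% before scaling
    if dose_at_coverage <= 0:
        raise ValueError("dose at the coverage level must be positive")
    return prescription_gy / dose_at_coverage


def volume_at_dose(doses_gy, threshold_gy, tol_gy=1e-9):
    """Percentage of voxels receiving at least `threshold_gy` (e.g. V78Gy)."""
    hits = sum(1 for d in doses_gy if d >= threshold_gy - tol_gy)
    return 100.0 * hits / len(doses_gy)
```

Multiplying every voxel dose in the plan by the returned factor gives a distribution in which 98% of the PTV used for normalization receives at least 78 Gy; applying `volume_at_dose` to a different contour then shows how coverage changes when the evaluation structure differs from the planning structure.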
2.3. Geometric and Dosimetric Evaluation
The following geometric indices between the test and the gold-standard contour sets are utilized for concordance measures: 1) Centroid-to-centroid distance (DC); 2) Overlap index (OI); 3) Dice similarity coefficient (DSC). The dosimetric indices used for the plan evaluation were V78Gy (percentage of the volume receiving prescription dose) for the PTV, and mean dose (Dmean), V50Gy, and V60Gy (percentage of volume receiving 50 Gy and 60 Gy) for the OAR. In order to have a fair comparison, all generated plans received identical normalization. Although treatment plans were created with different sets of contours, the dosimetric evaluation was performed on the gold-standard contour sets.
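As a rough illustration of the three concordance measures, the sketch below computes them for structures represented as sets of (x, y, z) voxel indices. The text does not state the formula used for OI; intersection over union is assumed here, and other definitions (for example, intersection over the gold-standard volume) are also in use. All function names and the voxel-set representation are assumptions made for illustration.

```python
import math


def dice(test, ref):
    """Dice similarity coefficient: 2|A and B| / (|A| + |B|)."""
    return 2.0 * len(test & ref) / (len(test) + len(ref))


def overlap_index(test, ref):
    """Overlap index, ASSUMED here to be intersection over union."""
    return len(test & ref) / len(test | ref)


def centroid(voxels):
    """Mean voxel index along each axis."""
    n = len(voxels)
    return tuple(sum(v[i] for v in voxels) / n for i in range(3))


def centroid_distance(test, ref, spacing_mm=(1.0, 1.0, 1.0)):
    """Centroid-to-centroid distance in mm for a given voxel spacing."""
    c_test, c_ref = centroid(test), centroid(ref)
    return math.sqrt(
        sum(((a - b) * s) ** 2 for a, b, s in zip(c_test, c_ref, spacing_mm))
    )


def cube(x0, size=4):
    """Toy structure: a size^3 block of voxels starting at x index x0."""
    return {
        (x, y, z)
        for x in range(x0, x0 + size)
        for y in range(size)
        for z in range(size)
    }
```

For two identical structures the DSC and OI are both 1 and the centroid distance is 0; shifting one toy cube by a voxel, as in `cube(0)` versus `cube(1)`, lowers both overlap measures and moves the centroid by one voxel spacing.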
The correlation between the geometric and dosimetric indices for the PTV was evaluated using R² statistics for plans generated with different combinations of structure sets (TSStarget-ESSOAR and TSStarget-TSSOAR). In general, a correlation coefficient r between −0.2 and 0.2 indicates almost no relationship; r between 0.2 and 0.4 (−0.4 to −0.2) indicates a weak positive (negative) linear relationship; r between 0.4 and 0.7 (−0.7 to −0.4) indicates a moderate positive (negative) linear relationship; r between 0.7 and 1 (−1 to −0.7) indicates a strong positive (negative) linear relationship.
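The interpretation bands above can be sketched in a few lines. The snippet computes a Pearson correlation coefficient from paired values and maps it onto the four bands; because the stated ranges share their end points, values exactly at 0.2, 0.4, or 0.7 are assigned to the stronger band here, which is an assumption. For a simple linear fit, R² equals r². The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r):
    """Map r onto the bands used in the text (boundary handling assumed)."""
    magnitude = abs(r)
    if magnitude < 0.2:
        return "almost none"
    if magnitude < 0.4:
        return "weak"
    if magnitude < 0.7:
        return "moderate"
    return "strong"
```

In this study's setting, `x` would hold a geometric index such as OI per test structure set and `y` the corresponding PTV coverage ratio.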
To investigate the impact of inter-observer contour uncertainty on target coverage, a paired Student’s t-test was used to compare the PTV dose coverage between plans created with ESStarget-ESSOAR and TSStarget-ESSOAR/TSStarget-TSSOAR, with statistical significance defined as a p-value < 0.05. A paired Student’s t-test was also used to test the difference in OAR dosimetry between plans created with ESStarget-ESSOAR and ESStarget-TSSOAR.
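For readers unfamiliar with the paired test, the following minimal sketch computes its test statistic from two matched lists of values, for example the PTV coverage of the same patient under two structure-set combinations. It returns only the t statistic and the degrees of freedom; the p-value is then read from Student's t distribution, for which a statistics library such as SciPy (`scipy.stats.ttest_rel`) would normally be used. The function name `paired_t_statistic` is hypothetical.

```python
import math


def paired_t_statistic(a, b):
    """Paired Student's t statistic and degrees of freedom for matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples of length >= 2")
    diffs = [x - y for x, y in zip(a, b)]  # per-patient differences
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / math.sqrt(var_diff / n)
    return t, n - 1
```

Pairing matters here because each patient serves as their own control: the test works on per-patient differences, so between-patient anatomical variation does not inflate the variance.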
3. Results
3.1. Patient Statistics
The average CTV volume was 56.2 ± 18.8 cc (range 28.4 to 88.4 cc) for LRP and 71.4 ± 23.5 cc (range 34.2 to 110.5 cc) for IRP. Non-uniform margins yielded average PTV volumes of 138.1 ± 34.3 cc (range 80.9 to 191.3 cc) for LRP and 183.6 ± 43.8 cc (range 109.4 to 262.7 cc) for IRP. The mean rectal volume was 86.6 ± 35.9 cc (range 45.8 to 172.5 cc), while the mean bladder volume was 175.4 ± 77.1 cc (range 83.0 to 386.1 cc).
3.2. Geometric Evaluation
The geometric evaluation was performed by calculating the OI, DSC and DC between the test structure sets (TSS-1 to TSS-5) and gold-standard structure sets (ESS). Table 1 lists the average and standard deviation of the various geometric measures of the PTV and OAR for both the LRP and IRP groups. Figure 1 plots the OI, DSC and DC of the PTV for each individual patient for both LRP and IRP groups, and the results for the bladder and rectum are shown in Figure 2. Noticeable geometric variations were found for the PTV (both LRP and IRP) and the rectum contours, while no substantial changes in bladder contours (<5% on average) were observed from our data sample.
Figure 1. The overlap index (OI), Dice similarity coefficient (DSC) and centroid-to-centroid distance (DC) between the gold-standard and test PTVs for low-risk and intermediate-risk patient groups. (a) OI for LRP; (b) DSC for LRP; (c) DC for LRP; (d) OI for IRP; (e) DSC for IRP; (f) DC for IRP.
Table 1. The average overlap index (OI), Dice similarity coefficient (DSC) and centroid-to-centroid distance (DC) for the PTV (LRP and IRP) and organs-at-risk between the gold-standard contour sets (ESS) and test contour sets (TSS-1 to TSS-5).
3.3. Dosimetric Evaluation
Figure 3 shows ratios of the PTV dose coverage between test plans (TSStarget-ESSOAR and TSStarget-TSSOAR) and the gold-standard plan (ESStarget-ESSOAR) as a function of various geometric indices for LRP. The R² statistic was used to study the correlation of dosimetric and geometric indices for the PTV dose coverage, and the resulting correlation coefficients (r) are shown in Figure 3 as well. Strong correlations between the PTV coverage and OI/DSC were observed for all test plans studied (r > 0.7), while moderate correlations were found between the PTV coverage and DC. The dosimetric and geometric correlations were also investigated for IRP, as shown in Figure 4. Strong correlations between the PTV coverage and OI were observed (r > 0.7), while moderate correlations were found between the PTV coverage and DSC/DC.
In this study we performed a paired Student’s t-test to investigate the statistical significance of the difference in PTV coverage between plans created with ESStarget-ESSOAR and TSStarget-ESSOAR/TSStarget-TSSOAR. Compared to plans created with ESStarget-ESSOAR, statistically significant PTV dose reductions were found for plans created with TSStarget-ESSOAR and TSStarget-TSSOAR for both LRP and IRP groups (p < 0.001), indicating that the inter-observer uncertainties in target delineation play an important role in achieving a high-quality treatment plan, and as such may affect the treatment outcome.
A paired Student’s t-test was also used to test the OAR dose differences between plans created with ESStarget-ESSOAR and ESStarget-TSSOAR; the resulting p-values are listed in Table 2 for LRP and Table 3 for IRP. No statistically significant differences were observed in OAR dosimetry (using Dmean, V50Gy and V60Gy as indicators) for either the LRP or IRP group (p > 0.05), which indicates that precise delineation of the bladder and rectum contours may not be necessary to achieve treatment plans of similar quality.
Figure 2. The overlap index (OI), Dice similarity coefficient (DSC) and centroid-to-centroid distance (DC) between the gold-standard and test OARs. (a) OI for bladder; (b) DSC for bladder; (c) DC for bladder; (d) OI for rectum; (e) DSC for rectum; (f) DC for rectum.
Table 2. p-values from paired Student’s t-test for organs-at-risk (OAR) between plans created with ESStarget-ESSOAR and ESStarget-TSSOAR structure sets for the low-risk patient (LRP) group.
Table 3. p-values from paired Student’s t-test for organs-at-risk (OAR) between plans created with ESStarget-ESSOAR and ESStarget-TSSOAR structure sets for the intermediate-risk patient (IRP) group.
Figure 3. Correlation of the PTV coverages and geometric indices for the low-risk patient (LRP) group. The PTV coverages are expressed as a ratio of test plan dose to gold-standard dose. Linear regression fits are shown as dashed lines together with the fitted equations.
Figure 4. Correlation of the PTV coverages and geometric indices for the intermediate-risk patient (IRP) group. The PTV coverages are expressed as a ratio of test plan dose to gold-standard dose. Linear regression fits are shown as dashed lines together with the fitted equations.
4. Discussions
It is believed that the accurate contour delineation of the target and organs-at-risk is essential in creating high-quality treatment plans for radiotherapy. However, manual contouring by clinicians, hereafter referred to as “precision contouring”, is a time- and labor-intensive process, and subject to inter- and intra-observer variability. In this study we focused on the inter-observer geometric uncertainties in structure definition of the target and OAR contours in prostate radiotherapy, and investigated the impact of these inter-observer contour variances on target and OAR dosimetry. We found bladder segmentation to be very consistent between all clinicians (within 5%) because the bladder has well-defined borders that show clearly on planning CT scans and aid in structure delineation. Both geometric indicators (OI and DSC) showed greater variability for the target and rectal contours (~10%). This can be attributed to poor contrast between the region of interest and surrounding tissues.
The utilization of an automated planning routine is vital to this evaluation as it allowed us to generate unbiased high-quality treatment plans based upon a given structure set. The average time to produce a single plan from start to finish was approximately 20 minutes. Additionally, the AP script is capable of creating plans for 10 patients simultaneously. The AP routine does not require additional input from the planner on either the front or back end, which reduces uncertainties associated with the individual planner’s knowledge and experience.
The goal when developing a treatment plan in radiotherapy is to achieve adequate target coverage while sparing normal tissue as much as possible. PTV coverage was dosimetrically equivalent between plans created with the same gold-standard target contours (ESStarget) but different OAR contours (ESSOAR and TSSOAR). This is intuitive, as the AP planning routine accessed the same KBP model and used the same target contour, and the same plan normalization was used (100% of prescribed dose delivered to 98% of the PTV volume). However, compared with plans created with ESStarget-ESSOAR, statistically significant reductions of the PTV coverage were found for plans created with TSStarget-ESSOAR and TSStarget-TSSOAR (p < 0.001 from paired Student’s t-test). This was because the plan was normalized to TSStarget during the planning process, while all the dosimetric analysis was performed for ESStarget. The target under-coverage was purely due to the geometric variations between the ESStarget and TSStarget. For both the LRP and IRP groups, strong/moderate correlations were observed between the PTV coverage and geometric indices (OI, DSC and DC), which can be explained by the fact that the closer the test PTVs are to the gold-standard PTVs, the better the target coverage. In this study we did not compare OAR dosimetry for plans created with TSStarget and ESStarget, because OAR dose comparisons are of little use if the target coverages are not adequate. The OAR dosimetry comparisons between plans using the same target but different OAR structures were performed, and no statistically significant differences were observed for either the bladder or the rectum in either the LRP or IRP group (Table 2 and Table 3, p > 0.05 from paired Student’s t-test). This is an indication that if a certain level of accuracy in bladder and rectum delineation is reached (~10% in our study), our automated planning routine can achieve similar plan quality for prostate treatment. The tight conformity of high dose to the target in VMAT planning helps explain why changes in target definition have a major impact on the plan quality, while the OAR dose appears to be relatively independent of moderate contour variations.
Care must be taken when transferring this work to other, more complex disease sites. Compared to sites such as head and neck, prostate treatment planning considers relatively few critical OAR, such as the bladder and rectum, both parallel organs with an emphasis on reducing the mid to higher dose levels. Furthermore, the positional relationship of the target to these OAR is relatively consistent across prostate patients, which could be the reason why precision contouring of OAR has a minimal impact on the plan quality. More complicated anatomy, critical structures, geometric relationships between target and OAR, and serial organ involvement may produce different dosimetric and geometric interactions compared with those examined in this study.
5. Conclusions
Inter-observer variability of contours during structure segmentation was very low (less than 5% on average) for clearly defined organs such as the bladder but increased for organs with less well-defined borders (prostate, seminal vesicles, and rectum). The fully automated AP script routine was successfully utilized in the creation of standardized and unbiased plans for the evaluation of isodose distributions based upon contour disparity due to inter-observer variability. All the plans generated were normalized in such a way that 100% of the prescribed dose was delivered to 98% of the PTV volume; thus, target coverage of plans created with ESStarget-TSSOAR was guaranteed. However, that is not the case for plans created with TSStarget. Strong/moderate correlations were observed between the PTV coverage and geometric indices (OI, DSC, and DC) for both the LRP and IRP groups, which is an indication that the closer the test PTVs are to the gold-standard PTVs, the better the target coverage is. For both the low-risk and intermediate-risk patient groups, the level of target delineation accuracy is crucial in order to maintain adequate dosimetric coverage, while inter-observer OAR segmentation has limited impact on the final OAR dosimetry.
Thus, we propose that there may be limited benefit from human-based OAR contouring in prostate radiotherapy. It is recognized that this result is specific to the prostate and requires further study for more complex treatment sites.