Validation of Treatment Planning Dose Calculations: Experience Working with Medical Physics Practice Guideline 5.a

The recently published Medical Physics Practice Guideline 5.a. (MPPG 5.a.) by the American Association of Physicists in Medicine (AAPM) sets the minimum requirements for treatment planning system (TPS) dose algorithm commissioning and quality assurance (QA). The guideline recommends a set of validation tests and tolerances based primarily on published AAPM task group reports and the criteria used by IROC Houston. We performed the commissioning and validation of the dose algorithms for both megavoltage photon and electron beams on three linacs following MPPG 5.a. We designed the validation experiments to highlight the evaluation methods and tolerance criteria recommended by the guideline. Comparison of dose profiles using in-water scans appears to be an effective technique for basic photon and electron validation. The guideline recommends testing IMRT/VMAT dose calculation with some TG-119 and clinical cases, but no consensus on the tolerance exists. Extensive validation tests have provided a better understanding of the accuracy and limitations of a specific dose calculation algorithm. We believe that some tests and evaluation criteria given in the guideline can be further refined.


Introduction
Commissioning a commercial treatment planning system in radiation oncology includes two major tasks: modeling the beam data and validating the accuracy of the models. An overall accuracy of 5% in the delivery of absorbed dose [1] is recommended by the International Commission on Radiation Units and Measurements (ICRU), and an accuracy of 2% in the computed dose distribution [2] is suggested by the American Association of Physicists in Medicine (AAPM).
Recently, AAPM has published a medical physics practice guideline (MPPG 5.a.) [3], which sets the minimum requirements for commissioning and QA of treatment planning dose calculations. The required validation process is described in MPPG 5.a. in the following Sections:
treatment planning system v. 9.10 for three linacs, including one TrueBeam, one 2100EX, and one Elekta Infinity. We followed MPPG 5.a. for the validation tests.
We tried different measurement techniques in the validation tests and present here our experience and results for the validation of Pinnacle treatment planning dose calculations. Modeling of beam data and validation of a dose calculation model require in-depth knowledge of the treatment machine, the dose calculation algorithm, and dosimetric data measurement, all of which have been studied extensively (see the references in MPPG 5.a.). The scope of this paper is to provide a first experience of implementing MPPG 5.a. and the related discussion of testing methodologies.

Materials and Methods
Modeling of the Collapsed Cone Convolution (CCC) dose algorithm in the Pinnacle planning system followed the vendor's instructions for beam data collection.
5.9 Large (>15 cm) field for each nonphysical wedge angle
Those tests were performed by comparing the absolute dose at various POIs between measurement and calculation. We scanned dose profiles in water at three SSDs (80 cm, 100 cm and 120 cm) and four depths (2 cm, 4 cm, 12 cm and 25 cm) using an IBA cc13 ion chamber and a Blue water phantom. Table 1 shows the positions of the ion chamber at the various SSDs and depths. The absolute dose at each point of the measured profile was converted from the charge signal using the ratio to that of the dose calibration. Each specified dose profile was calculated in the planning system using a virtual water phantom (50 cm × 50 cm × 50 cm). The resolution of the dose profiles was 2 mm for both calculation and measurement. All six tests (5.4-5.9) were carried out based on the suggestions of MPPG 5.a. Our experience has shown that test 5.7 can be incorporated into test 5.8 using an irregular/asymmetric MLC field, and that test 5.9 can be designed with the same MLC field as in test 5.5 with the wedge angles of interest added.

Table 1. Summary of the MLC-shaped field tests performed using in-water profile scan with SSD, depth and chamber positions relative to the isocenter.

Modeling parameters in
Therefore, the experiments for all six suggested tests focused on the MLC fields illustrated in Figure 1.

Photon Beams: Heterogeneity Correction Validation
The test recommended by the guideline for the accuracy of dose calculation through heterogeneous media is a beam delivered through low-density material with a small field size (5 × 5 cm²). We employed a CIRS thorax phantom (Model:

Photon Beams: IMRT/VMAT Dose Validation
Five types of validation tests recommended for IMRT/VMAT delivery modalities are summarized below.
Figure 2. CIRS thorax phantom with ion chamber and beam configuration.
7.1 Verify small field PDD, using a small detector such as a diode or plastic scintillator
7.2 Verify output for small MLC-defined fields, using a small detector
7.3 TG-119 tests, using both ion chamber and array detectors with appropriate
& N phantom [5] was verified by film and TLD.

Electron Beam Validation
The recommended tests for electron beam validation include comparing the isodose distribution for a custom cutout, for an obliquely incident beam, and for heterogeneous media. We performed the tests for a custom cutout and an obliquely incident beam using in-water scanning of profiles at different SSDs and depths. Figure 3 illustrates the setup of an oblique electron beam with a 10 × 10 open cone at a 30˚ gantry angle. The validation of dose calculation in heterogeneous media for electron beams can be tested using a piece of film sandwiched between two thin slabs of styrofoam, with solid water slabs placed on the top and bottom as buildup and backscatter. The accuracy and limitations of the Pencil beam algorithm are well known and discussed elsewhere [6]. We do not include a discussion of our heterogeneity results in this publication because MPPG 5.a. does not consider Pencil beam a good choice of algorithm for dose calculation in heterogeneous media.
All comparisons between calculation and measurement are given as absolute dose in cGy. For measurements using an ion chamber, the charge reading was converted to dose simply by the ratio to the TG51 calibration in water. The issue of charge-to-dose conversion in different media is addressed in the discussion section. For measurements using films, dose was calculated with the calibration curve of the same batch of Gafchromic film following the manufacturer's instructions.
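The charge-to-dose conversion described above amounts to a simple ratio scaling. The following is a minimal sketch of that arithmetic, assuming the calibration reading and the measured reading were taken with the same chamber and electrometer and corrected identically (temperature, pressure, polarity and so on). The function names and the numbers in the usage note are illustrative and are not taken from the measurements reported here.

```python
def charge_to_dose(charge_nc, cal_charge_nc, cal_dose_cgy):
    """Convert a charge reading (nC) to dose (cGy) by ratio to a calibration reading.

    cal_charge_nc is the charge collected under the calibration condition and
    cal_dose_cgy is the dose known to be delivered under that condition.
    """
    if cal_charge_nc <= 0:
        raise ValueError("calibration charge must be positive")
    return cal_dose_cgy * charge_nc / cal_charge_nc


def profile_to_dose(charges_nc, cal_charge_nc, cal_dose_cgy):
    """Apply the same ratio to every point of a scanned profile."""
    return [charge_to_dose(q, cal_charge_nc, cal_dose_cgy) for q in charges_nc]
```

For example, `profile_to_dose([10.0, 20.0], 20.0, 100.0)` scales a two-point profile against a hypothetical calibration of 20 nC per 100 cGy. The sketch deliberately ignores any medium-dependent correction, which is the issue taken up in the discussion section.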

Photon Beams: Basic Dose Algorithm Validation
Dose comparison at POI was done by plotting the calculated and measured profiles at the specified SSD and depth as the absolute dose for the delivery of 100
Overall agreement between calculation and measurement is consistent with the models by visual inspection for all test fields, i.e., the agreement in the high-dose region (in-field and shoulder) varies with depth and energy. The penumbra region agrees well once the setup uncertainty in measurement is taken into account (with the correction of any offset less than 3 mm). Any disagreement is also easily identifiable from the difference curve or the numeric result. In general, all tests met the tolerances given by the guideline (see Table 5 in reference 3) with no substantial disagreement. Only for test 5.8 (an oblique MLC beam) were some

Photon Beams: Heterogeneity Correction Validation
The results are summarized in Table 2. We observed good agreement of the point dose between calculation and measurement for beams through different heterogeneous media in this test. The dose measurements with ion chamber and film are also consistent. The procedure recommended by MPPG 5.a. is to compare the ratio of dose above and below the heterogeneity along the central axis. The comparison of absolute dose at a POI should be sufficient to show the dose calculation accuracy of the commissioned algorithm in heterogeneity, and might be considered a less precise yet stricter End-to-End approach. A more precise test would also be expected to pass as long as the POI test passes.

Photon Beams: IMRT/VMAT Dose Validation
We measured PDD and output for small MLC-shaped fields from 1 × 1 cm² to 5 × 5 cm² using a diode. The difference between calculation and measurement was within 3% in all cases. We passed the IMRT QA tests for TG-119 cases with both MapCHECK and ion chamber measurements. For both the Elekta Infinity and the TrueBeam, our IROC H & N phantom test had pass rates over 90% on the Gamma Index of 7% and 4 mm and a TLD dose error within 4%. With our TrueBeam, both IMRT and VMAT QA using ArcCHECK have had greater than 95% pass rates on the Gamma Index of 2% and 2 mm, including large-field GYN cases (Y jaw ~ 35 cm), which we attribute to the quality of the beam data and the fine-tuned models. With our Elekta Infinity, unfortunately, the initial model had an issue: about 50% of VMAT cases failed the 2%/2mm Gamma Index on ArcCHECK (pass rate less than 90% even at 3%/3mm with 10% threshold, global gamma index and measurement uncertainty off), although the majority of IMRT cases could pass at 90%. Interestingly, the agreement for the basic photon beam and MLC tests on the Elekta Infinity was similar to or better than that on the TrueBeam.
After exhausting the investigation of planning and measurement techniques, we had to adjust and update the model with new beam data for small MLC fields (re-measured with a diode instead of an ion chamber); the updated model passed all patient-specific QA for VMAT. The profiles of the new beam data appear slightly sharper in the penumbra and lower in the tails. Detailed analysis and resolution of this finding will be presented as a separate study in conjunction with QA measurement techniques.
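The pass rates quoted above rest on the gamma index, which combines a dose-difference criterion and a distance-to-agreement criterion into one number per measured point. The following is a minimal one-dimensional sketch of a global gamma calculation with a low-dose threshold. It is not the algorithm implemented in the ArcCHECK or MapCHECK software; the function name `gamma_1d`, the brute-force search and the `oversample` parameter are assumptions made for illustration, and positions are assumed to be strictly increasing and shared by both profiles.

```python
import numpy as np


def gamma_1d(x_mm, meas, calc, dose_pct=3.0, dta_mm=3.0,
             threshold_pct=10.0, oversample=10):
    """Global 1D gamma of a calculated profile against a measured one.

    Returns (pass_rate_percent, gamma_values) for the measured points
    at or above the low-dose threshold.
    """
    x = np.asarray(x_mm, dtype=float)
    meas = np.asarray(meas, dtype=float)
    calc = np.asarray(calc, dtype=float)

    d_max = meas.max()
    dose_tol = dose_pct / 100.0 * d_max            # global normalization
    keep = meas >= threshold_pct / 100.0 * d_max   # low-dose threshold
    if not keep.any():
        raise ValueError("no measured points above the threshold")

    # Resample the calculation on a finer grid so the search is not
    # limited to the original sampling interval.
    x_fine = np.linspace(x[0], x[-1], (len(x) - 1) * oversample + 1)
    calc_fine = np.interp(x_fine, x, calc)

    gammas = np.empty(int(keep.sum()))
    for k, (xi, mi) in enumerate(zip(x[keep], meas[keep])):
        dist2 = ((x_fine - xi) / dta_mm) ** 2
        dose2 = ((calc_fine - mi) / dose_tol) ** 2
        gammas[k] = np.sqrt((dist2 + dose2).min())

    pass_rate = 100.0 * float(np.mean(gammas <= 1.0))
    return pass_rate, gammas
```

Calling `gamma_1d(x, meas, calc, dose_pct=2.0, dta_mm=2.0)` corresponds to a 2%/2 mm global criterion with a 10% threshold. A clinical tool works on 2D or 3D dose grids and may apply measurement-uncertainty options, which this sketch leaves out.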

Electron Beam Validation
Figure 6 shows both in-plane and cross-plane dose profiles for the oblique electron beam. There is a sizable disagreement in the high-dose region (in-field) between calculation and measurement. The impact of central axis tilt on depth

Discussion
The basic TPS photon beam evaluation methods and tolerances recommended by MPPG 5.a. are 2% with one parameter change or 5% with multiple parameter changes on relative dose in the high-dose region, 3 mm distance to agreement (DTA) in the penumbra region, and 3% of maximum field dose in the low-dose tail [3].
Validation tests by the comparison of absolute point dose have the advantage of identifying any detailed discrepancies and providing confidence in End-to-End results. The most probable sources of error are the measurement techniques, including the setup. The centering of the ion chamber can be easily corrected by the scanning software. With carefully performed measurements, we should be able to reveal the limitations of either measurement or calculation. For example, a spike seen in the calculation at deeper depths for Test 5.8 might be related to the scatter from a couple of MLC leaves protruding toward the center of the field. This feature is not resolved in the measurement, probably because of the volume averaging effect of the ion chamber. A diode has higher spatial resolution, but we failed to obtain a sufficiently smooth profile even with a slow-speed or point-by-point scan, likely because of the poor signal-to-noise ratio (SNR) at deep depths. More efforts are encouraged with a quality diode/electrometer or films to see whether such a fine feature as observed in the calculation can be resolved in measurement.
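The region-based tolerances above can be applied point by point to a measured and a calculated profile. The following is a minimal sketch of such a check. The tolerance values are those quoted from the guideline, but the way the regions are delimited here (high dose at or above 80% of the maximum measured dose, tail below 20%, penumbra in between), the use of the local measured dose for the high-dose percentage, and the names `dta_1d` and `check_profile` are assumptions made for illustration and are not specified by MPPG 5.a.

```python
import numpy as np


def dta_1d(x, calc, x0, target):
    """Smallest |x - x0| at which the linearly interpolated calc equals target."""
    d = calc - target
    best = np.inf
    for i in range(len(x) - 1):
        d0, d1 = d[i], d[i + 1]
        if d0 == 0.0:
            xc = x[i]
        elif d0 * d1 < 0.0:
            # Zero crossing of the linear segment between samples i and i+1.
            xc = x[i] + (x[i + 1] - x[i]) * d0 / (d0 - d1)
        else:
            continue
        best = min(best, abs(xc - x0))
    if d[-1] == 0.0:
        best = min(best, abs(x[-1] - x0))
    return best


def check_profile(x_mm, meas, calc, high=0.8, low=0.2,
                  tol_high=0.02, tol_dta_mm=3.0, tol_tail=0.03):
    """Return (position, region, passed) for every measured point."""
    x = np.asarray(x_mm, dtype=float)
    meas = np.asarray(meas, dtype=float)
    calc = np.asarray(calc, dtype=float)
    d_max = meas.max()
    results = []
    for xi, m, c in zip(x, meas, calc):
        frac = m / d_max
        if frac >= high:      # high-dose region: percent dose difference
            region, ok = "high", abs(c - m) / m <= tol_high
        elif frac >= low:     # penumbra: distance to agreement
            region, ok = "penumbra", dta_1d(x, calc, xi, m) <= tol_dta_mm
        else:                 # low-dose tail: percent of maximum field dose
            region, ok = "tail", abs(c - m) / d_max <= tol_tail
        results.append((float(xi), region, bool(ok)))
    return results
```

Passing `tol_high=0.05` would correspond to the 5% tolerance for tests with multiple parameter changes. Other region definitions are possible, and the choice affects which points near the field shoulder are judged by dose difference rather than by DTA.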
Accuracy of the dose measurement [7] is subject to a number of factors, in-
may change a couple of percent over the depth range to R50, particularly for higher electron energies. We see little effect of the non-linear relationship between charge and dose, as the ion chamber measurement is nearly identical to that of a diode (Figure 6(b)), which is not depth dependent. The difference between an ion chamber and a diode in an electron beam is expected to be small in profiles due to the volume averaging effect, and the real difference due to the stopping power effect is in depth ionization vs. dose. We do observe a sizeable disagreement in the in-field dose between the measurements, particularly for in-plane profiles, due mainly to the different calibration and response of the ion chamber and the diode. There is a subtle difference on the shoulder of the cross-plane profiles between calculation and measurement, which might be explained by a limitation of the pencil beam algorithm related to source modeling. An imperfectly constructed source distribution can cause deviations in the shoulder/penumbra region. Additional measurements can be performed in the future to investigate the depth dependence of electron beam energy and the accuracy of electron dose calculation at oblique angles.
The dose calculations in homogeneous media were all performed using a virtual water phantom within the planning module, which is technically acceptable. MPPG 5.a. might have suggested a CT-based phantom with bulk water density to simulate the clinical use of the system. With heterogeneity correction turned on in the calculation, a difference of ≥ 0.5% can be observed between the phantoms (water vs. medium), e.g., in the dose at a depth of 10 cm under reference conditions. As pointed out by the guideline, some heterogeneity dose calculation algorithms (e.g., Monte Carlo and GBBS) directly calculate dose to the material within the voxel ("dose to medium"). This can be converted to "dose to water" through application of stopping power ratios, with the goal of reproducing conventional (e.g., C/S) TPS doses [8]. However, this stopping power-based conversion has actually been found to decrease dosimetric agreement with conventional TPS doses in most cases [9] [10], leading to "dose to medium" being recommended [9].
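For reference, the conversion mentioned above is commonly written as a single multiplicative factor. The notation below is an assumption chosen for illustration and is not taken from MPPG 5.a.: $D_m$ is the dose to medium computed in a voxel, $D_w$ is the corresponding dose to water, and $(\bar{S}/\rho)^{w}_{m}$ is the water-to-medium mass collision stopping power ratio averaged over the local electron spectrum.

```latex
D_w = D_m \left( \frac{\bar{S}}{\rho} \right)^{w}_{m}
```

For soft tissues the ratio is typically close to unity, so the two quantities differ by roughly a percent, while for bone the difference is larger.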
IMRT/VMAT dose validation has the least consensus among medical physicists and remains controversial. Despite widespread IMRT utilization, accurate dosimetric commissioning of an IMRT system remains a challenge. In the most recent report from IROC Houston [5], only 82% of the institutions passed the credentialing end-to-end test with the anthropomorphic head and neck phantom, and the conclusion was [11] that institutional QA results were not correlated with unacceptable plan delivery. That IROC test used rather lenient dose-ratio and distance-to-agreement (DTA) criteria of 7% and 4 mm, respectively. Only 69% of the irradiations passed a narrowed TLD dose-error criterion of 5%. This raises a question about the sensitivity and reliability of specific IMRT/VMAT QA dosimeters and analysis methods. In the validation of our Elekta Infinity, however, the problem was the other way around: we passed the IROC head and neck phantom test well but failed patient-specific QA for about 50% of clinical VMAT cases. We believe that a substantial amount of the

Conclusion
Our validation tests have a couple of clinical implications: a VMAT model needs to be carefully tested for varied planning cases, and electron beams calculated with the pencil beam algorithm have limited accuracy for oblique incidence and heterogeneous media. Above all, the uncertainty and efficiency of measurement should be well understood. The experience presented is a learning process about how the validation tests can be performed effectively for a dose calculation model.

5. Photon beams: basic dose algorithm validation; 6. Photon beams: heterogeneity correction validation; 7. Photon beams: IMRT/VMAT dose validation; 8. Electron beam validation. The guideline suggests some validation tests and evaluation criteria in each validation section (basic photon, heterogeneity, IMRT/VMAT and electrons). Verification has to take into account measurement accuracy in addition to model limitations in order to judge the quality of a model. MPPG 5.a. does not prescribe the measurement technique for those tests, but it states: "Water tank profiles yield the most accurate absolute dose comparison, while array detectors can test multiple points within the distribution and provide efficient comparison to calculations." This work was done at MD Anderson Cancer Center at Cooper for the commissioning of Pinnacle (Philips Radiation Oncology Systems, Fitchburg, WI)

Pinnacle are adjustable for separate regions in depth dose, buildup, in and out of field, which are used to model the photon spectrum, electron contamination, flattening filter attenuation, effective source size, and flattening filter scatter source, respectively. Jaw and MLC leaf transmission factors are also treated as modeling parameters rather than fixed at the exact measured values. For all basic validation tests, the absolute dose is compared between measurement and calculation at each point of interest (POI). An IBA (IBA Dosimetry GmbH, Schwarzenbruck, Germany) cc13 ion chamber was used for the photon basic dose algorithm and heterogeneity correction measurements, a PTW (PTW-Freiburg, Freiburg, Germany) E type diode for electron beams, and a Sun Nuclear (Sun Nuclear, Melbourne, FL, USA) diode array (ArcCHECK) for photon IMRT/VMAT validation.

Photon Beams: Basic Dose Algorithm Validation
Tests 5.1-5.3 are the traditional verifications of percent depth dose (PDD), profiles and output at nominal source to surface distance (SSD), which are essential to check the agreement of the model with the commissioning data. In addition, MPPG 5.a. recommends six other tests, as summarized below, for basic photon beam validation in homogeneous media with static MLC fields.
5.4 Small MLC-shaped field (non SRS)
5.5 Large MLC-shaped field with extensive blocking (e.g., mantle)
5.6 Off-axis MLC-shaped field, with maximum allowed leaf over travel
5.7 Asymmetric field at minimal anticipated SSD
5.8 10 × 10 cm² field at oblique incidence (at least 20˚)

Figure 1. Beam's Eye View of the static MLC fields: (a) small non-SRS MLC-shaped field (Test 5.4); (b) large MLC-shaped field with extensive blocking (Test 5.5 or Test 5.9 with wedges); (c) off-axis MLC-shaped field with maximum allowed leaf travel (Test 5.6); (d) irregular MLC-shaped field (Test 5.7 at nominal gantry angle or Test 5.8 at oblique incidence).

resolution
7.4 Clinical tests, using both ion chamber and array detectors with appropriate resolution
7.5 External review, various options such as IROC Houston anthropomorphic phantoms
We performed the validation of PDD and output for small MLC-shaped fields with a diode. The IMRT plan and QA tests from TG-119 (prostate and C-shaped target) [4] were done with MapCHECK and an ion chamber in water slabs. Two representative clinical VMAT cases (lung and pelvis) were done with ArcCHECK with the ion chamber insert. IMRT QA tests were delivered beam-by-beam at the nominal gantry angle, and the composite dose was also compared with ion chamber measurements in the high- and low-dose regions, respectively. End-to-End VMAT test with IROC H

Figure 3. Test 8.2: Open cone at oblique beam and/or extended SSD.

Figure 4. Test 5.8 for 6 MV and 10 MV at SSD of 80 cm, in-plane profiles in absolute dose (cGy). Blue line: measurement; red line: calculation; green line: difference.

Figure 5. In-plane profiles (Y direction) calculated at SSD of 80 cm and depth of 25 cm for Test 5.8, plotted at varied offsets along the cross-plane direction (X direction, X = 0 at CAX). The left panel is 6X and the right panel is 10X.

Figure 6. (a) Test 8.2: Electron cross-plane (left panel) and in-plane (right panel) profiles in absolute dose (cGy). The X axis is the distance relative to the central axis (X = 0) at the specified depth; the upper panel is for 9 MeV at a depth of 2.5 cm and the lower panel for 20 MeV at a depth of 5.0 cm. Red line: Pinnacle calculation; blue line: ion chamber measurement; green line: diode measurement. (b) Cross-plane profiles of ion chamber measurement (dashed line) scaled to that of diode measurement (solid green). The left panel is for 9 MeV at a depth of 2.5 cm and the right panel is for 20 MeV at a depth of 5.0 cm.
cluding but not limited to, calibration and response of a detector, measurement setup, and conversion of signal reading to absorbed dose. An ion chamber is a simple and accurate device for the measurement of absolute dose at a point. A question can be raised concerning the conversion of charge reading to absorbed dose in different media. For photon beams, the electron energy and thus the stopping power ratio $(\bar{L}/\rho)^{\mathrm{med}}_{\mathrm{air}}$ is not depth dependent; therefore depth-ionization ≈ depth-dose. For electron beams, the electron energy and thus the stopping power ratio $(\bar{L}/\rho)^{\mathrm{med}}_{\mathrm{air}}$ is depth dependent; therefore, when converting from depth-ionization to depth-dose, stopping power ratios $(\bar{L}/\rho)^{\mathrm{med}}_{\mathrm{air}}$
beam heterogeneity correction validation tests, we derived the dose from the in-water calibration. So, the dose to the solid phantom should take into account the conversion from dose-to-water to dose-to-muscle, which is about 1% difference as 0
failures in IMRT/VMAT validation are related to the fundamentals of the TPS commissioning. Our experience showed that acquisition and modeling of small MLC fields, particularly for the tail region, are critical to the IMRT/VMAT model. The issue might be related to the leaf gap model in the MLC configuration of this particular Elekta linac. Detailed discussion of IMRT/VMAT QA criteria is beyond the scope of this article, but further investigation is underway to better understand the correlation between the criteria of the validation tests and any potential deficiency of the model.
The extensive validation tests recommended by MPPG 5.a. are meant to establish the accuracy and limitations of a commissioned dose algorithm before it is implemented in the clinic. MPPG 5.a. adapted the evaluation methods and tolerances for most validation tests from published AAPM task group reports and the criteria used by IROC Houston. Evaluation methods need to be explored further in relation to the refinement of a model and the optimization of the recommended testing methodologies. Validation tests for IMRT/VMAT are quite independent of those for basic photon beams, and hopefully some tests can be developed for direct diagnosis of any deficiencies in IMRT/VMAT delivery.

Table 2. Measured and calculated point dose for heterogeneous CIRS phantom.

Table 3 gives a brief overview of the results from all the validation tests we performed. Readers are referred to the text of this article and MPPG 5.a. for the details. In general, all the measurements are reliable and repeatable, and consistent between machines of the same type, e.g., Varian 2100EX vs. TrueBeam.

Table 3. Summary of the results for our validation tests in comparison with the recommended evaluation criteria.