Reliability and sensitivity to change of IW-TSE versus DESS magnetic resonance imaging sequences in the assessment of bone marrow lesions in knee osteoarthritis patients: Longitudinal data from the Osteoarthritis Initiative (OAI) cohort

Background: Bone marrow lesions (BMLs) are associated with osteoarthritis (OA). We assessed the performance of two commonly used MRI sequences, IW-TSE and DESS, in terms of reliability in the detection of BMLs and sensitivity to change over time. We hypothesized that the IW-TSE would demonstrate higher sensitivity to change than DESS in the assessment of BML prevalence and change over time. This study was performed using a subset of the Osteoarthritis Initiative (OAI) cohort. Methods: A subgroup of 144 patients was selected from the OAI progression cohort, all of whom had IW-TSE and DESS MRI acquisitions at baseline and 24 months. BMLs were assessed using a semi-quantitative scale in the global knee, medial and lateral compartments, and subregions. Intra-reader reliability was assessed on a subset of 51 patients. Results: Intra-reader reliability was substantial for both IW-TSE and DESS: global knee ≥ 0.64, medial compartment ≥ 0.70, and lateral compartment ≥ 0.63. The prevalence of BMLs detected at baseline was only slightly greater for IW-TSE than for DESS. The mean BML score at baseline was significantly higher (p ≤ 0.006) for the IW-TSE than for the DESS. However, mean change at 24 months was similar for both sequences in all regions except the medial compartment (p = 0.034) and medial femur (p = 0.015), where it was significantly higher for DESS than for IW-TSE. Moreover, the prevalence of BML change at 24 months was similar in all regions except the global knee (p = 0.047) and the lateral tibial plateau (p = 0.031). Conclusion: This study does not suggest superior sensitivity to change of one sequence over the other for almost all regions. The only difference is a higher mean BML change over time detected by the DESS sequence in the medial compartment and medial femur. These data suggest that both sequences are equivalent for the assessment of BMLs in clinical trials.


INTRODUCTION
In magnetic resonance imaging (MRI) assessment of knee osteoarthritis (OA), subchondral bone has frequently been found to be the site of signal alterations reflecting the presence of lesions. These lesions were first thought to be only bone marrow edema, as they showed hypersignal in water-sensitive sequences [1]. Notably, the same lesions could also be identified using a number of other sequences, including T1-weighted sequences, reflecting a more structural than edematous process [2-5]. Therefore, the original terminology "bone marrow edema" was over time replaced by the more descriptive term "bone marrow lesion" or BML [6,7]. These OA lesions may decrease in size, persist, increase in size, or be converted into cysts over time [5].
Conflicting opinions remain about BMLs and their significance. This may reflect the lack of complete knowledge regarding the nature of the lesions observed with different MR sequences. A recent study in hip OA correlating MRI with histopathology findings revealed a number of changes such as true bone edema, fibrosis, necrosis, subchondral fractures and cysts, all of which appear relatively similar in MR images [18]. Several patterns were differentiated by subtle variations in signal intensity at 1.5 Tesla using spin echo sequences, including T1-weighted, fat-saturated (FS) T2-weighted, and post-contrast T1-weighted without FS. Some reports [10,19-21] suggest that, for better contrast definition of BMLs, specific MR acquisition sequences, especially water-sensitive sequences, should be used preferentially, while the use of T1-weighted sequences is advocated to better delineate bone cysts [5]. Ideally, both T1-weighted and water-sensitive sequences should be performed to fully appreciate the global joint structure. Yet, in the context of clinical trials, this would increase the examination time, which has been shown to decrease patient compliance: it increases the incidence of failed examinations and the dropout rate, because significant knee pain impedes prolonged joint immobilization. One might therefore question whether steady state gradient echo sequences with mixed T1/T2* weighting might be a reliable and practical compromise for gathering information on both cartilage and BMLs at the same time.
A recent publication on cross-sectional findings in knee OA patients using the same MRI protocol as the Osteoarthritis Initiative (OAI) showed that BMLs appear larger in the water-sensitive intermediate-weighted turbo spin echo (IW-TSE) sequence than in the dual echo steady state (DESS) sequence [5]. However, to date, no head-to-head comparative study has been done in a sufficiently large OA patient population assessing BMLs with different MR sequences acquired during the same exam. The performance of a steady state gradient echo sequence vs. a water-sensitive intermediate-weighted sequence, in terms of reliability in the detection of BMLs and sensitivity to change over time, is still debated. We hypothesized that the IW-TSE sequence would have a higher sensitivity to change in the assessment of BMLs. To test this, we compared BML prevalence and change over time using the IW-TSE and DESS MRI sequences in a subset of patients from the OAI cohort (public data sets).

PATIENTS AND METHODS
Data used for this study were obtained from the OAI database, which is publicly available at http://www.oai.ucsf.edu/. Specific image data sets consist of a sample of participants extracted from the initial "progression" OAI sub-cohort, which consists of individuals with symptomatic and radiologic knee OA at the beginning of the study. Group B, 3.B.1 (Variable V03, IMAGESB = 3), was chosen for the experiment. From this cohort (n = 160), the left knees of 148 patients were selected because they had both IW-TSE and DESS sets of MRI acquisitions at baseline and at 24 months follow-up. Of note, the left knee was systematically analyzed whether or not it was the index knee for the OAI study. Four subjects were excluded due to poor quality of at least one of the IW-TSE and DESS images at baseline and 24 months. Therefore, 144 patients were studied. All had symptomatic knee OA [22] and structural pathology (definite tibiofemoral osteophyte, grade 1-3, from the baseline radiographic image clinic reading) in one or both knees at enrolment. These 144 patients had a mean age of 61.3 ± 9.8 (SD) and a mean BMI of 30.0 ± 4.3 (SD), and 51.4% were female. The Kellgren-Lawrence (KL) score, as reported on the OAI website, ranged from 0 to 4.

BML Scoring
BMLs were assessed as previously described [4,24]. When a BML was visualized in several MR slices, the lesion was evaluated in the slice where it appeared the largest. The widest size of each lesion was taken along the bone articular surface and compared to the corresponding size of the bone for the region of interest. In the case of several lesions, the sum of the sizes was used. A scale from 0 to 3 was used, based on the extent of regional involvement, where 0 = absence, 1 = ≤25%, 2 = 25% to 50%, and 3 = ≥50%. The regions were chosen as in the WORMS scoring system described by Peterfy et al. [24], with two minor modifications: the subspinal section between the two tibial plateaus was not included, and the tibial plateaus were not subdivided. The subdivision of the tibial plateau depends on the position of the menisci, which could vary over time, resulting in imprecision of measurement and hence lower accuracy of the reading. The anterior, central, and posterior medial femoral subregions and the medial tibial plateau were summed to yield the medial compartment. For the lateral compartment, the posterior femoral subregion was discarded because it exhibited no BML in any patient. The compartments were also combined to yield a global knee value. The maximum possible scores were 6 for the global tibia, 9 for the medial femur, 6 for the lateral femur, 12 for the medial compartment, 9 for the lateral compartment, and 15 for the global femur. The maximum global knee score was 21.
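The regional maxima quoted above follow directly from the 0 to 3 scale and the number of retained subregions. The sketch below reproduces that arithmetic. The region labels and function names are hypothetical (they are not OAI variable names), and the handling of lesions at exactly 25% or 50% extent is an assumption, since the scale as stated leaves those boundaries ambiguous.

```python
# Sketch of the regional score bookkeeping described in the text.
# Region labels are hypothetical, not OAI variable names.
MAX_GRADE = 3  # top of the 0-3 semi-quantitative scale

# Number of scored subregions per region: anterior/central/posterior medial
# femur, anterior/central lateral femur (posterior lateral femur discarded),
# and one undivided tibial plateau per side.
SUBREGION_COUNT = {
    "medial_femur": 3,
    "lateral_femur": 2,
    "medial_tibia": 1,
    "lateral_tibia": 1,
}

# Composite regions as sums of the basic regions.
COMPOSITES = {
    "medial_compartment": ["medial_femur", "medial_tibia"],
    "lateral_compartment": ["lateral_femur", "lateral_tibia"],
    "global_femur": ["medial_femur", "lateral_femur"],
    "global_tibia": ["medial_tibia", "lateral_tibia"],
    "global_knee": list(SUBREGION_COUNT),
}


def grade(extent_fraction):
    """Map lesion extent (fraction of the regional bone width) to a 0-3 grade.

    Assumption: exactly 25% is graded 1 and exactly 50% is graded 3.
    """
    if extent_fraction <= 0:
        return 0
    if extent_fraction <= 0.25:
        return 1
    if extent_fraction < 0.50:
        return 2
    return 3


def max_score(region):
    """Maximum possible score for a basic or composite region."""
    parts = COMPOSITES.get(region, [region])
    return MAX_GRADE * sum(SUBREGION_COUNT[p] for p in parts)
```

For example, `max_score("medial_compartment")` sums four subregions at a maximum grade of 3 each, which matches the value of 12 given in the text.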
Intra-reader reliability was assessed on randomly selected patients whose images were read twice at an interval of two to four weeks. Fifty-one patients were selected for the reliability phase of the study. We first selected 25 subjects from the progression cohort in a completely random fashion (with or without BML at baseline). However, more than 75% of these had no BML, biasing this intra-reading phase of our study toward greater reproducibility. Therefore, 26 additional subjects who had BMLs at baseline with diverse scores were selected from the cohort to better assess reliability across the full spectrum of disease status. The reliability portion of the study was blinded to patient identification. All MRI acquisitions from the 144 patients at baseline and 24 months were scored by an expert reader with over 10 years of experience in the field, who was trained to assess BMLs on both IW-TSE and DESS MR images by a musculoskeletal radiologist with 25 years of training experience. BMLs were scored according to the different knee regions as described above. The score was also used to establish the presence or absence of a BML (0 = absence, >0 = presence) within a specific knee region. Change in a lesion over time was computed as the score difference between the 24 month follow-up and baseline. A positive sign indicates progression of the lesion score.
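The derived quantities defined above (presence, change, and direction of change) can be sketched as follows. This is a minimal illustration with hypothetical function names, not the code used in the study.

```python
def has_bml(score):
    """Presence of a BML in a region: 0 = absence, >0 = presence."""
    return score > 0


def score_change(baseline, followup):
    """Change over time: follow-up score minus baseline score.

    A positive value indicates progression of the lesion score.
    """
    return followup - baseline


def change_category(delta):
    """Classify a score change as increase, stable, or decrease."""
    if delta > 0:
        return "increase"
    if delta < 0:
        return "decrease"
    return "stable"


def prevalence(scores):
    """Percentage of knees with a BML (score > 0) in a given region."""
    return 100.0 * sum(1 for s in scores if has_bml(s)) / len(scores)
```

A knee region scored 3 at baseline and 2 at 24 months would thus have a change of -1 and be classified as a decrease.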

Statistical Analyses
Intra-reader agreement was assessed using Cohen's weighted kappa, where <0 indicates no agreement, 0-0.20 slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1 almost perfect agreement [25]. An intraclass correlation coefficient for the wider value ranges, such as the global knee, was not computed since the data were not normally distributed and were skewed toward small values. Differences in BML prevalence (presence or absence) between the IW-TSE and DESS sequences were assessed by Fisher's exact test, a test similar to the Chi square test for categorical data, and the prevalence of change by the Chi square test. Differences between the two MR sequences in BML scores at baseline and 24 months, as well as in the change at 24 months, were assessed using the Wilcoxon signed rank test, a non-parametric test. The association between BML prevalence and KL score at baseline was assessed using a Chi square test on the corresponding contingency table for each sequence. Sensitivity to change was expressed as the standardized response mean (SRM) [26], the advantage of which lies in its independence of sample size and the direct comparability of values obtained with different tools. All tests were two-sided, and p ≤ 0.05 was considered statistically significant. All statistical analyses were done using SPSS Statistics software, version 19 (IBM Corporation, Somers, NY, USA).
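As an illustration of the sensitivity-to-change metric, the SRM is conventionally computed as the mean change divided by the standard deviation of the change. The sketch below assumes the sample standard deviation (n - 1 denominator); the study used SPSS, and the function name and example values here are hypothetical.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, followup):
    """SRM = mean of paired changes / sample SD of paired changes.

    Assumes paired scores (same knees, same order) and a non-zero SD.
    """
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(changes)


# Made-up scores for six knees, for illustration only.
baseline_scores = [2, 3, 1, 0, 4, 2]
followup_scores = [3, 3, 0, 2, 4, 3]
srm = standardized_response_mean(baseline_scores, followup_scores)
```

Because the standard deviation sits in the denominator, a sequence with a similar mean change but less variable change scores yields a higher SRM.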

Intra-Reader Reliability of BML Scores
Intra-reader reliability was substantial for both sequences. For the IW-TSE sequence, the values were 0.64 for the global knee, 0.73 for the medial compartment, and 0.63 for the lateral compartment; for the DESS sequence, the corresponding values were 0.67, 0.70, and 0.66. Intra-reader reliability was also substantial to moderate for the regions/subregions of the knee, ranging from 0.74 to 0.53 for the IW-TSE and from 0.78 to 0.58 for the DESS.
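The qualitative labels used here follow the agreement bands listed in the Statistical Analyses section. A minimal sketch of that mapping is given below; the function name is hypothetical, and values falling between two listed bands (for example 0.205) are assigned to the lower band by assumption.

```python
def interpret_kappa(kappa):
    """Map a kappa value to the agreement bands given in the text."""
    if kappa < 0:
        return "no agreement"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values reported in this section, global knee / medial / lateral.
iw_tse = [0.64, 0.73, 0.63]
dess = [0.67, 0.70, 0.66]
labels = [interpret_kappa(k) for k in iw_tse + dess]
```

All six compartment-level values fall in the 0.61-0.80 band, while the lowest subregion values (0.53 and 0.58) fall in the moderate band.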

Prevalence of BMLs at Baseline
The prevalence of BMLs detected at baseline in the global knee was almost identical for the IW-TSE and DESS (80.6% vs. 79.2%) (Figure 1). Similar findings were seen for the medial compartment (70.1% vs. 68.1%) and the lateral compartment (53.5% vs. 48.6%). The prevalence of BMLs was similar in all the other knee subregions, but a slightly higher prevalence was seen in the medial plateau with the DESS than with the IW-TSE: 34.7% and 30.6%, respectively. However, there was no statistically significant difference between the DESS and IW-TSE as assessed by Fisher's exact test.
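To illustrate the kind of comparison reported here, the sketch below implements a two-sided Fisher's exact test using only the standard library. The counts are not taken from the paper: they are back-calculated from the reported global knee percentages and n = 144 (116/144 ≈ 80.6% for IW-TSE, 114/144 ≈ 79.2% for DESS) and should be read as an approximation. The study itself used SPSS, so this is not the authors' procedure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Assumed counts (BML present, BML absent) for the global knee at baseline.
p_value = fisher_exact_two_sided(116, 28, 114, 30)
```

With prevalences this close, the p-value is far above 0.05, in line with the absence of a significant difference reported above.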

BML Scores
In assessing the same BMLs in the two different MR sequences acquired at exactly the same position at baseline and 24 months (Figure 2), DESS yielded images in which the BML appeared smaller than in the IW-TSE. This specific lesion at baseline, for instance, was assigned a score of 2 (25%-50% of the respective joint surface) using the DESS sequence and a score of 3 (>50%) using the IW-TSE. The 24 months follow-up revealed that the BML extent was clearly reduced in comparison to baseline in the IW-TSE. The appearance of cysts, which was clearer in the DESS sequence as reported but not specifically assessed during BML reading, demonstrated structural change in the subchondral bone.
The mean BML score at baseline (Table 1) was systematically lower for the DESS than the IW-TSE sequence for the global knee, the compartments, and the subregions, with all differences being statistically significant (Wilcoxon signed rank test). Interestingly, mean changes in BML score at 24 months were similar for most regions (Table 1), but statistically significant higher score differences were seen for the DESS in the medial compartment (p = 0.034) and medial femur (p = 0.015) (Wilcoxon signed rank test), with a trend toward a higher score in the IW-TSE in the lateral tibial plateau (p = 0.069).
The median score changes were virtually identical for all knee subregions regardless of the MR sequence analyzed (Table 1). When assessing the median score change, a change of a full score of 1 was only seen for the global knee in both the IW-TSE and DESS. All knee regions/subregions yielded a better SRM for BML scores in the DESS sequence vs. IW-TSE except for the lateral tibial plateau (Table 1; 0.374 DESS vs. 0.458 IW-TSE). Interestingly, as seen in Table 2, a significant trend toward greater prevalence of BML with a greater KL score at baseline was found.

Prevalence of Increase, Stability, or Decrease in Change in BMLs over Time
In addition, we evaluated the prevalence of change in BML score over 24 months (increase, stable, or decrease) (Figure 3 and Table 3). For the global knee, femur and tibia as well as medial compartment, medial femur, and lateral tibial plateau, the BML prevalence detected with the DESS images was less inclined to decrease over time compared to the IW-TSE. For the global knee, IW-TSE showed a higher number and percentage of decrease in the prevalence of BMLs than the DESS (p = 0.047, Chi square test). This was also seen for the other regions/subregions except for the lateral compartment, femur, and medial tibial plateau. The prevalence of increase in BML was slightly higher for the DESS in the medial compartment, femur and tibia and for the IW-TSE in the lateral tibia, the latter being the only subregion other than the global knee that revealed overall statistical significance (p = 0.031, Chi square test).

DISCUSSION
This head-to-head study in knee OA patients compared the reliability, prevalence, and sensitivity to change of a BML scoring system contrasting two different types of commonly used MRI sequences, the water-sensitive IW-TSE and the DESS. A recent study has reported such a comparison; however, this was done only cross-sectionally and change in BML size over time, which is a most relevant question, was not addressed [5]. Data from this study are in agreement with the previous one in which the IW-TSE sequence had a stronger signal toward BML and hence higher scores could be seen at any given time.
The two sequences have been systematically used per protocol and design since the inception of the Osteoarthritis Initiative. The DESS sequence reflects more structural changes within the knee structure while the IW-TSE reflects more of the inflammatory/edematous process. Indeed, the IW-TSE sequence yields a darker bone where "edema" shows lesser brightness than DESS. As such, discrimination between bone edema and a bone cyst is more difficult with IW-TSE than with DESS. The complementary nature of these two sequences may eventually provide for a better understanding of these lesions from information obtained from clinical trials in which BML is a predictor or an outcome variable. However, the question remained, particularly in the context of clinical trials, as to the sensitivity to change of each of these sequences in assessing changes in BMLs over time. The present study further revealed that neither of these two MRI sequences demonstrated statistically significant differences in the assessment of overall BML prevalence or, even more importantly, their change over time. Moreover, comparison between these sequences demonstrated that, for BML assessment, the use of a steady state gradient echo sequence such as DESS, which has superior spatial resolution and signal to noise ratio, results in better delineation of cystic and edematous subchondral lesions, as represented by lower standard deviations and hence higher SRMs (Table 1).
By using a water-sensitive sequence, the BMLs, which have increased water content, will be perceived as brighter and larger. However, it should also be expected that such fuzzy lesion brightness from an IW-TSE sequence would be subject to greater change if compared to the regional bone surface of the respective slice and therefore could be a less reliable marker to follow over time. The results of the present study indicate that compared to the DESS sequence, the greatest change seen with the IW-TSE was an increase in the lateral tibial plateau but, surprisingly, a decrease in most other regions/subregions (Table 3), which in turn could impact the capability to detect a treatment effect on BMLs over time in the context of a DMOAD clinical trial.
Such data contrast with the belief, based on findings from observational studies, that water-sensitive sequences are superior to T1-weighted sequences for BML scoring over time [20,27]. It has therefore been suggested that both MRI acquisitions should be performed to fully evaluate the cartilage, the bone, and surrounding tissues. However, this approach has limitations as it would increase the acquisition time. Such long acquisition periods would be inconvenient for many patients enrolled in clinical trials, as they could increase the risk of patient movement leading to poor image quality and to the need for repeated MR acquisitions, which in turn would add to the cost of longitudinal trials, not to mention the impact on the dropout rate. The present findings thus have practical and economic implications for future OA clinical trials. A T1/T2*-weighted steady state gradient echo sequence such as the DESS may suffice to effectively assess BML presence and score change over time if a sequence such as the IW-TSE is not available.
Changes in BML prevalence were seen in many knee regions/subregions, but most were not statistically significant, probably due to a lack of statistical power. Of note, no a priori power calculation was done as this is a post-hoc analysis of the OAI data. Nonetheless, these results reveal the importance of selecting the knee regions/subregions if one wishes to evaluate the impact of a DMOAD. Progression of structural changes is very heterogeneous throughout the knee joint and may in fact preclude the use of BMLs as an outright outcome to assess treatments. Accordingly, designing a study where BML presence or score progression would be used as the only primary outcome for a DMOAD trial would seem very hazardous.
The present study, as with any other, has limitations. First, in order to have both sequence acquisitions from baseline and 24 month follow-up for the same patients, a fairly small portion of the large OAI cohort had to be used for this comparative study. According to the power calculation, the number of patients needed to claim non-inferiority for the global knee at a level of 5% and the prevalence at 80% for the IW-TSE is 1584 patients. Presently, the number of patients that have an MRI at baseline and 24 months for the two sequences for the progression cohort is 1025 patients. Of note, at the time of this study inception only 160 were available. However, even using the entire cohort and assuming that all the MRI are acceptable for the BML determination, we would still not reach the number of patients needed. Such work may be repeated in the future with a larger patient cohort.
Secondly, the patient selection for the OAI had less stringent criteria compared to clinical trials. Previous work clearly demonstrated that the progression cohort from the OAI included younger patients with less knee pain at entry and, more importantly, had less cartilage volume loss, almost half (−1.9% loss of cartilage at 1 year in the medial compartment) [28] of what is seen in clinical trials [3,29]. It is therefore logical to assume that the BMLs observed in these OAI patients may be less prevalent and of smaller size than those seen in patients in clinical trials [3,29]. Moreover, in this study, we did not include the symptom information, and thus cannot extrapolate for BML presence/change over time and knee symptoms. Our data showing greater prevalence of BMLs in patients with a higher KL score at baseline clearly reflect this issue. However, this would not change the overall conclusion of this study that there is no superior sensitivity of one sequence over the other in the assessment of BML prevalence and mean change over time in almost all regions/subregions and BMLs can be evaluated with the same reliability regardless of which of the two MR sequences (IW-TSE or DESS) is used.
Thirdly, when assessing the progression of the BMLs over time, we did not consider the impact of patient demographics or other structural knee tissue damage such as cartilage damage and meniscal extrusion that could potentially affect the presence of the BMLs and their change over time. Another limitation was that only one experienced reader assessed the MR images as another reader was simply not available at the time of our study. Finally, the large standard deviations in all the BML scores at baseline and longitudinal follow-up for both MR sequences are probably related to the heterogeneity among patients' disease progression and the relative imprecision of a semi-quantitative scoring.

CONCLUSION
This is the first report of a direct longitudinal comparison of the assessment of BML changes between IW-TSE and DESS MRI sequences in knee OA patients. This study did not suggest a superior sensitivity to change of one sequence over the other in the assessment of BML prevalence and their changes over time. It is therefore possible to speculate that a T1/T2*-weighted sequence showing a fluid/tissue contrast superior to that of the commonly used FLASH or SPGR sequences could help to more reliably detect treatment response, owing to a more homogeneous signal progression. This finding is of obvious importance, particularly in the context of clinical DMOAD trials.

Figure 1. Histogram of the prevalence of BMLs at baseline for the global knee and regions/subregions. None of the knee region and subregion % comparisons between DESS and IW-TSE were statistically significant as evaluated with Fisher's exact test.

Figure 2. Bone marrow lesions (BML) comparing two MRI sequences in which images were obtained at the exact same position in the medial central condyle of the left knee at baseline (A) and (B) and 24 months (C) and (D). (A) and (C) were acquired with a dual echo steady state sequence (DESS), and (B) and (D) with an intermediate weighted turbo spin echo sequence (IW-TSE). In the DESS ((A), white arrow) the extent of BML is smaller than in the IW-TSE ((B), grey arrow). In both (C) and (D) the BML area is reduced in comparison to the baseline, while the centers of the lesions increased in intensity reflecting subchondral cysts and a remnant of a BML in both sequences. The evolution of BMLs into cysts can also be observed in the tibial plateau (arrow heads).

Figure 3. Histogram of the prevalence of increase, absence of change, and decrease in BML score at 24 months.

Table 1. Comparison of bone marrow lesion scores at baseline and 24 months.
* Wilcoxon signed rank test, ** SRM: Standardized Response Mean.

Table 2. Correlation between the Kellgren-Lawrence radiological score at baseline and the prevalence of BML for the global knee.
* Chi square of 35.0, p < 0.001 for the DESS sequence and Chi square of 2.2, p = 0.002 for the IW-TSE sequence.

Table 3. Prevalence of bone marrow lesion score change from baseline to 24 months.