Validation of the Daily Diary of Fatigue Symptoms – Fibromyalgia (DFS-Fibro)

Introduction: Fatigue is an important fibromyalgia (FM) symptom, but existing measures of fatigue are unlikely to meet regulatory standards for clinical trial use. We describe the development and validation of the Daily Diary of Fatigue Symptoms–Fibromyalgia (DFS-Fibro), a 24-hour recall, patient-reported outcome (PRO) measure of fatigue in FM that is administered electronically (ePRO). Methods: There were 3 phases of work: 1) item generation based on concept elicitation interviews with FM patients, with clinical relevance confirmed by expert clinician review; 2) pilot testing/cognitive debriefing interviews with FM patients; and 3) psychometric validation using data from a methodology study with 145 FM patients. The measure was finalised based on both qualitative and quantitative findings. Results: Twenty-three items were generated in phase 1. Some minor revisions were made following the pilot testing and cognitive debriefing (phase 2), but no items were deleted. All patients found the measure easy to understand and use. Item reduction was conducted taking into account both the initial psychometric data and the earlier qualitative research, resulting in a final 5-item measure of the “symptom” of fatigue. The 5-item DFS-Fibro had very high internal consistency (alpha = 0.99), strong test-retest reliability (ICC ≥ 0.84), convergent validity and known-groups validity. Conclusion: The DFS-Fibro has strong psychometric properties and strong face and content validity for the measurement of fatigue in FM.

In addition to pain, qualitative research has shown that a majority of FM patients regard fatigue as one of the most bothersome symptoms [11,14]. In a Delphi study, fatigue was rated as the second most important domain to measure (after pain) by 23 expert clinicians [15], and the third most important domain by 100 FM patients (after pain and "overall FM") [16].
Like pain, fatigue can only be measured by patient report. Recent literature and current regulatory standards for the development of Patient Reported Outcome (PRO) measures (and specifically those intended for use as endpoints in clinical trials to support label claims) emphasize the importance of including qualitative patient research as a central part of PRO development [17][18][19]. Moreover, regulators particularly stress the importance of including input from patients who are in the target population of interest. This helps ensure that the questionnaire items use the natural language used by patients, are easy to understand and answer, and assess concepts identified as relevant and important by patients. Although generic fatigue measures such as the Multidimensional Fatigue Inventory (MFI) and the Multidimensional Assessment of Fatigue (MAF) have been used in FM clinical trials [11][20][21][22][23][24], the development of these measures did not include qualitative research with FM patients. Therefore, existing instruments for measuring FM fatigue may not be satisfactory for regulatory and labelling purposes. Furthermore, these existing measures have recall periods of 7 days or longer [14]. Regulatory guidance for the development of patient-reported measures notes that, for variable symptoms such as fatigue, patients may not be able to recall reliably over more than 24 hours [19]. Thus, there was a need to develop a new, daily self-completed FM fatigue measure.

Phase 1-Item Generation
As recommended by current guidelines [19], a pool of items was generated based upon qualitative data from concept elicitation (CE) interviews with 40 FM patients from the US, Germany and France (reported in detail elsewhere) [14]. In these interviews the patients consistently reported experiencing tiredness/fatigue as: an overwhelming feeling of tiredness (n = 17, 42.5%), not relieved by resting or sleeping (n = 14, 35%), not proportional to effort exerted (i.e. they easily become tired) (n = 25, 62.5%), associated with a heavy feeling in their body (n = 16, 40%) or a weak feeling in their muscles (n = 9, 22.5%), that made it difficult for them to motivate themselves to do things (n = 23, 57.5%), affected things they want to do (n = 27, 67.5%), or made tasks take longer to do (n = 15, 37.5%), and made it difficult to concentrate (n = 21, 52.5%), think clearly (n = 12, 30%) or remember things (n = 9, 22.5%).
Item content, based on this patient feedback, was finalised with input from PRO experts, expert FM clinicians and interviewers from all three countries. The qualitative findings suggested that the measure should be completed daily with a 24-hour recall due to the variable nature of fatigue and due to patients reporting that they had difficulty recalling their fatigue accurately over more than 24 hours. It was also developed to be completed electronically, via a hand-held Personal Digital Assistant (PDA), to reduce respondent burden, allow time and date-stamping of data, reduce secondary data entry errors, and result in more accurate and complete data [25,26].

Phase 2-Cognitive Debriefing (CD)
The initial pool of items was pilot tested with FM patients in electronic format. The aim was to assess whether the proposed items were relevant, acceptable and understandable to FM patients, and to identify any changes that might be recommended before implementation in a clinical study. Patients were trained to use the device (visit 1), then completed the diary every evening for 3-7 days (referred to as the "pilot test") followed by a 90-minute CD interview (visit 2) no more than 10 days after visit 1. Thus, patients gained "real-life" experience of completing the measure prior to debriefing.
During the CD interview patients performed a 'think-aloud' exercise in which they spoke their thoughts aloud as they read and responded to each PRO question. This was followed by detailed debriefing questions about each item, the response scale and the recall period: what patients understood them to mean, their relevance, and suggestions for rewording [18,19,27].
FM patients (n = 20) were recruited through general practitioners, internal medicine specialists and pain specialists in the US. US-English-speaking men and women of any race and aged ≥18 years were invited to participate if they met the ACR diagnostic criteria for FM [2]. Patients were excluded if they had significant physical or psychiatric co-morbidities, other severe pain, autoimmune or rheumatic disorder, other non-focal rheumatic disease, active infection, or an untreated endocrine disorder. Ethical approval of the study protocol, documents and procedures was granted by Copernicus, a centralized Independent Review Board, and written informed consent was obtained from all participants prior to entry into the study.
All interviews were audio-taped. Qualitative analysis of verbatim transcripts was performed using methods based on Grounded Theory [28][29][30] and Atlas.ti software [31]. Based on patient feedback and discussions between the researchers and expert clinicians, revisions were made to the items, instructions and response options, resulting in the draft version of the DFS-Fibro.

Phase 3-Psychometric Validation
The draft DFS-Fibro that emerged from phase 2 was included in a cross-sectional methodology study with FM patients to provide psychometric data to inform item reduction and explore the psychometric properties of the final tool. Physicians completed a case report form (CRF) confirming that patients met inclusion/exclusion criteria (identical to those used in phase 2) and providing information about patient clinical characteristics. Patients attended a clinic visit where they were trained to use the PDA. They then completed the draft DFS-Fibro each evening for 2 weeks until a follow-up clinic visit. Other measures, including a pain NRS and a sleep interference rating scale, were collected against which to validate the draft DFS-Fibro. These two scales were also completed daily on the PDA, and several pen/paper PROs were completed at the clinic visits. Table 1 lists each additional measure, along with references for full instrument descriptions and a summary of the analyses in which the measures were used.
A total of 145 FM patients were recruited by FM expert physicians at rheumatology or pain clinics in the US (150 patients were targeted, fitting with the recommendation of >5 subjects per item for factor analysis and sufficient for other psychometric methods [41]). All provided written informed consent prior to admission to the study. Patients were not screened for fatigue or pain levels prior to inclusion in the study. The distribution of responses for the draft DFS-Fibro items was summarised using descriptive statistics. Administering the draft DFS-Fibro by means of a PDA device prevented patients from skipping an item; therefore, evaluating missing data at an item level was not appropriate. Patterns in incomplete diary entries or missing days were assessed. Weekly mean scores were calculated if ≥4 assessments were available within the 7-day period. If fewer than 4 assessments were available, data for that week were treated as missing.
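The weekly-mean rule described above can be sketched as follows. This is a minimal illustration rather than the study's analysis code: the function name `weekly_mean` and the use of `None` to mark a missing diary day are assumptions made for the example.

```python
from statistics import mean
from typing import Optional, Sequence

# Rule stated in the text: a weekly mean needs at least 4 of the 7 daily assessments.
MIN_ASSESSMENTS = 4


def weekly_mean(daily_scores: Sequence[Optional[float]]) -> Optional[float]:
    """Return the mean of the available daily scores for one week.

    `daily_scores` holds up to 7 entries; None marks a day with no diary entry
    (a hypothetical encoding). Returns None when fewer than 4 days are available,
    i.e. the week is treated as missing.
    """
    if len(daily_scores) > 7:
        raise ValueError("a week holds at most 7 daily assessments")
    available = [s for s in daily_scores if s is not None]
    if len(available) < MIN_ASSESSMENTS:
        return None
    return mean(available)
```

For instance, a week with four completed days yields a mean of those four days, whereas a week with only three completed days yields `None`.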
Item-reduction analyses involved inter-item correlation analysis and factor analysis conducted on week 2 mean scores (week 2 was selected to ensure that the reference week was the same for all patients since all had a follow-up clinic visit at this time point and were administered the other measures). The previous qualitative findings and expert clinical opinions were taken into account for item deletion/retention decisions.
Internal consistency and test-retest reliability were assessed using Cronbach's alpha and Intra-Class Correlations (ICC), respectively. Coefficients >0.70 were required for both properties. Construct validity (convergent and divergent) was explored through correlation analyses between draft DFS-Fibro item scores and other PROs included in the study (Table 1). Scales measuring similar concepts were expected to correlate more highly than scales measuring dissimilar concepts, with the expected correlations to demonstrate convergent validity being moderate to strong, and divergent validity being mild to moderate (not low, because some correlation was expected among all measures given the nature of symptoms within FM). Demonstrating that the DFS-Fibro has stronger correlations with other measures of fatigue than with measures of sleep and mood was expected to confirm that these are distinct, albeit related, concepts. Discriminative/known-groups validity was explored by evaluating differences between groups known to vary according to the level of fatigue and FM severity (see Table 1 for relevant PRO measures).
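The correlation analyses used for convergent and divergent validity rest on Pearson's correlation coefficient (Table 1). A minimal sketch of that coefficient is given here; the function name `pearson_r` and the toy data in the usage note are hypothetical and are not study data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return s_xy / sqrt(s_xx * s_yy)
```

Under the expectations stated above, the coefficient between the diary score and another fatigue measure would be compared with the coefficient between the diary score and a sleep or mood measure, with the former expected to be larger in magnitude.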
Based on initial psychometric findings and the previously conducted qualitative research, recommendations were made for item deletion or retention. The psychometric properties of the final DFS-Fibro (version 1.0) were then evaluated against the same criteria as given above.

Phase 1-Item Generation
An initial pool of 23 items was developed from CE patient data and input from expert FM clinicians and PRO experts. Items were grouped into conceptually consistent domains, which it was hypothesized could be combined to measure the higher-level concepts of severity and impact of fatigue (see Figure 1). Fatigue severity included four domains: overall fatigue (2 items); characterising fatigue (7 items); physical body fatigue (3 items); and motivation (3 items). Impact of fatigue included two domains: daily activity limitations due to fatigue (4 items); and cognitive limitations due to fatigue (4 items). All items were developed with a 0-10 Numerical Rating Scale (NRS) response scale and a 24-hour recall period stated as "today". The items were developed to be completed as an electronic (ePRO) daily diary on a hand-held PDA.
Qualitative findings from CD interviews (phase 2)
Almost all patients found the 23 items easy to understand, interpret and complete. Patients were consistently positive about the electronic mode of administration. Patients did not identify any concepts as missing from the measure. In light of these findings, no items were added or deleted. A few minor changes to wording and bolding of words were made based on patient feedback. The most notable revision was deleting the definition of fatigue ("by fatigue we mean tiredness that makes it difficult to do things") from the end of question 1 ("How severe was your fatigue today?"). During CD, this definition seemed to lead patients to focus on their ability to do things rather than on the severity of their fatigue; thus, to help patients focus on the latter, we removed this definition.
Four items in the "fatigue severity domain" were noted as potentially weak because they were interpreted slightly differently to how they were intended or were considered very similar to another item. In addition, some patients found it difficult to think about the impact of fatigue separately from pain when answering the "impact of fatigue" items. These items were considered candidates for deletion following psychometric analysis. Standard practice for instrument development is to start with a large pool of items and reduce this number based on the results of content and psychometric validation [42]. Moreover, when the measure will be used as a daily diary, minimising patient burden in terms of the number of items is crucial. Thus the intention was to psychometrically test the draft DFS-Fibro and delete items based on psychometric analysis as well as CE and CD patient data.
All 23 items were therefore retained in the draft DFS-Fibro which was taken forward for psychometric testing. An item-scale structure consisting of six sub-domains was hypothesized for the draft DFS-Fibro: the first four sub-domains ("overall fatigue" [2 items], "characterising fatigue" [7 items], "physical body fatigue" [3 items] and "motivation" [3 items]) contributed to a fatigue severity domain score and the latter two sub-domains ("daily activity limitations due to fatigue" [4 items] and "cognitive limitation due to fatigue" [4 items]) contributed to an impact of fatigue domain score. For all items, except item 5 ("how much energy did you have today?"), higher scores indicate greater severity or impact. Item 5 was reverse scored.
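Reverse scoring of a positively worded item such as item 5 can be illustrated with a short sketch. The text does not state the transformation used; the example assumes the conventional one for a bounded scale (scale minimum plus scale maximum minus the raw score), and the function name `reverse_score` is hypothetical.

```python
def reverse_score(raw: float, scale_min: float = 0, scale_max: float = 10) -> float:
    """Reverse a raw score on a bounded rating scale.

    Assumption: the conventional transformation min + max - raw, so that on a
    0-10 NRS a raw 0 ("no energy") becomes 10 and a raw 10 becomes 0.
    """
    if not scale_min <= raw <= scale_max:
        raise ValueError("raw score lies outside the response scale")
    return scale_min + scale_max - raw
```

After this transformation, higher values on the energy item point in the same direction as the other items, namely greater fatigue severity.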

Psychometric Analysis
There was minimal missing data and no floor or ceiling effects (i.e. <50% of patients chose either 0 or 10) for any item. The distribution of responses was acceptable for all of the items, so none were considered for deletion on this basis. Factor analysis of week 2 mean scores suggested one factor would explain the majority of variance in the measure (91%). All items had very high factor loadings (0.80-0.98), strongly suggesting unidimensionality, except item 5, the energy item (0.31).
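The floor and ceiling check mentioned above can be sketched as follows. The 50% threshold comes from the text; treating the two endpoints separately (rather than combined) is an assumption, as are the function names.

```python
from typing import Sequence, Tuple


def endpoint_fractions(responses: Sequence[int], low: int = 0, high: int = 10) -> Tuple[float, float]:
    """Return the fraction of responses at the scale minimum and at the scale maximum."""
    n = len(responses)
    if n == 0:
        raise ValueError("no responses supplied")
    floor = sum(1 for r in responses if r == low) / n
    ceiling = sum(1 for r in responses if r == high) / n
    return floor, ceiling


def has_floor_or_ceiling_effect(responses: Sequence[int], threshold: float = 0.5) -> bool:
    """Flag an item when at least `threshold` of responses sit at one endpoint.

    Assumption: each endpoint is compared with the threshold on its own.
    """
    floor, ceiling = endpoint_fractions(responses)
    return floor >= threshold or ceiling >= threshold
```

An item whose responses are spread across the 0-10 scale would not be flagged, which corresponds to the result reported for all 23 draft items.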
In total, 18 items were deleted based on the qualitative and psychometric findings; the reasons for deleting each item are detailed in Figure 1 and summarised below. Inter-item correlations were moderate to high for all items except item 5, with many correlations above 0.80, thus providing strong evidence that some items could be considered redundant. The CD findings suggested that measuring only the core "symptom" of fatigue (rather than the impact of fatigue) may be the best approach to ensure that patients answer items thinking only about their fatigue (and not their pain); when answering impact items, many patients indicated that it was difficult for them to separate limitations due to fatigue from limitations due to pain. All items asking about the impact of fatigue on daily activities or cognition were therefore removed.
In addition to the impact items, the "motivation" sub-domain items and item 23 were also deleted on the basis of the statistical results and because some patients during CD interpreted them as measuring impact on daily activities rather than severity of fatigue. Item 5 was removed due to weak statistical results and because patients interpreted this concept inconsistently during CD. The two weakest items demonstrating redundancy were also removed (items 12 and 20). Finally, three items were deleted because patients reported during CD that the item related to a different concept than intended (items 2, 8 and 10).
The above item deletions resulted in a final 5-item version of the DFS-Fibro (v1.0). Items retained were: 1 (how severe fatigue), 3 (worn out), 4 (easily get tired), 15 (exhausted) and 18 (tired). Thus, all of the items retained ask about the "symptom" of fatigue. Item 1 was retained as a useful overall item, although it was recognised that translating the word "fatigue" may be potentially problematic [14].

Psychometric Validation of DFS-Fibro (v1.0)
Factor analysis of the final 5 items confirmed a uni-dimensional structure supporting the scoring of all 5 items in a single scale. Unrotated factor analysis of week 2 mean scores for the 5 items showed that a one-factor solution was best, accounted for all of the common variance and was the only factor with an eigenvalue above 1 (4.74). All other factors had an eigenvalue of 0.02 or less. All five items had very high factor loadings (between 0.97 and 0.99), suggesting that they were highly related.
Pearson's correlations among the 5 items were high, ranging from 0.92 to 0.97 (Table 3). It could be suggested that the high level of inter-item correlations indicates a single, global item is sufficient, statistically, to measure FM fatigue. However, current guidelines for instrument development emphasise the importance of content validity; therefore, all important fatigue concepts identified by patients themselves must be assessed by the measure, using patient language. Moreover, previous patient qualitative work [14] showed that patients did not consistently use one single word that would represent the concept of "fatigue"; therefore, developing wording for a single item that would resonate with all patients would be extremely difficult.
The average response to all of the 5 items (all answered using a 0-10 NRS) was taken as the daily score. Mean total scores over a week were then calculated. Cronbach's alpha for the mean total score at week 2 provided evidence of extremely high internal consistency for the DFS-Fibro (v1.0), with an alpha coefficient of 0.99. Calculation of an adjusted Cronbach's alpha for the total score by excluding each item by turn showed that removal of any of the five items had minimal impact on the alpha coefficient (Table 3). This indicated a very high level of internal consistency.
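The scoring and internal-consistency steps described above can be sketched as follows. The daily score as the mean of the five item responses follows the text; the alpha computation uses the standard Cronbach formula with sample variances, which is assumed rather than confirmed to match the software used in the study. All function names are hypothetical.

```python
from statistics import variance
from typing import List, Sequence


def daily_score(item_scores: Sequence[float]) -> float:
    """Daily DFS-Fibro score: the mean of the item responses (0-10 NRS each)."""
    return sum(item_scores) / len(item_scores)


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; `items` holds one list of patient scores per item."""
    k = len(items)
    n = len(items[0])
    if k < 2 or n < 2 or any(len(col) != n for col in items):
        raise ValueError("need at least 2 items and 2 patients, with equal lengths")
    # Total score per patient across all items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    sum_item_variances = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def alpha_if_item_deleted(items: Sequence[Sequence[float]]) -> List[float]:
    """Adjusted alpha: recompute alpha with each item removed in turn."""
    cols = [list(col) for col in items]
    return [cronbach_alpha(cols[:i] + cols[i + 1:]) for i in range(len(cols))]
```

If removing any single item leaves alpha almost unchanged, as reported in Table 3, the items carry largely overlapping information.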
The DFS-Fibro (v1.0) showed strong test-retest reliability; ICCs between the week 1 and week 2 mean total scores for stable patients (identified as those who reported no change using the PIC or the PGIC) were 0.85 and 0.84, respectively, for the PIC and PGIC. ICCs for the individual items were all also >0.80 (Table 3).
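A test-retest ICC between week 1 and week 2 scores can be sketched as follows. The text does not state which ICC form was used; this example assumes the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), and the function name is hypothetical.

```python
from typing import Sequence


def icc_2_1(week1: Sequence[float], week2: Sequence[float]) -> float:
    """ICC(2,1) for two repeated measurements per patient (assumed ICC form)."""
    n = len(week1)
    k = 2  # two occasions: week 1 and week 2
    if n != len(week2) or n < 2:
        raise ValueError("need paired scores for at least 2 patients")
    values = list(week1) + list(week2)
    grand = sum(values) / (n * k)
    row_means = [(a + b) / k for a, b in zip(week1, week2)]  # per patient
    col_means = [sum(week1) / n, sum(week2) / n]             # per occasion
    # Two-way ANOVA sums of squares.
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Unlike Pearson's correlation, this form penalises a systematic shift between occasions: scores that all rise by one point from week 1 to week 2 correlate perfectly but give an ICC below 1.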
The correlations between the week 2 mean DFS-Fibro (v1.0) total scores and the other PRO measures completed are shown in Table 4. The predicted level of correlation for convergent validity against all of the selected measures was met. Moderate to strong correlations (r = −0.49 to 0.89) were observed with the following measures: FIQ tiredness item; MAF severity, distress and interference with activities of daily living items and modified GFI; MFI general fatigue, physical fatigue, mental fatigue, reduced activity, and reduced motivation items; and the SF-36 vitality scale.
The predicted levels of correlation for divergent validity against all of the pre-selected measures were also met, with the exception of the daily Sleep Interference Rating Scale (r = 0.81) (Table 4). This measure asks about how much pain has interfered with sleep and the relationships observed with this scale are more consistent with those observed against measures of pain rather than other measures of sleep. Weak to moderate correlations (r = 0.37-0.57) were observed with the following measures: HADS anxiety and depression scales; MOS sleep somnolence scale; and the ESS.
The known-groups validity analyses provide evidence that the DFS-Fibro is discriminative. Overall, the summated DFS-Fibro total score discriminated well between all the differing groups at the p < 0.001 level (Table 5).

Discussion
The DFS-Fibro (v1.0) has been developed to provide a specific measure of fatigue for patients with FM to be used in clinical trials as an outcome measure to support product label claims. It is self-completed, with five items, for completion daily using a hand-held electronic device. The measure asks about the "symptom" of fatigue using simple, patient-friendly language.
In contrast to existing fatigue measures, the DFS-Fibro (v1.0) has been developed based on qualitative interviews with FM patients, using the regulatory standards to guide its development, thus allowing the DFS-Fibro (v1.0) to be used in clinical trials to support labelling claims [19]. Indeed, because it has been generated from FM patient interviews and pilot tested, patients reported that the DFS-Fibro (v1.0) is easy to understand and answer and includes the important fatigue concepts as identified by patients themselves, thus ensuring the measure has content validity for an FM-specific population.
The majority of existing measures of fatigue use a 7-day recall period whereas the DFS-Fibro (v1.0) has a 24-hour recall period. This is because there is evidence that when concepts such as pain or fatigue are measured with longer recall periods (such as 7 days or 4 weeks), ratings are typically inflated compared with shorter recall periods [43,44]. Furthermore, there is evidence that 24-hour recall periods correlate more closely with momentary ratings than 7-day or 4-week recall periods do [44].
The rationale for developing the DFS-Fibro (v1.0) was to include it in upcoming FM clinical trials as an outcome measure for assessing the efficacy of new FM treatments. Because fatigue is a symptom that has been shown to fluctuate and vary on a daily basis [14], a 24-hour recall period was deemed preferable to a 7-day recall period to ensure that we capture fluctuations in patients' FM fatigue and, most pertinently, to ensure accuracy in their reporting and avoid inflating patients' fatigue ratings [43,44].
Sample clinical and demographic characteristics in the three study phases were comparable. Initial CE interviews, conducted with 40 FM patients in three countries [14], ensured that the items developed had strong content validity, were not specific to any particular culture and could be easily translated. Input from expert clinicians at the Item Generation meeting provided support for the clinical relevance of items. As a result, 23 items were developed that reflect patients' natural language, and that were clinically relevant. Subsequent pilot testing and CD provided additional support for the content validity of the draft version of the measure (referred to as the draft DFS-Fibro) prior to inclusion in the methodology study.
The psychometric analyses of the draft DFS-Fibro indicated that all items, except item 5 (energy), were closely related. The weak relationship between item 5 and all other items could be due to the fact that this item was the only positively worded item. However, the statistical data as well as the qualitative findings suggest that the concept of "energy" is not a simple, uni-dimensional concept but one that may be interpreted differently by different patients, thus accounting for its lack of correlation with the other items and subsequent removal from the diary.
It is frequently suggested in the literature that fatigue is multi-dimensional and includes physical, mental and sometimes emotional components [45][46][47][48][49]. Although the qualitative research reported here supported this multidimensional view, the correlation and factor analysis data suggested that these components are so closely related that a measure focusing on the core "symptom" of fatigue is most appropriate. As shown in our research and other literature, fibromyalgia is a complex condition with multiple symptoms that all impact upon a person's life in many ways [50]. However, our research with FM patients shows that they have difficulty separating the impact of fatigue from the impact of other symptoms, such as pain; therefore, asking patients about the impact of fatigue on cognition or daily activities is inappropriate. These elements are more appropriately captured by evaluating the impact of FM as a whole using comprehensive measures of functioning that are specific to these domains. The impact items were therefore removed from the fatigue diary. Additional items were removed to reduce item redundancy, which is particularly important for a daily diary in which minimising respondent burden is crucial. This resulted in a brief, 5-item tool that retained items with strong psychometric properties and that focuses on the symptom of fatigue using commonly used and understood patient language. The fatigue literature often emphasizes the fact that fatigue is multi-dimensional, including physical and cognitive/mental and sometimes emotional components [51,52]. While in our qualitative research we found that all of these components were reported, the fact that all items correlated so highly demonstrates that important information is not lost by focussing solely on the "symptom" of fatigue.
Psychometric testing of the 5-item DFS-Fibro (v1.0) demonstrated that it has strong measurement properties. All five items were highly correlated and the very high Cronbach's alpha confirmed that the measure has very high internal consistency. Adjusted Cronbach's alpha data suggested that further items could be deleted, but we considered that retaining all five items was beneficial for reasons of face validity. The final DFS-Fibro (v1.0) also had strong test-retest reliability, convergent validity and known-groups validity.
Several limitations of the psychometric validation study need to be considered, and a number of additional activities should be conducted to provide further support for the use of the new measure. The validation study evaluated cross-sectional psychometric properties and test-retest reliability only. Priority should now be given to evaluating the responsiveness of the DFS-Fibro (v1.0) to changes over time in an FM intervention study in which patients' fatigue would be expected to change. Such a study could also be used to help identify minimally important changes in the measure over time.
In addition, although the qualitative work was conducted in several countries to address cross-cultural issues, the CD and the validation study were performed in the United States only. The translatability and linguistic validity of the DFS-Fibro (v1.0) have thus not been explored. Moreover, the psychometric validation results detailed here are specific to the electronic version of the DFS-Fibro. If a paper version of the measure were to be used in clinical practice or research, additional equivalence testing of that mode of administration would be necessary [26].
The DFS-Fibro (v1.0) focussed purely on measuring the "symptom" of fatigue. Given the complex nature of FM, it is recommended that fatigue is evaluated as part of an overall measurement strategy including measurement of other FM symptoms and the impact of all symptoms (or overall FM) on a patient's life. Many patients with FM experience sleep problems [16,53], and although our qualitative work clearly suggested that fatigue and sleep were distinct [14], additional qualitative work exploring the link between sleep problems and fatigue would be of value.
The components of fatigue identified in our research are consistent with those described for other chronic conditions in which fatigue is a common and distressing symptom (e.g. rheumatoid arthritis (RA) [45] and cancer [54,55]). Patients with RA also describe their fatigue as being extreme, different from normal tiredness, severe weariness and overwhelming. Studies in patients with cancer or primary Sjögren's syndrome have shown that brief questionnaires concentrating on the "symptom" of fatigue may be sufficient for assessing fatigue in these diseases [56,57]. These observations suggest that the DFS-Fibro (v1.0) may have value as a measure of fatigue in other disease areas. However, this would have to be supported by qualitative research and psychometric validation of the measure in those patient populations.
In summary, a rigorous development process has resulted in a brief (five-item), patient-reported instrument for the assessment of fatigue in FM patients. The DFS-Fibro (v1.0) has strong psychometric properties and face and content validity, and would have minimal burden for patients to complete during an intervention study. The DFS-Fibro (v1.0) is a valuable tool for assessing an important symptom in FM which is often under-recognised in terms of impact and importance to patients. Moreover, it is appropriate for use in clinical trials as an outcome measure to support label claims.
The DFS-Fibro is the copyright of Pfizer; a paper version is available via the Pfizer PRO website (www.pfizerpatientreportedoutcomes.com).

Competing Interests
This study was funded by Pfizer, Ltd. Authors Louise Humphrey and Rob Arbuckle, employees of Mapi Values, served as paid consultants to Pfizer during the conduct of this study and the development of this manuscript; all other authors were employees of Pfizer at the time the study was conducted.

Authors' Contributions
All authors were involved in the design of the study and the analysis plan, reviewed the analysis findings, participated in item reduction, and reviewed and gave input into the manuscript. In addition, CB, LH and RA performed the qualitative research and IH performed the psychometric validation analyses.

Figure 1. Development of the final daily diary of fatigue symptoms-fibromyalgia (DFS-Fibro) (v1.0) following content validity testing and psychometric analysis.

Table 1. Other patient reported outcome instruments included in the psychometric validation study and their relevance to the psychometric evaluation of the daily diary of fatigue symptoms-fibromyalgia.
Instrument | Reason for Inclusion | Statistical Analyses
Medical Outcomes Study-Sleep Scale (MOS-SS) [39] | Divergent validity (sleep somnolence scale) | Pearson's correlation
Hospital Anxiety and Depression Scale (HADS) [40] | Divergent validity (anxiety and depression scales) | Pearson's correlation
Daily Sleep Interference Rating Scale (SIRS) | Divergent validity | Pearson's correlation
Fibromyalgia Severity Scale [1] | Defining groups for known-groups validity evaluation | ANOVA

Table 2. Demographic and clinical characteristics of patients included in the cognitive debriefing and psychometric validation studies.
a Data missing for 7 patients.

Table 3. Psychometric validation of DFS-Fibro version 1.0.
a Adjusted indicates that the alpha coefficient was computed with the indicated item removed. The alpha coefficient for the week 2 mean overall DFS-Fibro version 1.0 score was 0.99; b Intra-class correlation coefficient; PGIC: patient global impression of change; PIC: patient impression of change.