Demographic variations in discrepancies between objective and subjective measures of physical activity

Demographic effects (sex and parenthood status) on the level of association between self-reported and accelerometer-assessed physical activity were examined in a large, diverse sample of adults. Participants (N = 1249, aged 20-65 years) wore accelerometers (Actical) for 7 days and completed an interviewer-administered physical activity recall questionnaire (IPAQ-LF) for the same period. Mean daily minutes of moderate physical activity (MPA) and moderate-to-vigorous physical activity (MVPA) were used in analyses. Linearity between methods was explored by regressing mean minutes of activity, and Pearson’s correlations were calculated. A weak association between IPAQ-LF and Actical minutes of MPA and MVPA per day was shown for the whole sample (rs = 0.216-0.260). The magnitude of association varied between males (rs = 0.265-0.366) and females (rs = 0.124-0.167), although no obvious variations in associations were evident for parenting status. The IPAQ-LF produced substantially greater variation in estimates of physical activity than was recorded by the Actical accelerometer, and large discrepancies between methods were observed at an individual level. Self-report tools provide a poor proxy of overall human movement, particularly among females. Inferences made at an individual level from self-reported data, such as intervention efficacy or health outcomes, may have substantial error.


INTRODUCTION
Physical activity is an essential behavioral element in maintaining good health, preventing disease, and promoting longevity [1]. The epidemiology of physical activity considers the association physical activity and inactivity have with chronic diseases, and the mechanisms to prevent and control these diseases. Accurate monitoring of physical activity engagement in free-living populations is central to correctly determining the direction and magnitude of these associations and mechanisms. Conventionally, emphasis has been placed on measuring moderate and vigorous physical activities performed in leisure domains, although more recently non-leisure domains (e.g., occupational, transport, household) have featured. In response to growing evidence for the poor health outcomes associated with sedentary behaviors [2,3] and the positive health effects of low-level activity [4,5], a call has been made to incorporate these behaviors in measures of physical activity so that they may be tracked and health outcomes determined [6,7].
In the absence of an agreed-upon criterion for quantifying physical activity, many types of measures are applied [8]. Self-report tools (e.g., questionnaires, diaries) are the most widely used measure of physical activity at a population level. Inherently, these tools rely on participants' ability to accurately recall, quantify, and categorize their physical activity behaviors according to the framework of the self-report tool. Conversely, motion sensors (e.g., accelerometers, some pedometers) can be used to provide an objective assessment of the accumulation of movement (commonly lower body movements) throughout a period of time. Accelerometers (e.g., Actigraph, Actical, Caltrac) use piezoelectricity to register acceleration, recording detailed temporal data across the spectrum of activity intensity, including sedentary behaviors and low-level activity.
Many epidemiological studies using self-report measures have shown women to be less active than men [9]. Moreover, some sub-groups of women, such as women with young children (WYC), are thought to be even less active [10]. In part, this may be due to differences in the way physical activity is performed and thus measured.
Firstly, planned activities and those performed at vigorous intensity are the most memorable and are therefore more likely to be accurately recalled. WYC, however, are known to spend longer in total work (paid and unpaid) each day than men with young children (MYC) and women without young children (WNYC), potentially leaving less time for planned leisure [11]. This may also be the case for MYC compared to men without young children (MNYC). Additionally, activities of WYC are often sporadic and performed simultaneously with other tasks, for example, carrying a child whilst vacuuming. These types of activities are regularly interrupted by the needs of young children, and are difficult to categorize and quantify through self-report [12]. Accelerometers, by contrast, can record movement regardless of duration, intensity, or purpose. It is therefore possible that systematic differences in the validity of self-report tools may be present across major demographics.
A variety of self-report tools exist, and many studies have examined the validity of self-report tools using accelerometry as the criterion, or objective, measure of physical activity. Previously, discrepancies between objective and subjective measures of physical activity have been shown within adult populations [13,14]; however, it is unknown whether these discrepancies vary by certain demographic variables. If this were the case, it would have significant bearing on the selection of measurement tools for the population being studied. The International Physical Activity Questionnaire (IPAQ) was developed through an international collaboration to provide a standardized self-report measure of physical activity suitable for population-wide assessments [15]. The IPAQ long form (IPAQ-LF) requires recall of physical activity engagement at moderate and vigorous intensities for occupational, transport, household, and leisure domains for a 7-day period (either usual or previous).
This study examines the association between activity derived from the IPAQ-LF and concurrent accelerometer-derived activity, and the potential methodological impact on mis-measurement imposed by differences in sex and adults' parenting status. The aims of this study were therefore to: 1) examine the level of association between self-reported and accelerometer-assessed physical activity engagement in a large sample of adults, and 2) determine differences in the level of association between measures among men and women with and without young children (aged 0-4 years).

METHODS
Participants were part of the Understanding the Relationship between Activity and Neighborhoods (URBAN) Study, a multi-centered, stratified, cross-sectional study of associations between physical activity, health, and the built environment in adults and children residing in New Zealand [16]. Objective and self-reported physical activity engagement, neighborhood perceptions, demographics, and body size measures were collected, along with built environment variables. The study contributes to a larger, international collaborative project where similar procedures are utilized across eight countries (www.ipenproject.org).

Participants
Adults aged 20 to 65 years were recruited randomly from 48 neighborhoods (stratified by high/low walkability, high/low Māori population) across four New Zealand cities during 2008-2010. Trained interviewers followed pre-determined walk paths for each neighborhood and approached every nth house, according to the neighborhood household sampling rate. One adult from each household was invited to participate. Further details of the neighborhood selection, recruitment methods, and sample power calculations have been described elsewhere [16].

Data Collection
Trained interviewers gained written informed consent and delivered accelerometers and travel/compliance logs during the first home visit. Eight days later, the interviewer visited the home a second time to collect the accelerometer and travel/compliance log, measure participants' height, weight, waist, and hip circumferences, and administer the study questionnaire.

Measures
A range of measures were utilized in the URBAN study. Those relevant to the current study are outlined below:

Objectively Assessed Physical Activity
Hip-mounted Actical accelerometers (Mini-Mitter, Sunriver, OR) were used to objectively measure participants' physical activity. The units have been shown to be a reliable and valid measure of physical activity in adult populations [17,18]. Accelerometers were prepared to record physical activity and step counts in 30-second epochs. Participants were instructed to wear the unit for all waking hours (excluding water-based activities) for seven consecutive days. For the duration of the accelerometer data collection period, participants self-completed a compliance log of wear-time and of activities they engaged in whilst not wearing the unit. The information derived from the log was checked and matched against accelerometer data.

Self-Reported Physical Activity
The International Physical Activity Questionnaire in long form (IPAQ-LF) was administered via interview at the second home visit to capture adults' self-reported physical activity for the previous seven days (the period when the accelerometer was worn). The IPAQ-LF assesses frequency (days), duration (minutes), and intensity (walking, moderate, vigorous) of physical activity engagement across four domains: occupation, transportation, household, and leisure. Moderate physical activity was defined as "those activities that take moderate physical effort and make you breathe somewhat harder than usual"; vigorous physical activity as "those activities that take hard physical effort and make you breathe much harder than normal" [19]. Evidence for the reliability and validity of this tool has been provided for 744 adults across 12 countries [15].

Demographics
Participants completed a demographic survey that included: gender, age, ethnicity, marital status, household income, academic qualifications, occupation, dwelling type, and the number and ages of children living in the dwelling.

Self-Reported Physical Activity
According to the IPAQ scoring protocol (www.ipaq.ki.se/), minutes of physical activity engagement from the IPAQ-LF were summed across activity domains for each level of intensity (walking, moderate, and vigorous). Mean daily minutes of moderate (sum of walking and moderate, MPA), vigorous (VPA), and moderate-to-vigorous (sum of MPA and VPA, MVPA) activity engagement were calculated to minimize the effect of missing days of accelerometer data.
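The summation described above can be illustrated with a minimal sketch. It is not the official IPAQ scoring code: the `DomainReport` structure, the function names, and the division by a 7-day recall period are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DomainReport:
    """Hypothetical record: days per week and minutes per day for one domain."""
    days: int
    minutes_per_day: float


def weekly_minutes(reports):
    """Sum days x minutes/day across all domain reports for one intensity."""
    return sum(r.days * r.minutes_per_day for r in reports)


def mean_daily_minutes(walking, moderate, vigorous, recall_days=7):
    """Return mean daily MPA, VPA, and MVPA minutes.

    MPA is the sum of walking and moderate minutes; MVPA is MPA plus VPA.
    Dividing by recall_days (assumed to be 7) gives a per-day mean.
    """
    mpa = (weekly_minutes(walking) + weekly_minutes(moderate)) / recall_days
    vpa = weekly_minutes(vigorous) / recall_days
    return {"MPA": mpa, "VPA": vpa, "MVPA": mpa + vpa}


# Example: walking in two domains, moderate and vigorous activity in one each.
result = mean_daily_minutes(
    walking=[DomainReport(5, 20), DomainReport(2, 30)],
    moderate=[DomainReport(3, 40)],
    vigorous=[DomainReport(1, 70)],
)
print(result)
```

Expressing both measures as mean minutes per day is what allows the questionnaire totals to be compared with accelerometer records that contain differing numbers of valid days.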

Objectively Assessed Physical Activity
Accelerometer data were downloaded using Actical® version 2.04 (Mini-Mitter Co., Inc., Bend, OR, USA). Thresholds for MPA and VPA were generated by the Actical software and were based on MET-value cutpoints. Data were prepared for analysis using SAS (version 9.1, SAS Institute Inc., Cary, NC, USA) and Microsoft Excel. Bouts of 60 or more consecutive minutes of zero counts were considered non-wear-time and removed prior to analysis [20]. Wear-time criteria for inclusion were defined as having five or more days of 10 or more hours of wear-time per day. Mean daily minutes of MPA, VPA, and MVPA were used in all analyses to ensure comparability across the sample. Mean values for individuals were calculated using the number of days of accelerometer wear that met wear-time criteria.
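The wear-time rules above can be sketched as follows. This is an illustrative reconstruction, not the SAS code used in the study; the function names and the per-day input layout (a list of 30-second epoch counts plus that day's activity minutes) are assumptions.

```python
EPOCHS_PER_MINUTE = 2                     # 30-second epochs, as in the study
NONWEAR_EPOCHS = 60 * EPOCHS_PER_MINUTE   # 60 consecutive minutes of zero counts
MIN_WEAR_MINUTES = 10 * 60                # 10 hours of wear per valid day
MIN_VALID_DAYS = 5                        # valid days needed for inclusion


def wear_minutes(counts):
    """Minutes of wear in one day, excluding zero-count runs of >= 60 minutes."""
    nonwear = 0
    run = 0
    # A non-zero sentinel closes any run of zeros that reaches the end of the day.
    for c in list(counts) + [1]:
        if c == 0:
            run += 1
        else:
            if run >= NONWEAR_EPOCHS:
                nonwear += run
            run = 0
    return (len(counts) - nonwear) / EPOCHS_PER_MINUTE


def participant_mean(days):
    """Mean daily activity minutes over valid days, or None if excluded.

    days: list of (epoch_counts, activity_minutes) tuples, one per day.
    """
    valid = [mins for counts, mins in days
             if wear_minutes(counts) >= MIN_WEAR_MINUTES]
    if len(valid) < MIN_VALID_DAYS:
        return None
    return sum(valid) / len(valid)


# Example: five days with 11 hours of wear, two days with only 9 hours.
good_day = [5] * (11 * 120) + [0] * (13 * 120)
bad_day = [5] * (9 * 120) + [0] * (15 * 120)
print(participant_mean([(good_day, 30)] * 5 + [(bad_day, 300)] * 2))
```

Averaging over valid days only, rather than over all seven days, keeps a day with little wear from dragging down or inflating a participant's mean.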

Statistical Analyses
All analyses were undertaken using SPSS (version 18) and statistical significance was set at α = 0.05. The Shapiro-Wilk test of normality was conducted for both physical activity measures, and non-normal distributions were log transformed to achieve normality. Means and standard deviations were used to describe both methods of measurement for the whole sample and each demographic group (WYC, women with young children [children aged 0-4 years]; MYC, men with young children; WNYC, women with no young children; MNYC, men with no young children).
Commonly, correlation coefficients are used as a single score of validity between measures; however, it is appropriate to explore associations with a broader view. Firstly, a test of the differences between measurement means allows quantitative assessment of whether methods are significantly different from each other. Secondly, a scatter plot of the two measures with the line of identity provides visual assessment of linearity and systematic or random bias in the relationship between measures. The correlation coefficient can then be calculated as a summary of the overall scatter between measures, indicating the strength of the linear relationship [21]. Therefore, paired t tests were used to compare means between methods, and linearity was explored by regressing mean minutes of activity at each intensity derived using the IPAQ-LF against mean minutes from the Actical. Linear relationships between Actical and IPAQ-LF were evaluated using Pearson's correlation. Results are presented for the whole sample and comparisons made between demographic groups (WYC, MYC, WNYC, MNYC).
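The two summary statistics named above can be written out in a short sketch. The study used SPSS; the pure-Python functions here are illustrative only, and the `+ 1` offset in the log transform is an assumption introduced to handle zero minutes, not a detail reported in the text.

```python
import math


def log_transform(values):
    """Natural log of (value + 1); the +1 offset is an assumption for zeros."""
    return [math.log(v + 1.0) for v in values]


def paired_t(x, y):
    """Paired t statistic for the mean of the differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def pearson_r(x, y):
    """Pearson product-moment correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean daily minutes for six participants (not study data).
ipaq = log_transform([95, 40, 210, 15, 60, 130])
actical = log_transform([70, 55, 90, 30, 65, 80])
print(paired_t(ipaq, actical), pearson_r(ipaq, actical))
```

The two statistics answer different questions: the paired t statistic concerns whether the method means differ, while r summarizes how tightly individuals' paired values follow a straight line. Similar means can therefore coexist with a weak correlation.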

RESULTS

Participants
A total of 2013 adults aged 20 to 65 years participated in the URBAN study between April 2008 and September 2010. Participants with missing demographic (n = 4) or IPAQ-LF (n = 5) data were excluded, as were participants who did not meet criteria for accelerometer wear-time (n = 731). Outliers in IPAQ-LF data were identified using the interquartile range (IQR), where any value more than 3 IQR above the third quartile was considered a problematic outlier (n = 24). Data from 1249 participants were therefore included in these analyses (Table 1).
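The outlier rule above (values more than 3 IQR above the third quartile) can be sketched as below. The quartile interpolation method is an assumption, since the text does not state which one was used, and different methods give slightly different thresholds.

```python
import statistics


def upper_outlier_threshold(values, k=3.0):
    """Return Q3 + k * IQR, using inclusive quartile interpolation (assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def flag_outliers(values, k=3.0):
    """Return True for each value lying above the upper threshold."""
    limit = upper_outlier_threshold(values, k)
    return [v > limit for v in values]


# Hypothetical self-reported minutes per day; the last value is extreme.
minutes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]
print(upper_outlier_threshold(minutes))
print(flag_outliers(minutes))
```

A multiplier of 3 rather than the more common 1.5 removes only extreme values, which suits self-report data where long right tails are expected.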

Moderate Physical Activity
Descriptive statistics, presented in Table 1, indicate that the IPAQ-LF produced significantly higher mean MPA engagement than the Actical (t = 2.104, p = 0.036). Additionally, standard deviations were substantially greater for the IPAQ-LF, indicating greater variance in self-reported activity levels. The scatter plot demonstrated a weak relationship between measures, as can be observed in Figure 1; whilst the regression line crosses the line of identity (ln y = ln x) near the means for both measures, the scatter indicates substantial random bias. Further, evaluation of the linear relationship between methods indicated a weak association (r = 0.216, p < 0.001).

All demographic groups reported higher mean estimates of MPA by IPAQ-LF than measured by Actical; however, these differences were only significant in MNYC (t = 2.680, p = 0.008). Weak associations between methods were found for both WYC and WNYC (r = 0.124, p = 0.164 and r = 0.130, p = 0.002, respectively), while for MYC and MNYC a linear (albeit weak-to-moderate) relationship was found between measures (r = 0.265, p = 0.015 and r = 0.331, p < 0.001, respectively).

Moderate-to-Vigorous Physical Activity
Actical-derived MVPA increased marginally from MPA, whereas IPAQ-LF MVPA values increased disproportionately, indicating much greater vigorous activity via self-report compared with objective measurement (Table 1). As was observed with MPA, paired t test results revealed significant differences between method means (t = -3.385, p = 0.001) and a weak association between methods (r = 0.260, p < 0.001). Scatter plots continued to show substantial random bias in self-report (Figure 2).
All demographic groups self-reported more MVPA than recorded by accelerometer, and both male groups self-reported substantially greater VPA than either female group. Significant differences were observed between methods for MVPA means for MYC and WNYC, but not for WYC or MNYC. Methods were moderately correlated for MVPA for both male groups (r = 0.337, p = 0.002 and r = 0.366, p < 0.001, respectively), yet both female groups showed similarly weak associations (r = 0.161, p = 0.069 and r = 0.167, p < 0.001, respectively).

DISCUSSION
This study investigated the demographic effects of sex and parenthood status on associations between a self-report tool (IPAQ-LF) and an objective method (Actical accelerometry) for describing MPA and MVPA in adults. A weak linear relationship between IPAQ-LF and Actical minutes of MPA and MVPA per day was shown, and the magnitude of association varied between men and women; no obvious variations in associations were evident for parenting status.
It is evident from the regression plots that the IPAQ-LF produces greater variation in estimates of physical activity variables than recorded by the Actical accelerometer, which is reflected by the weak associations found between measures across all demographic groups. Importantly, at a population level the method means were similar; however, substantial discrepancies between measures occurred at an individual level. This indicates that inferences made at an individual level, such as intervention efficacy or health outcomes, may have substantial error; the utility of self-report tools for such a purpose is therefore questionable for all demographic groups, and may be more so among women. Previous IPAQ-LF validation studies have reported moderate associations with accelerometry (r = 0.30-0.33) [15,22] and doubly labeled water (r = 0.31) [23] for describing MVPA in adults. None of these studies reported demographic-specific associations, although similar associations were reported across 12 countries, including developing countries [15]. In the validation study of a self-report tool developed for use among WYC incorporating a variety of domains, comparably weak associations of self-reported MVPA with accelerometer-derived MVPA were reported (rs = 0.13) [24]. The same study reported improved associations when considering MVPA from planned and transport domains of activity only (rs = 0.28).
Moderate associations between self-reported and accelerometer-derived physical activity are typical [13-25]. Because of their widespread availability, low cost, and ease of use, self-report tools continue to be employed despite their well-documented shortcomings [15]. In particular, self-report tools are cognitively challenging; they require participants to recall, estimate, and classify physical activity engagement, usually over a 7-day period. A study that probed respondents for clarification of self-reported responses found 74% over-reported, 10% under-reported, and 16% reported total activity accurately [26]. Social desirability bias may also lead to inflated estimates of physical activity behavior. Further, self-report tools are biased towards certain patterns of activity; planned activities and those of vigorous intensity are more accurately and reliably recalled than low-level intermittent behaviors [15,26]. The latter methodological flaw potentially misses vast quantities of health-enhancing activity, especially in those who are unable or lack opportunity to participate in vigorous-intensity activity. Efforts have been made to rectify this somewhat by attempting to capture time spent in occupational and domestic physical activities. Arguably, however, physical activity performed in these domains is often low-level and intermittent, and therefore difficult to recall accurately. A common feature of self-report tools is also to exclude activity bouts of <10 minutes. Whilst this may improve the reliability of recalled behavior, it systematically excludes activities regularly promoted as health enhancing, such as using stairs instead of the elevator, parking further away and walking the extra distance, and many domestic and yard activities of moderate intensity. Algorithms to extract minimum bouts of activity recorded by accelerometers (e.g., ≥10 minutes, allowing 1-2 minutes of interruption within each bout) have been promoted and utilized in some studies in order to provide comparability with many self-report tools [27-29]. Whilst this method may produce greater convergence between measures, this may simply be a case of biasing the objective measure to systematically miss more actual activity. Such methods were not used in this study because of their inherent limitations. Firstly, activity which most health professionals would regard as "health-related" is often not conducted in one continuous bout, even after allowing for a 1-2 minute interruption, and may fluctuate between light and vigorous intensity, as is commonplace in many sports and recreational activities. Further, this approach may exclude significant contributions that short bouts of activity make to overall daily physical activity.
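To make concrete what a bout-extraction rule of this kind discards, the following sketch implements one simplified variant. It was not used in this study and does not reproduce any specific published algorithm; the function name, the per-minute boolean input, and the treatment of interruptions (up to `max_gap` consecutive below-threshold minutes tolerated inside a bout) are all assumptions.

```python
def find_bouts(active, min_length=10, max_gap=2):
    """Return (start, end) minute indices, end exclusive, of qualifying bouts.

    active: one boolean per minute, True if at or above the intensity threshold.
    A bout runs from its first to its last active minute and may contain
    interruptions of at most max_gap consecutive inactive minutes.
    """
    bouts = []
    start = None        # first active minute of the current candidate bout
    last_active = None  # most recent active minute seen
    for i, is_active in enumerate(active):
        if is_active:
            if start is None:
                start = i
            last_active = i
        elif start is not None and i - last_active > max_gap:
            # Interruption too long: close the candidate at its last active minute.
            if last_active - start + 1 >= min_length:
                bouts.append((start, last_active + 1))
            start = None
    if start is not None and last_active - start + 1 >= min_length:
        bouts.append((start, last_active + 1))
    return bouts


# Ten active minutes split by a 3-minute pause: no bout qualifies.
pattern = [True] * 5 + [False] * 3 + [True] * 5
print(sum(pattern), find_bouts(pattern))
```

In this example all ten active minutes are dropped once the bout rule is applied, which is the systematic loss of short-bout activity that the paragraph above describes.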
It is likely that patterns of activity contribute to the sex effects observed in this study. In particular, women (both with and without young children) may be more likely to perform low-level intermittent activity than to engage in planned bouts of vigorous-intensity activity, potentially producing greater variability in reporting. It appears that sex has a greater effect on self-report accuracy than parenting status; this may be due to the presence of confounding factors such as parental employment, presence of older children, socioeconomic status, and marital status. The IPAQ battery of questionnaires was developed in an attempt to provide a standardized self-report tool suitable for population estimates of the prevalence of physical activity so that comparisons between countries may be made [15]. Whilst there is some merit in this purpose at a population level, the application of self-report tools to determine health outcomes of physical activity behavior is inappropriate. The present study concurs with previous research demonstrating a weak-to-moderate association between self-reported and accelerometer-derived behavior [13,14]. Importantly, however, this research further shows that the ability of self-report tools to capture overall activity is particularly weak among women, probably due to lower levels of planned moderate-to-vigorous-intensity activity. With greater emphasis now being placed on measuring physical activity across the spectrum, it is important that measurement tools are capable of doing so accurately.
The strengths of this study include its large heterogeneous sample with a spread of demographics, providing adequate variation in physical activity for exploring associations across a full range of activity levels. It appears that the sample in this study was highly active, although this is an acknowledged outcome of the IPAQ-LF given the number of domains considered [15]. A limitation of the study is that it was not methodologically designed for measurement tool validation, as reflected by the high proportion of participants who were excluded due to inadequate accelerometer wear-time.
This study demonstrates substantial discrepancies between self-report tools (specifically the IPAQ-LF) and accelerometer-derived physical activity. Findings indicate that self-report tools provide a poor proxy of human movement, especially among women. Careful consideration must be given to the patterns of activity in the intended study population and the purpose of measurement.

Figure 1. Association between self-report and accelerometer derived MPA for the whole sample. Note: dashed line represents line of identity; solid line represents linear regression line.

Figure 2. Association between self-report and accelerometer derived MVPA for the whole sample. Note: dashed line represents line of identity; solid line represents linear regression line.

Table 1. Participant characteristics and descriptive statistics of IPAQ-LF and Actical measures.