Use of Asymmetric Models to Adjust the Vitamin Intake Distribution Data for Older People

One of the main interests in the nutrition field is estimating the distribution of usual nutrient intake. Vitamin intake data are generally highly asymmetric, mainly because of the presence of outliers. This can result from the variability of the diet and, in such cases, robust estimation of the data distribution may be required. The aim of this paper is therefore to propose an alternative approach for estimating usual intake through asymmetric distributions with random effects, applied to a data set of 10 vitamins obtained from a dietary survey of 368 older people from Botucatu city, São Paulo, Brazil. In general, these asymmetric distributions include parameters related to the mean, median and dispersion measures, and such parameters provide good estimates of the intake distribution. For comparison, a model considering only the amount of nutrient intake was fitted by the National Cancer Institute (NCI) method, and the models were compared using the Akaike Information Criterion (AIC). The NCI method is based on a Box-Cox transformation coupled with a normal distribution, but for asymmetric data this transformation may not be useful. In the presence of outliers, the asymmetric models provided a better fit than the NCI method in the majority of cases. These models can therefore be an alternative method for estimating the distribution of nutrient intake, mainly because no data transformation is necessary and all the information can be obtained directly from the parameters.


Introduction
One of the main interests of researchers in the nutrition area is to estimate the distribution of usual nutrient intake in a population group.
Statistical modelling of the distribution of usual nutrient intake in a population is challenging, since individuals tend to have different dietary habits from one another (between-person variability), and the same individual does not necessarily have a constant consumption (within-person variability) [1].
In the literature, it is common to apply a Box-Cox transformation [2] and analyze consumption data under a normality assumption. Methods such as the Iowa State University (ISU) and Best Power (BP) methods follow this approach [3] and do not allow the inclusion of covariates. Such methodologies assume normality of the data before or after a transformation and generally differ in the back-transformation applied to return to the original scale, as well as in the form of the attenuating weight factor considered. Another model, the National Cancer Institute (NCI) method [4], allows the inclusion of covariates and takes into account the probability of episodic intake. However, for data with high asymmetry and/or extreme outliers, the NCI method does not seem to be very appropriate.
The normal distribution is one of the most widely used distributions in the analysis of continuous data because of its properties, especially in the context of linear models. However, outliers substantially affect inference based on the normal distribution, which has encouraged the development of robust procedures, defined as those less sensitive to deviations from the assumptions on which they are based [5]. In addition, asymmetric models can be an alternative way to model consumption data without the need for a transformation to normality [6].
Thus, the aim of this paper is to estimate the usual consumption of vitamins for older people through asymmetric models, some of which allow a robust estimation procedure (for cases with a strong presence of aberrant points). Furthermore, a comparison with the NCI method was made considering only the amount consumed [7].

Methodology
The analyzed data came from a representative sample of 368 individuals from an epidemiological study that aimed to assess the adequacy of nutrient intake for older people from Botucatu city, São Paulo, Brazil.
The sample was collected in 2011 through the application of the 24-hour recall (R24h). The study included older people who did not have cognitive deficits and agreed to participate in the research. To estimate the usual intake of a subject, more than one R24h is needed because consumption varies across the days of the week. Thus, up to three R24h were obtained from each subject on nonconsecutive days of the week, one of them necessarily on the weekend.
The data collected by the recalls were transformed into micronutrient consumption data using the NDSR program (Nutrition Data System for Research).
Nutrient intakes differ between men and women, so the analyses were performed by gender. Data were also collected on age, marital status, education and morbidities, as well as anthropometric measures and information about the Activities of Daily Living (ADL) and the Instrumental Activities of Daily Living (IADL).
A descriptive analysis of the consumption of ten vitamins was presented as mean, standard deviation (SD), median and coefficient of variation (CV*) based on the median of the data distribution [8].
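The descriptive measures listed above can be sketched in a few lines of Python. The text does not spell out the formula for CV*, so the version below is only an assumption: it uses one common median-based choice, 1.4826 × MAD / median (MAD being the median absolute deviation), and the definition in [8] may differ. The function name `describe_intake` and the sample values are hypothetical.

```python
import statistics


def describe_intake(values):
    """Return mean, SD, median and a median-based CV (in percent)."""
    values = list(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    median = statistics.median(values)
    # ASSUMPTION: robust CV built from the median absolute deviation (MAD).
    # The constant 1.4826 makes the MAD comparable to the SD under normality.
    mad = statistics.median([abs(v - median) for v in values])
    cv_star = 100.0 * 1.4826 * mad / median
    return {"mean": mean, "sd": sd, "median": median, "cv_star": cv_star}


if __name__ == "__main__":
    # Made-up intakes with one extreme value, for illustration only.
    print(describe_intake([1.0, 2.0, 3.0, 4.0, 100.0]))
```

With a single extreme value, the mean and SD move far from the bulk of the data while the median and the median-based CV change little, which is the reason for reporting a median-based dispersion measure for skewed intake data.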
The data were modeled by means of a random intercept model, which accounts for repeated measurements on the same subject. The idea of the model is that the between-person variability is absorbed by a random effect, and the within-person variability is absorbed by the distribution chosen for the response variable, a structure similar to that of the NCI method [7].
Asymmetric distributions were proposed for the response variable, and a normal distribution with zero mean and variance κ² was used for the random effect. To select such asymmetric distributions, the fitdist and histdist functions from the gamlss (Generalized Additive Models for Location, Scale and Shape) package in R, v.3.0.1, were used [9]. The penalized maximum likelihood method was used to estimate the parameters of the asymmetric models, and the estimation was carried out with the RS and CG iterative algorithms [10]. The variance of the random effect was estimated by gamlss.mx using the EM algorithm. The asymmetric models were fitted using R, v.3.0.1 [11].
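In symbols, the random intercept structure described above can be sketched as follows. The notation is assumed here for illustration and does not appear in the original text: y_ij is the intake of subject i on recall j, D is the chosen asymmetric distribution, g is a link function for the location parameter, β_0 is a fixed intercept and b_i is the random effect. Depending on the distribution, only some of the parameters σ, ν and τ are present.

```latex
\begin{align*}
y_{ij} \mid b_i &\sim \mathcal{D}(\mu_i, \sigma, \nu, \tau),
  \qquad i = 1, \dots, n, \quad j = 1, \dots, n_i, \quad n_i \le 3, \\
g(\mu_i) &= \beta_0 + b_i, \\
b_i &\sim N(0, \kappa^2).
\end{align*}
```

Under this sketch, κ² measures the between-person variability, while the spread of the distribution D around μ_i captures the within-person variability.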
The NCI amount-only model, available in SAS version 9.3 and implemented by the MIXTRAN macro, was used for comparison with the proposed models, since it has a structure similar to that of the asymmetric models in this study. The average usual intake estimated by the NCI method using the DISTRIB macro is also shown for comparison with the parameters estimated by the asymmetric models.

Results
The sample of 368 older people had a mean age of 71.20 ± 7.11 years for men and 71.79 ± 7.21 years for women. Table 1 presents the description of the demographic and morbidity variables of the older people. Table 2 presents the descriptive measures for the vitamin intakes of the older people according to gender. Table 2 shows that the standard deviations (SD) were high for most vitamins, indicating a high dispersion around the mean. Moreover, the mean and median values for most of the vitamins are not close, indicating asymmetry in the data. Thus, the reported descriptive measures suggest a statistical modeling approach that accommodates asymmetry and robust estimation.
The gamma, generalized gamma, Weibull, lognormal and Box-Cox t distributions were fitted to the response variable. The fitted distributions may be asymmetric and leptokurtic, depending on the values of their parameters. These distributions were selected through the fitdist and histdist functions, which selected the best distribution from the raw data. Regarding the features of these distributions, the generalized gamma distribution is a specific parameterization of the gamma distribution proposed by Lopatatzidis and Green, which has an additional parameter ν [8] [9]. The parameters μ and σ of this distribution are associated with the mean and variance, respectively. In the Box-Cox t (BCT) distribution, the μ, σ, ν and τ parameters can be interpreted as scale (related to the median), relative dispersion (associated with the coefficient of variation), asymmetry (power transformation to symmetry) and kurtosis (degrees of freedom), respectively. The lognormal distribution is a particular case of the Box-Cox Cole Green (BCCG) distribution when the parameter ν = 0. The BCCG distribution is in turn a particular case of the BCT: when the degrees-of-freedom parameter tends to infinity, the BCT results in a truncated normal distribution [12]. For the random effect, a normal distribution with zero mean and variance κ² was proposed. Table 3 presents the models fitted to the vitamin intake data for older people and the parameter estimates. In Table 3, the parameter estimate related to the median for the Box-Cox t distribution lies very close to the observed median, and in some cases slightly below it (see Table 2), as expected for a parameter estimated within a random intercept model. For the other models, in which μ is related to the mean of the intake distribution, the estimate of this parameter from the fitted model was also very close to the observed average (see Table 2).
For instance, according to the gamma model fitted to the consumption of vitamin B1 in men, the average estimated from the model was 1.66 mg, while the average observed in the data was 1.78 mg. It was also observed that in the Box-Cox t distribution, which provides robustness in the estimation process, the degrees-of-freedom parameter has small estimates, suggesting heavier tails that accommodate outliers. This is very important for modeling intake data, which vary widely. For example, in the vitamin A data for men, the lowest consumption was 1.86 mcg, while the largest was 41,372.02 mcg.
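The relationship between the lognormal, BCCG and BCT distributions mentioned above rests on a shared standardizing transformation. The sketch below shows that transformation in Python under the usual BCCG/BCT parameterization: z = ((y/μ)^ν − 1)/(νσ) for ν ≠ 0 and z = log(y/μ)/σ for ν = 0. Under BCCG, z is treated as (truncated) standard normal; under BCT, as (truncated) t with τ degrees of freedom. The function name `box_cox_z` and the parameter values in the usage example are hypothetical and are not estimates from the paper.

```python
import math


def box_cox_z(y, mu, sigma, nu):
    """Standardize a positive intake y with the Box-Cox Cole-Green transformation.

    mu    : scale parameter (related to the median)
    sigma : relative dispersion (related to the coefficient of variation)
    nu    : power that brings the data toward symmetry
    """
    if y <= 0 or mu <= 0 or sigma <= 0:
        raise ValueError("y, mu and sigma must be positive")
    if abs(nu) < 1e-12:
        # Limit nu -> 0: the lognormal case.
        return math.log(y / mu) / sigma
    return ((y / mu) ** nu - 1.0) / (nu * sigma)


if __name__ == "__main__":
    # Hypothetical parameters; y = mu always maps to z = 0.
    for y in (0.5, 1.0, 2.0, 20.0):
        print(y, box_cox_z(y, mu=1.0, sigma=0.5, nu=0.0))
```

An observation equal to μ always maps to z = 0, which is why μ is read as being related to the median. A small τ in the BCT case gives z heavier tails, so extreme intakes have less influence on the estimates of μ and σ.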
The Akaike criterion was used to compare the NCI method and the asymmetric distributions for the vitamin intakes. By this criterion, there was little difference between the models, although most of the asymmetric models had lower values. Energy adjustment of the evaluated nutrients was not considered, since the Akaike values differed little from those obtained without adjustment. The results are shown in Table 4.
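As a reminder of how such a comparison works, AIC = 2k − 2 log L, where k is the number of estimated parameters and L is the maximized likelihood; the model with the lower value is preferred. The following is a minimal sketch, assuming independent observations and no random effect, so it is a simplification and not the models fitted in this study. It compares a normal and a lognormal fit using closed-form maximum likelihood estimates; the sample values and function names are made up for illustration.

```python
import math


def normal_loglik(values):
    """Maximized normal log-likelihood (mean and variance at their MLEs)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # MLE uses divisor n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def lognormal_loglik(values):
    """Maximized lognormal log-likelihood: normal fit on logs plus Jacobian."""
    logs = [math.log(v) for v in values]
    return normal_loglik(logs) - sum(logs)


def aic(loglik, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * loglik


if __name__ == "__main__":
    # Made-up, right-skewed intakes with one extreme value.
    sample = [1.0, 1.2, 1.5, 2.0, 2.5, 3.0, 50.0]
    print("normal   :", aic(normal_loglik(sample), 2))
    print("lognormal:", aic(lognormal_loglik(sample), 2))
```

Both likelihoods are evaluated on the original scale of the data (the lognormal one includes the Jacobian term), which is what makes the two AIC values comparable.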
It is important to note that the NCI method presented limitations in fitting some vitamins, as it relies on a Box-Cox-type transformation. When a suitable transformation is not found, the MIXTRAN macro falls back on a logarithmic transformation, but this does not guarantee that the transformed data follow a normal distribution. The asymmetric models also showed estimation problems for some models in the gamlss routine. When this happened, alternative distributions were selected through the histdist and fitdist functions.

Discussion
Several methods for estimating usual intake have been proposed in the literature, such as the National Cancer Institute method (NCI, considered the standard method in this work), the Multiple Source Method (MSM), the Iowa State University (ISU) method and the Statistical Program for Age-adjusted Dietary Assessment (SPADE) [13] [14]. The authors carried out a simulation study with two 24-hour recalls and an unequal number of subjects, and they concluded that care should be taken when there is high between-person variability or high asymmetry.
The use of asymmetric distributions has already been proposed, but without taking into account the between-person and within-person variability [6]. In that case, some asymmetric distributions were proposed and, even without modeling such variability, the results were good, mainly for estimating the prevalence of inadequacy for the nutrients considered. Another class of functions, called the Box-Cox symmetric class [12], has been developed to model such intake data, with appropriate routines implemented by the author. In the present study, according to the Akaike criterion, the proposed models presented, in general, lower values than the NCI method, indicating an improvement in the fit of the usual intake distribution of the vitamins. Neither the energy adjustment nor the adjustment for possible confounding variables modified the results.
In both methods (NCI and asymmetric models), it was observed that, owing to the high asymmetry and the outliers, the variability of the random effect is high with or without energy adjustment. This means that between-person variability exists and that there is no single pattern of nutrient consumption.
Therefore, this work shows that fitting asymmetric models to nutritional intake data is effective and has the advantage of allowing direct calculation of the mean and median intake from the fitted distribution, without the need for a transformation as in the NCI method. Another advantage is the practicality of application, owing to the tools available in the gamlss routine. However, for future studies, it is intended to implement such asymmetric distributions to improve parameter estimation and eliminate the convergence problems found in the routine used. Moreover, with such models implemented, it is possible to estimate the prevalence of inadequacy of nutrient intakes based on a cutoff point fixed a priori.
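The prevalence calculation mentioned above amounts to evaluating the fitted distribution function at the cutoff, P(Y < c). The following is a minimal sketch, assuming for illustration that the fitted usual intake distribution is lognormal. The parameter values, the cutoff and the function name `prevalence_below_cutoff` are hypothetical and are not results from this study.

```python
import math


def prevalence_below_cutoff(cutoff, mu_log, sigma_log):
    """P(Y < cutoff) when log(Y) ~ N(mu_log, sigma_log**2)."""
    if cutoff <= 0 or sigma_log <= 0:
        raise ValueError("cutoff and sigma_log must be positive")
    z = (math.log(cutoff) - mu_log) / sigma_log
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical fit: median intake 1.2 mg, log-scale SD 0.5, cutoff 1.0 mg.
    p = prevalence_below_cutoff(1.0, mu_log=math.log(1.2), sigma_log=0.5)
    print(f"Estimated prevalence of inadequacy: {100.0 * p:.1f}%")
```

The same idea carries over to any of the fitted asymmetric distributions by replacing the lognormal distribution function with the corresponding one. For a usual intake interpretation, the distribution evaluated should reflect between-person variability only.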

Conclusion
In this work, asymmetric distributions based on a random effect model were proposed to fit the nutrient intake distribution in the presence of outliers, including models with a robust estimation procedure. An advantage of this new approach is that no data transformation is needed and the results can be interpreted directly from the estimated parameters. In addition, outliers can be accommodated by means of robust procedures, providing more plausible estimates.