Adaptive Fractional Polynomial Modeling in Sas ®

Regression predictors are usually entered into a model without transformation. However, it is not unusual for regression relationships to be distinctly nonlinear. Fractional polynomials account for nonlinearity through real-valued power transformations of primary predictors. Adaptive methods have been developed for searching through alternative fractional polynomials based on one or more primary predictors. A SAS macro called genreg (for general regression) is available from the author for conducting such analyses. It supports adaptive linear, logistic, and Poisson regression modeling of expected values and/or variances/dispersions in terms of fractional polynomials. Fractional polynomial models are compared using k-fold likelihood cross-validation scores and adaptively selected through heuristic search. The genreg macro supports adaptive modeling of both univariate and multivariate outcomes. It also supports adaptive moderation analyses based on geometric combinations, that is, products of transforms of primary predictors with possibly different powers, generalizing power transforms of interactions. Example analyses and code for conducting them are presented demonstrating adaptive fractional polynomial modeling.


Introduction
Regression analyses reported in the applied research literature commonly assume that relationships are linear in predictors without assessing this assumption.When addressed, nonlinear relationships are often based on standard po-lynomials, usually quadratic or cubic polynomials.However, these models cannot handle general nonlinearity.For this reason, Royston and Altman [1] proposed addressing nonlinearity with fractional polynomials based on power transforms of predictors using real valued powers.They proposed consideration of finite sets of powers.For example, they recommended the eight powers −2, −1, −0.5, 0, 0.5, 1, 2, 3 (with the 0 case corresponding to the natural log transform) for degree 1 fractional polynomials based on a single power transform (for more details on standard fractional polynomial modeling also see [2] [3]).Power transforms are also called Box-Tidwell transforms [4].Box and Cox [5] proposed a related set of transforms for transforming outcome (dependent or response) variables.Carroll and Ruppert [6] used these Box-Cox transforms to transform both outcomes and predictors.
The finite set of recommended powers can effectively model nonlinearity in many situations, but not always.Adaptive modeling generalizes standard fractional polynomial modeling to address the complete set of real valued powers.Knafl et al. [7] first formulated adaptive modeling in the Poisson regression case.This was extended to handle generalized linear models and modeling of variance/dispersions [8] and also to handle repeated measures modeling [9].Knafl and Ding [10] provided a general adaptive approach for fractional polynomial modeling of univariate and multivariate continuous, discrete, and count outcomes.
This paper describes adaptive fractional polynomial modeling and demonstrates the kinds of novel insights it can provide through example adaptive analyses of data from a study of family management by mothers of children with chronic physical conditions such as diabetes and Crohn's disease [11].These methods can be used in any area of the behavioral, health, and social sciences as well as any other application area that uses regression modeling.They can provide novel insights into the data not possible with standard regression methods.

Fractional Polynomials
Power transforms p X of a positive valued predictor X are well-defined for all real valued powers p.This can be extended to transforms ( ) , f X p for real va- lued predictors X by defining ( ) , f X p as p X for 0 X > , as 0 for 0 X = , and as ( ) X < where cos denotes the cosine function, π is the usual constant, and X is the absolute value of X.Note that the sign of ( ) , f X p when 0 X < oscillates between ±1 as p varies.Indicator predictor variables X with values 0 and 1 do not require transformation.
Fractional polynomials have the form ( ) ( ) ( ) where, if any of the predictors i X and j X are the same, then the associated powers i p and j p are not the same.An intercept can be included by setting 1 1 X = , that is, the unit predictor, with 1 1 p = .When X is positive valued and 0 p = , ( ) ,0 f X corresponds to the natural log transform ( ) log X if the model contains an intercept parameter (demonstrated by taking the limit as 0 p → ); otherwise ( ) ,0 f X corresponds to the unit transform, adding in an intercept to the model.Royston and Altman [1] also consider fractional polynomials to include interactions between transforms of predictors and their natural logs, but these are not considered here.

Geometric Combinations
Power transforms of interactions can be generalized to geometric combinations consisting of products of power transforms of multiple distinct predictors using possibly different powers.These have the form ( ) ( ) ( ) , , , where i X are distinct for 1 i m ≤ ≤ .They can replace standard terms ( ) , j j f X p in fractional polynomials.
Geometric combinations can be used to address the issue called moderation [12], also called effect modification, but generalized to handle nonlinearity.An adaptive model containing geometric combinations does not necessarily imply distinct moderation.That only holds if the model with the geometric combinations substantially improves (as defined later) on the associated additive model.

( )
, L ⋅ θ a likelihood function or a likelihood-like function such as the extended quasi-likelihood used with generalized linear models [13], and ( ) S θ the estimate of θ obtained by maximizing ( ) , L S θ over θ .The associated k-fold li- kelihood cross-validation score is given by ( ) ( ) ( ) ( ) In other words, the LCV score is the product of likelihoods for the data in the folds ( ) S h evaluated at parameter estimates computed with the data in the other folds ( ) \ S S h and normalized by the sample size.Fold assignment is random, but using the same initial seed for models of the same outcome Y.In this way, the same fold assignments are used for all such models, and then their LCV scores are comparable.
Larger LCV scores indicate better models, but not necessarily substantially (or distinctly) better models.This issue can be addressed using LCV ratio tests computed using the χ 2 distribution as for standard likelihood ratio tests.These tests are expressed in terms of a threshold for a substantial (distinct) percent decrease in the LCV score, which varies with the sample size.The formula is given M is also simpler, then it is pre- ferable as a parsimonious competitive alternative.LCV ratio tests are more conservative than standard tests for zero slopes in the sense that the removal from the model of a predictor with a significant slope can generate a competitive LCV score compared to the model with the predictor included (an example is provided later).
LCV scores should only be compared when computed for the same outcome variable Y.However, when Y is continuous and positive valued, it is possible to compute power-adjusted

( )
LCV q scores for power transforms q Y of Y (the formulation is given in Section 6.3 of [10]).The power 0 q = corresponds to the natural log transform ( ) log Y .These scores can be compared to choose a power transform for Y along with power transforms for its predictors.Estimated means for transformed Y can be inversed transformed to generate estimated means for untransformed Y (i.e., using the power 1 q for 0 q ≠ and the exponential transform for 0 q = ).

Adaptive Modeling Process
The adaptive modeling process is formulated in Chapter 20 of Knafl and Ding [10].An overview is provided here.A base model is first expanded by systematically adding in power transforms of primary predictors.The transform added to the model next is the one generating the best LCV score for primary predictors currently under consideration for inclusion in the model.Next, the expanded model is contracted, removing power transforms from the model and adjusting the powers of the remaining transforms to improve the LCV score.When the contraction leaves the expanded model unchanged, the powers of the expanded model transforms are adjusted to improve the LCV score.
The process is controlled by tolerance parameters indicating how much of decrease in the LCV score can be tolerated at given stages of the process.For example, the contraction stopping tolerance is set using a LCV ratio test, so that the final model is parsimonious.Primary predictors are dropped from consideration for inclusion in the expansion if the inclusion of their transforms decreases the LCV score by more than an associated tolerance parameter.Transforms are similarly dropped from consideration for removal in the contraction if their removal decreases the LCV score by more than the associated tolerance parameter.
The expansion process can optionally generate geometric combinations.Each power transform is multiplied together with the prior product of power trans-forms without adjusting the powers in those prior transforms, and so the final geometric combination is further power transformed to increase the LCV score.
Transforming the whole geometric combination with a single power simplifies the process compared to adjusting each of the powers in the geometric combination separately.
Fractional polynomial models can be generated for continuous outcomes using adaptive linear regression modeling with the identity link function (as demonstrated later), for dichotomous outcomes using adaptive logistic regression modeling with the logit link function, for ordinal outcomes using adaptive ordinal regression modeling with the cumulative logit link function, for nominal outcomes using adaptive multinomial regression modeling with the generalized logit link function, and for count/rate outcomes using adaptive Poisson regression modeling with the natural log link function.These outcomes can be univariate or multivariate.Fractional polynomial models are special cases of standard models, and so have the same assumptions.Furthermore, once the predictors and powers of a fractional polynomial are specified, efficiciency of estimation of associated slope parameters is the same as for standard models of the same type.
The adaptive modeling process can be applied to generate fractional polynomial models for only means (or expectations) assuming constant variances or unit dispersions.However, such assumptions need assessment.This can be conducted through adaptive modeling of both means and variances/dispersions using fractional polynomial transforms of primary predictors.The natural log link function is used for modeling the variances/dispersions.The assumption of constant variances or unit dispersions is reasonable if the associated adaptive model provides a competitive alternative to the adaptive model for both means and variances/dispersions in combination.

Computational Support
The adaptive modeling process has been implemented in the genreg (for general regression) SAS ® (SAS Institute, Inc., Cary, NC) macro, which supports adaptive linear, logistic, and Poisson regression modeling of either univariate or multivariate continuous, dichotomous, ordinal polytomous, nominal polytomous, and count/rate outcomes.The ypower SAS macro is available for generating power transforms Y q of positive valued continuous outcomes Y as well as power transforms of their predictors.It uses a grid search to choose the power q.These macros report in their output the threshold for a substantial percent decrease in the LCV score.Knafl & Ding [10] provide extensive coding details on the use of both macros.Section 4 describes code for generating the adaptive analyses reported here.

Example Data
Example adaptive analyses are conducted using data from a study of family management by parents of children with chronic physical conditions such as diabetes and Crohn's disease [11].Data are available for both mothers and fathers, but only the data for 344 partnered mothers and for 65 single mothers are used in the example analyses.The child's functional status (CFS) is measured using the Functional Status II [14] assessing the child's health status.Higher scores indicate better child functioning.The effort to manage the child's condition and the parental mutuality (PM) in managing the condition are measured using scales of the Family Management Measure [11].Higher effort scores indicate lower levels of family management of the condition while higher PM scores indicate better levels.PM is only collected for partnered parents.There are no missing data values for all reported analyses.Institutional Review Board approval was obtained for the original study and also for conducting secondary analyses of these data including the analyses reported in this manuscript.

Results
The threshold for a substantial percent decrease in the LCV score for analyses of data for the 65 single mothers is 2.91% while it is 0.56% for analyses of data for the 344 partnered mothers.LCV scores are computed with 10 k = folds unless otherwise indicated. .Moreover, the constant model has LCV score 0.058412 with insubstantial percent decrease 0.50% (i.e., less than the threshold of 2.91% for the data), and so is a parsimonious, competitive alternative.F tests for the quadratic and cubic polynomial models are also nonsignificant, and these generate inferior LCV scores.In this case, a standard polynomial assessment of possible nonlinear dependence of effort on CFS leads to the conclusion that mean effort does not depend on CFS.

Adaptive Analyses of Data for Single Mothers
Some authors feel that higher order integer power transforms should not be included in models without also including lower order integer powers [15].
To assess this issue for the single mother data, the model in only CFS 2 has LCV score 0.058467 and so improves on the full quadratic polynomial model with LCV score 0.058188 (Table 1).In this case, the percent decrease 0.48% is insubstantial, but the model with only the quadratic term is simpler while also providing an improvement.On the other hand, the model in only CFS 3 has LCV score 0.058269 while the full cubic polynomial model with LCV score 0.054129 (Table 1) generates a substantial percent decrease of 7.10%.In this case, insisting on inclusion of lower order terms makes the model highly ineffective.
An adaptive expansion generates the model based on CFS −1 plus an intercept.

Adaptive Analyses of Data for Partnered Mothers
The model for mean effort for partnered mothers as a linear function of CFS has LCV score 0.061785 and provides a substantial improvement over the constant model with LCV score 0.057434 and percent decrease 7.04% (i.e., larger than the threshold of 0.56% for the data).Moreover, mean effort significantly decreases as would be expected with increased CFS (estimated slope −0.14, ( ) ).On the other hand, the adaptive model for mean effort restricted to include at most one transform of CFS is based on CFS 6 with an intercept.The LCV score for this model is 0.063383, providing a substantial improvement over the linear model in CFS with percent decrease 2.52%.Consequently, mean effort is distinctly nonlinear in CFS.Moreover, the best LCV score for the eight Royston and Altman powers for degree 1 fractional polynomials is 0.062620 achieved with CFS 3 and with substantial percent decrease 1.20%.This is an example using actual, nonsimulated data where restricting to the Royston and Altman recommended powers generates a substantially inferior model compared to using the adaptive modeling process.
Using the adaptive modeling process applied to the means allowing for multiple transforms of CFS with constant variances generates the model for the means based on three transforms: CFS 6 , CFS 2.8 , and CFS −2.5 without an intercept and LCV score 0.064067.Under this model, mean effort (plot not provided) decreases from the lowest observed CFS value of 42.9 up to CFS 65.4, then increases up to CFS 82.1, and then decreases after that up to the largest observed value for CFS of 100.
The purpose of the parent study [11] that collected these data was to develop an instrument measuring family management of childhood chronic conditions.
CFS was collected for construct validity purposes, and it was hypothesized that negative measures of family management like effort would decrease with increased CFS.Thus, a monotonically decreasing relationship had been previously hypothesized, and so the generated nonmonotonic adaptive model was counter-intuitive.
One way to assess monotonicity of mean effort in CFS is to restrict to a single power transform of CFS, but that is the model based on CFS 6 with LCV score 0.063383 and substantial percent decrease 1.07%.On the other hand, restricting the contraction not to remove the intercept generates the model based on the two transforms CFS 5 and CFS 6.12 with LCV score 0.063755 and insubstantial percent decrease 0.49%.Moreover, under this competitive model, mean effort is monotonically decreasing in CFS as would be expected.An adaptive search over power transforms for effort along with power transforms of max(CFS,90) for modeling mean transformed effort with constant variances identifies the power 1.5 q = with the best power-adjusted score ( ) LCV 1.5 0.064814 = .This is a substantial improvement over the model for untransformed effort with power-adjusted score ( ) LCV 1 0.063885 = (the same as its non-power-adjusted LCV score) and substantial percent decrease 1.43%.Mean effort 1.5 is based on the one transform: max(CFS,90) −4 without an intercept.The adaptive model for the means and variances in max(CFS,90) provides an improvement with larger LCV score 0.064897 but the associated constant variances model generates an insubstantial percent decrease 0.13%.Consequently, mean effort 1.5 is reasonably considered to have variances constant in max(CFS,90).
Figure 3 contains the plot of the raw data and the predicted value curve for mean untransformed effort as a function of bounded CFS generated by the adaptive constant variances model for effort 1.5 (obtained by inverse transforming estimated mean effort 1.5 using the power 1 2 3 q = . Mean effort is constant at 15.8 up to CFS 90 and then decreases to 11.9 at CFS 100. Standardized residuals for transformed effort range from −2.77 to 2.43, and so there are no extreme outliers and the skewness has been reduced.Figure 4 contains the associated normal (probability) plot, which is reasonably close to linear.Adaptive transformation of effort has resolved model assumption problems, including outliers, skewness, and nonconstant variances.

Adaptive Moderation Analyses of Data for Partnered Mothers
Mean untransformed effort decreases significantly with untransformed max (CFS,90) assuming constant variances (estimated slope −0.42, ( ) ; estimated standard deviation 3.8).Whether this effect is moderated by PM can be addressed with a standard linear moderation analysis by adding PM and the interaction max(CFS,90)•PM into the model.The slope for this interaction is nonsignificant ( ( )   Figure 5 contains predicted value curves for mean effort as a function of max(CFS,90) for selected values of PM.For CFS up to 90, mean effort is constant, at lower levels with increased PM, and with less of a decrease with increasing levels of PM.For CFS from 90 to 100, mean effort decreases with increased CFS close to linearly, at lower levels and with steeper rates with increased PM, and with less of a change with increasing levels of PM.Assessments of moderation using transformed effort and nonconstant variances as well as of model assumptions are needed but are not addressed here for brevity.

Simulation
An example was presented earlier where the recommended set of degree 1 powers had too restrictive a range to effectively identify a power well outside of that range.Knafl and Ding [10] (Section 2.12) provide a simulation demonstrating that this holds more generally.The true power for the simulated data was −7.
The adaptively generated power was −6.9, and the model based on the true power was a competitive alternative to the one based on the estimated power.The best recommended power was −2, which generated a very substantial percent decrease of 55.8% compared to a threshold of 0.19% (due to a sample size of 1001 observations).
However, the issue of whether the recommended set of powers can be ineffective when the true power is in between the extreme values of −2 and 3 has not yet been addressed.For that reason, the following simulation was conducted.
The predictor xsim had 101 equally spaced values from 0 to 1, and so 0.01 units apart.The outcome ysim satisfied ( ) where  was normally distributed with mean 0 and standard deviation 0.01.
The normalizing constant 1.5 was set to the maximum value for the numerator rounded to 1 decimal digit so that ysim values were bounded by 1.This meant that after rounding the true intercept was 0.33, the true slope 0.67, and the true standard deviation 0.0067.These data are plotted in Figure 6.The threshold for a substantial percent decrease in the LCV score for these data was 1.86%.

Overview of Adaptive Fractional Polynomial Modeling in SAS
The genreg and ypower macros can be loaded into SAS using %include state- Assume that a data set with name partnered has been created in the SAS default library.Also assume that this data set contains the variables named effort, CFS, and mutuality containing values for the effort to manage the child's chronic condition, the child functional status, and parental mutuality in managing the condition, respectively, for 344 partnered mothers.

Standard Regression Modeling
A standard linear polynomial model for mean effort as a linear function of CFS with constant variances is generated as follows. %genreg(modtype=norml,datain=partnered,yvar=effort,xvars=CFS,procmod=y); The genreg macro is invoked by attaching a percent sign (%) to its name, followed by a list of settings for its macro parameters separated by commas and in parentheses.The modtype parameter determines the likelihoods used to generate parameter estimates and LCV scores.In this case, the value "norml" means that the outcome is continuous and to be analyzed using linear regression models and likelihoods based on the normal distribution.Other choices include "logis" for discrete outcomes and "poiss" for count outcomes.The datain parameter provides the name of the data set to use in the analysis, in this case the partnered data set.The yvar parameter names the outcome (or y) variable to be the varia-ble effort.The xvars parameter provides a list of names for predictor (or x) variables, in this case only the single predictor CFS.The procmod parameter is used to generate SAS output for the appropriate SAS procedure (or PROC), in this case PROC REG.The value "y" is short for yes.The other possible value is "n" for no.
All other macro parameters take on their default values (as specified in the genreg code).For example, the default value for the xintrcpt parameter is "y" indicating that the model for the means should include an intercept parameter.
Consequently, the model generated in the above code is the standard linear polynomial model for mean effort with an intercept.The foldcnt macro parameter is used to set the number of folds.In this case, its default value of "10" is used.
In the above code, the xpowers macro parameter also has its default empty setting meaning not to transform the xvars variables (or equivalently, transform them with power 1).To generate a standard quadratic polynomial model in CFS use the settings "xvars=CFS CFS" and "xpowers=1 2".

Adaptive Regression Modeling
An adaptive model for mean effort as a function of at most one transform of CFS can be generated as follows.
%genreg(modtype=norml,datain=partnered,yvar=effort, expand=y,expxvars=CFS, multtrns=n,contract=y); The base model for this analysis is the constant model for mean effort due to the default settings "xintrcpt=y" and "xvars=" (i.e., the empty setting for xvars meaning include no predictors).The setting "expand=y" requests that this base model be expanded by including power transforms of the variables listed in the expxvars parameter setting, in this case only the variable CFS.The setting "multtrns=n" means that multiple transforms of the expxvars variables are not to be included in the expanded model.The setting "contract=y" means contract the expanded model, adjusting powers for remaining transforms with each removal of a transform.In this case, the base constant model is expanded to include the single transform of CFS with power 6.The contraction leaves the expanded model unchanged.
To allow for multiple transforms of CFS in the model, change to "multtrns=y", which is the default setting; so removing "multtrns=n" from the code has the same effect.In this case, the base constant model is expanded to include three transforms of CFS with powers 6, 5, and −2.5 in that order.The contraction removes the intercept, adjusts the three powers to 6, 2.8, and −2.5, and then stops.
This model can be generated directly as follows.
%genreg(modtype=norml,datain=partnered,yvar=effort,xintrcpt=n, xvars=CFS CFS CFS,xpowers=6 2.8 −2.5); By default, the contraction considers removal of the intercept from the model for the means.This is controlled by the nocnxint (for no contraction of the x in-Open Journal of Statistics tercept) macro parameter with default setting "n".Add "nocnxint=y" to the above code to restrict the contraction not to remove the intercept.As before, the base constant model is expanded to include the three transforms of CFS with powers 6, 5, and −2.5, but now the contraction removes the transform with power −2.5 rather than the intercept, adjusts the other two powers to 5 and 6.12, and then stops.
This model (see Figure 2) suggests that mean effort is constant up to the value 90 for CFS.This can be assessed by generating the adaptive model in the variable bnddCFS, defined to equal CFS bounded to be at least 90, assuming this variable has been added to the partnered data set (using the code "bnddCFS=max (CFS,90);").The adaptively generated model in bnddCFS provides a competitive alternative to the one in unbounded CFS and so bnddCFS is used in subsequent analyses instead of CFS.

Adaptive Variance Modeling
The genreg macro also supports adaptive modeling of the variances (or the dis- The first two pages of the genreg output contain details on macro parameter settings, including the value for the threshold for a substantial percent decrease in the LCV score, rounding in this case to 0.56%.The third page contains results for the base model.Part of this output is in Table 2.The sample size in number of measurements denoted by m was 344.The estimated constant mean was 13.5. The estimated log of the constant variance was 2.86 so that exponentiating and taking the square root gives an estimated standard deviation of 4.18 (this value is reported in the output, but that has not been included in Table 2).The last entry Open Journal of Statistics described as the "mth root of the likelihood using deleted predictions" is the LCV score, which equals 0.057434.
Page 4 of the output describes settings of macro parameters controlling the expansion.Page 5 describes the expanded model.Part of this output is in Table 3.The expansion first adds the transform of bnddCFS to the power 6 to the model for the means (order = 1), next it adds the power transform of bnddCFS

Outcome Transformation
The nonconstant variances model in bnddCFS provides a substantial improvement in the LCV score over the associated constant variances model (as reported earlier), indicating that the usual assumption of constant variances is not appropriate for these data.This model also has extreme outliers and the residuals are distinctly skewed.Outcome transformation has the potential to remedy such problems.This is addressed using the ypower macro.
The following code requests generation of power adjusted LCV(q) scores for adaptive modeling of mean transformed effort as a function of bnddCFS over a grid of powers q.
%ypower(datain=partnered,yvar=effort,yfst=−2.5,ycnt=11,ystp=0.5, expand=y, expxvars=bnddCFS,contract=y); The datain, yvar, expand, expxvars, and contract macro parameters have the same meanings as for the genreg macro.The requested values for the outcome powers q range from −2.5 to 2.5 by steps of size 0.5.The first value for q is set to −2.5 by the yfst macro parameter.The step size of 0.5 is set by the ystp macro parameter.The ycnt macro parameter determines how many powers to consider; 11 in this case so that the last power is 2.5.Only constant variances were considered to reduce the computation times.The ypower macro invokes the genreg macro to generate the adaptive model for mean transformed effort in terms of bnddCFS for each requested power q, then uses genreg output to compute associated LCV(q) scores.Part of the output for the above code is provided in Table 5.The best LCV(q) score of 0.064814 is generated by the power 1.5.A grid search around 1.5 over multiples of 0.1 from 1.1 to 1.9 can be generated by changing to the settings "yfst=1.1","ystp=0.1",and "ycnt=9".The best score is still generated by the power 1.5.
The adaptive model for both the means and variances of effort transformed by the power 1.5 is generated with the following code.The settings "yfst=1.5"and "ycnt=1" guarantee that only the power 1.5 is considered.The nogprint macro parameter requests that genreg output be generated along with ypower output.The default setting is "nogprint=y", indicating not to generate any genreg output.

Adaptive Moderation
Assume the variable interact has been added to the partnered data set (using code "interact=bnddCFS*mutuality;").A standard linear moderation model can be generated as follows.
The adaptive additive model in bnddCFS and mutuality is generated as follows.
%genreg(modtype=norml,datain=partnered,yvar=effort,expand=y, expxvars=bnddCFS mutuality,contract=y); The generated model for the means is based on bnddCFS transformed by the power 6 and mutuality transformed by the power −0.The setting "geomcmbn=y" requests that the expansion consider geometric combinations.The default setting is "geomcmbn=n", which restricts to additive models.The generated model is described earlier and can be generated directly

Discussion
Consideration of general power transforms of a primary predictor may be required to identify that a relationship holds between that predictor and an outcome.For example, standard linear, quadratic, and cubic polynomial models for mean effort for single mothers as a function of child functional status (CFS) suggested that mean effort was constant in CFS.However, the adaptive model identified a distinctly nonlinear relationship (Figure 1) between mean effort and CFS.Moreover, this was only possible with consideration of zero intercept models.
Consideration of general powers can identify distinctly better models than restricting to the Royston and Altman recommended set of powers.For example, mean effort for partnered mothers depended on CFS 6 which provided a substantial improvement on the best recommended degree 1 power transform CFS 3 for this case.
Adaptive fractional polynomial modeling can identify distinctly nonlinear relationships even when there is a significant linear relationship.For example, mean effort for partnered mothers was significantly related to untransformed CFS, but adaptively transforming CFS generated a substantially better model based on a nonlinear relationship.This model was counter-intuitively not monotonically decreasing in CFS.However, the model generated by restricting the adaptive search not to remove the intercept was competitive monotonic alternative (Figure 2), indicating that monotonicity was a reasonable conclusion for these data.

Bounding Predictors
The model of Figure 2 suggested that mean effort was constant in CFS up to a value of 90 and then decreased after that, and the associated bounded CFS model provided a competitive alternative.Bounded CFS is a special case of splines as often used in nonparametric modeling, for example, in multiple adaptive regression splines (MARS) [16].Its applicability to these data provides the useful insight into the relationship between mean effort and CFS that effort can only be improved (lowered) by increasing CFS to relatively high levels.
CFS might not lend itself well to intervention, but suppose mean effort was constant in low values of a measure amenable to intervention like family func- tioning.An intervention that produces only small to moderate improvements in family functioning would only be beneficial for families with relatively high levels of family functioning to start with and so would have too low of a scientific premise to justify an efficacy study.On the other hand, suppose mean effort decreased distinctly for improvements in family functioning at low to moderate levels but was reasonably considered to be constant for high levels of family functioning.If a study to improve family functioning included substantial numbers of families with high levels of family functioning, the intervention would have little effect on mean effort for these families, and the efficacy of the intervention would likely not be supported.However, if having a low to moderate level of family functioning was an inclusion criterion, the intervention would be likely to produce distinct reductions in effort supporting its efficacy.

Zero Intercept Models
Models commonly include an intercept even when it is nonsignificant, and so zero intercept models may seem inappropriate to consider.However, these models are quite simple to understand in the continuous outcome context; they mean that the mean outcome is zero when the predictors all have value zero.
Moreover, they can generate competitive alternatives to nonzero intercept models more parsimoniously, and substantially better models in some cases (as demonstrated in the example analyses).However, the adaptive modeling process can be constrained not to remove the intercept, and the associated adaptive models can provide useful insights into the data (as in the above example supporting an expected monotonic relationship).

LCV Ratio Tests
LCV ratio tests can be more conservative than standard tests for zero coefficients.
For example, mean effort depended significantly on CFS transformed with the power −1, but the associated LCV score was not substantially better than the score for the constant model.This is not an isolated case.For example, for the 21 significant cases considered in [17], the constant model was a competitive alternative using LCV scores in 5 (23.8%) of these cases; this also held for 10 (83.3%) of the 12 significant cases considered in [18] and for 12 (56.1%) of 21 significant cases considered in [19].This is an important property of LCV scores, indicating that adaptive modeling usually generates more parsimonious models than model selection procedures based on standard p values, thereby reducing the chance of overfitting.Knafl and Ding [10] provided the following partial justification for why this would hold in general.Model selection based on LCV scores is asymptotically equivalent to model selection based on Akaike information criterion (AIC) scores [20] as long as these are appropriately transformed into larger is better scores.Associated AIC ratio tests use more conservative thresholds for significance than standard likelihood ratio tests.Since the recommended approach for multiple fractional polynomial modeling [2] [3] is based on standard likelihood ratio tests, it would be more likely to produce more complex models and with more of a potential to overfit the data.

Adaptive Moderation
Adaptive moderation modeling can identify moderation effects not identifiable using linear moderation modeling.Moreover, consideration of geometric combinations may be needed to identify nonlinear moderation compared to only considering power transforms of interactions.For example, mean untransformed effort for partnered mothers depended significantly on untransformed bounded CFS, but including parental mutuality (PM) in the model along with its interaction with bounded CFS generated a nonsignificant slope for the interaction.An adaptive analysis considering transforms of the interaction (and so with bounded CFS and PM raised to the same power) suggested that nonlinear moderation did not hold.However, consideration of geometric combinations (and so with different powers for bounded CFS and PM) identified nonlinear moderation of the dependence of mean effort on bounded CFS by PM (Figure 5).
Linear moderation of an effect to a predictor X on a continuous outcome Y by a third variable Z can be addressed by also including Z as a covariate and its interaction X Z ⋅ in the model for the mean of Y and then testing for a zero slope for the interaction.An alternative approach is to test for a significant change in R 2 for the model with the interaction compared to the covariate model including X and Z without the interaction using an F test.In the adaptive nonlinear setting, the first approach based on a test for a zero coefficient cannot be extended, but the other approach can be.First, generate the adaptive model allowing for separate transforms of X and Z along with geometric combinations in X and Z.If the generated model contains no geometric combinations, then adaptive moderation does not hold.However, if the generated model does contain geometric combinations, adaptive moderation might still not hold.In that case, also generate the adaptive additive model in X and Z, that is, only considering separate transforms of X and Z without geometric combinations.Adaptive moderation holds if this adaptive additive model generates a substantial percent decrease in the LCV score compared to the model allowing for geometric combinations (as was the case in the example analyses).

Outcome Transformation
Consideration of power transforms of positive valued continuous outcomes as well as of predictors can generate distinctly better models and can resolve model assumption problems.For example, variances for untransformed effort for partnered mothers were distinctly nonconstant, and the associated adaptive model generated skewed standardized residuals with two extreme outliers.On the other hand, modeling mean effort 1.5 as a function of bounded CFS generated a model (Figure 3) that provided a substantial improvement over not transforming effort with variances reasonably treated as constant, without outliers, with reduced skew, and with a reasonably linear normal plot (Figure 4).However, this might not always be the case.The need for nonconstant variances is not a problem because variances can be appropriately modeled using adaptive methods.If there are still outliers after outcome transformation, sensitivity analyses can be conducted removing outliers systematically to assess whether the outliers are highly influential on what models for the means and/or variances are generated.
High levels of skewness may not be resolved through power transforms of a positive continuous outcome.In such cases, the outcome can be categorized into ordinal levels.Conventional categorizations of the outcome can be used if available.If not, percentile splits could be used.For example, the outcome could be categorized based on a median split and then modeled using adaptive logistic regression.Alternatively, the outcome could be categorized based on a tertile or quartile split and modeled using adaptive ordinal regression.

Testing for a Bivariate Relationship
Suppose a relationship has been hypothesized between an outcome Y and a predictor X.The conventional approach for assessing this involves testing for a significantly nonzero slope for X using a bivariate regression model (or equivalently for a zero correlation).If this test is nonsignificant, a nonlinear relationship still might hold between Y and X.This can be assessed adaptively using an expansion of a constant base model restricted to at most one transform of X.If the expansion leaves the constant model unchanged, then a nonlinear relationship does not hold between Y and X.However, if the generated model does contain a transform of X, the heuristics of the expansion do not guarantee that this relationship would be significant.This could be assessed with a standard test for zero slope for transformed X.Even if this test is significant, the relationship might not be substantial, which could be addressed using a LCV ratio test comparing the model with transformed X to the constant model.Equivalently, the expanded model could be subjected to a contraction holding the intercept fixed in the model.The relationship is substantial if the contraction does not remove transformed X from the model.

Applicability
Linear relationships are typically assumed or implied in theories used in the be- The reported analyses used data from the family psychology literature, but adaptive methods apply more generally to all areas providing deeper insights into relationships between arbitrary outcomes and their continuous predictors.
Adaptive methods could be used to conduct purely exploratory analyses similar to those reported here to identify nonlinear dependence of outcomes on individual transformed predictors and/or on multiple transformed predictors in combination and possibly interacting (through geometric combinations).
However, they can also be used to supplement standard analyses based on theoretical considerations.For example, suppose a pretest/posttest experiment was conducted with participants randomized to either an intervention to reduce depressive symptoms (or to affect any other appropriate outcome) compared to a control condition.Suppose that the pre-specified approach to assess the effica- and all available covariates as primary predictors, then contract the expanded model adjusting powers of remaining transforms but restricting the contraction not to remove the intervention group indicator (or also the intercept).Because that indicator was included in the model based on theory, it would be reasonable to use a standard test for zero slope to assess whether the intervention effect still held after accounting adaptively for transforms of the baseline value and the covariates.This kind of modeling could be considered semi-adaptive.
If a quasi-experimental or comparative design was used instead with two nonequivalent groups, the issue of covariates becomes even more important.
One possibility would be to conduct an adaptive logistic regression analysis modeling treatment group membership in terms of available covariates and use the generated adaptive model to compute propensity score weights or scores for models assessing the efficacy of the intervention.
Suppose that depressive symptoms (or any other appropriate outcome) were collected at baseline and also longitudinally at multiple post-baseline times and that the efficacy of the intervention was supported by a significant group-by-time interaction using a standard repeated measures analysis.A parsimonious model of how mean depressive symptoms changes over time separately within each of the two groups can be addressed using adaptive linear mixed modeling allowing for temporal correlation.Conduct an adaptive moderation analysis with time and the intervention group indicator as primary predictors while allowing for geometric combinations in these two predictors to account for different nonlinear trajectories for the two groups.In this case, geometric combinations would be the same as power transforms of the interaction since indicator variables are unaffected by power transformation.
Any standard model used to establish the effect of an intervention on a continuous outcome might have nonconstant variances or the outcomes may be skewed, making the conclusion that the intervention was efficacious questionable.This could be assessed through adaptive modeling allowing for nonconstant variances and/or outcome transformation.The model for the mean could be left unchanged so it would be theory-based or it could be adjusted as above for covariates.For discrete or count outcomes, the need for nonunit dispersions could be assessed using adaptive modeling comparing the model allowing for nonunit dispersion to the associated unit dispersions model, once again holding a theory-based model for the means fixed or adjusting it for covariates.

Comparison to Standard Fractional Polynomial Modeling
The example analyses demonstrate that the range of the recommended powers can be too restrictive when the appropriate power is outside that range.This is further supported by a simulation provided by Knafl and Ding [10] (Section 2.12).A simulation was also provided here that demonstrates this can also happen when the true power is strictly within the range of recommended powers.
On the other hand, the recommended set of powers will often generate a competitive choice for actual data sets compared to the associated adaptive model.
When the selected recommended power has an extreme value of −2 or 3, it would be prudent to expand the search through more extreme powers.In any case, when the data have little variability, it would also be prudent to expand the search over powers nearby the selected recommended power. .Mediation [12] has not been addressed here.That requires a more complex approach based on adaptive path modeling [22].
Models based on arbitrary power transforms can be difficult to interpret, especially geometric combinations based on multiple powers (as in the example analyses).Selected powers have little meaning since competitive models could be generated by replacing those selected powers by many alternative, nearby powers.
Adaptive models can only be fully understood through visualization of estimated relationships.Figures 1-3 give examples of visualizing models based on single predictor transforms.Figure 5 provides an example of visualizing a model based on two predictor transforms (i.e., plot the mean outcome as a function of one of the predictors for selected values of the second predictor).Three predictor transforms could be handled by several plots like Figure 5, visualizing the relationship in two of the transforms at selected values of the third transform.Models based on more than three transforms would be difficult to visualize.Adaptive modeling can be constrained to include at most three transforms in order to guarantee that the generated model could be visualized.
The possibility of overfitting is of concern for all regression models.The use of LCV scores reduces the possibility of overfitting by an adaptively selected model.Overfitting results when a model relies too heavily on small exceptional subsets of the data (e.g., outliers), but such subsets are likely to fall mostly, if not entirely, in one fold.While such a model will fit that fold very well, it is unlikely to fit the other folds well, thereby generating an inferior LCV score compared to models not overfitting the data.However, the possibility of overfitting should still be assessed.For example, Figure 3 is the plot for the final selected model for mean effort as a function of CFS for partnered mothers.The smallest observed value for CFS is 42.9, more than 7 units less than the next smallest observed value of 50.However, Figure 3  Other methods that are directly supported in standard statistical software could have been used instead, including classification and regression trees (CART) [23], generalized additive models (GAMs) [24], and MARS models.
Fractional polynomial modeling is likely to generate smoother curves than these other methods due to the differentiability of power transforms.However, these other methods do not address nonconstant variances or outcome transformation.
Knafl and Ding [10] provided comparisons of adaptive modeling to the use of GAMs and MARS, demonstrating that adaptive modeling can generate improved LCV scores using more parsimonious models and sometimes distinctly better models.In none of those example analyses did GAMs outperform adaptive modeling.However, in one exceptional example analysis, MARS modeling outperformed adaptive modeling, but that MARS model could be improved through adaptive adjustments (i.e., by power transforming MARS generated splines).

Summary
Adaptive fractional polynomial modeling has been described and demonstrated.
Reported analyses demonstrate the kinds of novel insights into the data that are possible with adaptive modeling.Specifically, fractional polynomials can outperform standard polynomials and consideration of only standard polynomials can lead to the incorrect conclusion that a relationship does not hold when in fact a nonlinear relationship exists.Zero intercept models are important to consider because the associated models are simpler and relationships in some cases can only be supported allowing for a zero intercept.While the recommended Royston and Altman powers will often generate effective models, there are cases where the recommended powers are inadequate to effectively model the data.
This had only been demonstrated for powers outside the range of the recommended powers before using simulated data, not actual data as demonstrated in the example analyses.Furthermore, the example simulation analyses demonstrate for the first time that this also holds when the true power is strictly within the range of the recommended powers.Allowing for nonconstant variances can provide distinct improvements over making the usual constant variances assumption and provides an objective way to assess that assumption as opposed to subjective inspection of residual plots.Power transforms of positive valued continuous outcomes can provide distinct improvements over modeling untransformed outcomes.Basing the choice on power-adjusted LCV scores provides an objective way to choose an appropriate outcome power as opposed to subjectively inspecting scatter plots for selected power-transformed outcomes.Nonlinear moderation relationships can hold when linear moderation does not hold.
Moreover, this can sometimes only be identified using general geometric combinations rather than standard interactions.The reported analyses provide the first known example of this.
In summary, adaptive modeling can provide novel insights into data com- the observations of a data set.Partition these indexes, and hence the data, into k disjoint subsets ( )S h for 1 h k ≤ ≤ ;these subsets are called folds.Let θ denote the vector of model parameters,

Figure 1 .
Figure 1.Raw data and predicted value curve for the adaptive model for mean effort as a function of child functional status for single mothers.

Figure 2
Figure 2 contains the plot of the raw data and the predicted value curve for

Figure 2 .
Figure 2. Raw data and predicted value curve for the adaptive model for mean effort as a function of child functional status for partnered mothers based on untransformed effort and restricted to include an intercept.
that linear moderation does not hold.The adaptive additive (i.e., without considering interactions) model for the means in max(CFS,90) and PM depends on max(CFS,90) 6 and 0.1 PM − without an intercept and with LCV score 0.064201.The adaptive model in only max(CFS,90) is a competitive alternative with LCV score 0.063885 (as reported earlier) and insubstantial percent decrease 0.49%.

Figure 3 .
Figure 3. Raw data and predicted value curve for the adaptive model for mean effort as a function of bounded child functional status for partnered mothers based on transformed effort.

Figure 4 .
Figure 4. Normal plot generated by standardized residuals for the adaptive model for mean transformed effort as a function of bounded child functional status for partnered mothers.

Figure 5 .
Figure 5. Predicted value curves for the adaptive model for mean untransformed effort as a function of bounded child functional status moderated by parental mutuality for partnered mothers.
Using 10 k = folds, the adaptive model for mean ysim with constant variances was based on the true power 1.5.Rounded estimated values for the intercept, slope, and standard deviation were 0.33, 0.67, and 0.0067, respectively; all equal to the true values.The LCV score was 34.8004.The best LCV score for the recommended powers was generated by the power 2 with value 10.0569 and very substantial percent decrease of 71.1%.These results demonstrate that the set of recommended powers in between the extreme powers of −2 and 3 can also produce ineffective models when the variability in the data is small, as for these simulated data.

Figure 6 .
Figure 6.Simulated data with mean of the simulated y variable a function of the simulated x variable raised to the power 1.5.
ments.The version is indicated by the date in the name of the macro.The version of genreg used in reported analyses was created on 3/19/2017 and the version of the ypower macro on 2/25/2017.These versions of the macros as well as code to generate reported analyses are available at http://www.unc.edu/~gknafl/AFPM2.html(accessed February 4, 2018).
persions for logistic and Poisson regression) in terms of fractional polynomial models.The log of the variances is modeled in terms of fractional polynomials to guarantee that the variances are positive valued.At each step of the expansion, a best transform is identified for expanding the model for the means and also for the model for the variances.The choice with the better LCV score is added next to the model.Similarly, at each step of the contraction, the next term removed is the one for either the means or the variances generating the better LCV score.An adaptive model for both means and variances of effort in bnddCFS is requested as follows.%genreg(modtype=norml,datain=partnered,yvar=effort,expand=y,expxvars=bnddCFS,expvvars=bnddCFS,contract=y);The expvvars macro parameter requests that the expansion and contraction also consider transforms of bnddCFS for modeling the variances.There are several parameters like expvvars for modeling the variances (or the v component) analogous to those for modeling the means (or x component) like expxvars.For example, the vintrcpt and vvars parameters control the base model for the variances with default settings "vintrcpt=y" and "vvars=", thereby generating the constant variances model.Consequently, the above code requests a base model with constant means and variances.

to the power 3
to the model for the variances (order = 2), and then stops.The LCV score rounds to 0.064183.Page 6 of the output describes settings of macro parameters controlling the contraction.Page 7 describes the contracted model.Part of this output is in Table 4.The contraction removes the intercept from the model for the variances (order = 1), adjusts the power for bnddCFS in the model for the variances to 1.3, leaves the power for bnddCFS in the model for the means unchanged at 6, and then stops.The LCV score improves to 0.064329.
cy of the intervention was to model mean post-baseline depressive symptoms in terms of the indicator for being in the intervention group while controlling for baseline depressive symptoms.If the intervention group indicator has a significant effect in this theory-based model supporting the efficacy of the intervention, the next natural issue to address is how that conclusion is affected by controlling for covariates.If one includes all available covariates in the model and the intervention group effect becomes nonsignificant, this could be a meaningful conclusion or alternatively a consequence of overfitting the model with too many nonsignificant covariate terms.This issue could be addressed adaptively as follows.Conduct an adaptive analysis starting from the model for the means based on the intervention group indicator by itself, expand the model considering baseline depressive symptoms indicates that neither of these two data points affects estimated mean effort values over the associated range of CFS values very much.These estimates are controlled more by the next smallest observed CFS value of 53.6 with five associated observations.Had the estimated curve at CFS = 42.9 been very close to the associated observed effort value of 20, the model could have overfit the relationship at the low end of CFS values, especially if its slope changed quite a bit soon after that.

Table 1 .
Standard polynomial modeling of the outcome effort as a function of the primary predictor child functional status for single mothers.

Table 2 .
Part of the output describing the constant base model for mean effort with constant variances for partnered mothers.

Table 3 .
Part of the output describing the expanded model for both means and variances of effort as a function of bounded child functional status (bnddCFS) for partnered mothers.

Table 4 .
Part of the output describing the contracted model for both means and variances of effort as a function of bounded child functional status (bnddCFS) for partnered mothers.

Table 5 .
Part of the output for choosing an outcome transformation for effort with its means a function of bounded child functional status (bnddCFS) for partnered mothers.
1 without an intercept.The adaptive model in bnddCFS, mutuality, and interact is generated as follows, but it is the same as the adaptive additive model.