Considerations on the Use of Approximations to Estimate Freight Demand on the Basis of a Single Explanatory Variable with Non-Linear Functional Forms and Generalization to Statistical Modelling
1. Introduction
The estimation of urban freight demand is a major issue in urban and transport economics, as shown in recent studies (Holguin-Veras et al., 2019; Ramfou & Sambracos, 2023; Gonzalez-Feliu, 2024). In contexts where data are aggregated or inaccurate, it seems suitable to propose approximations that allow the use of the most suitable functional forms instead of relying on constant and linear approximations (Holguin-Veras et al., 2011a; Gonzalez-Feliu & Sanchez-Diaz, 2019; Gonzalez-Feliu, 2024). We recall the definition of a functional form as the algebraic form of the function f(X; a) that explains a phenomenon (or explained variable) Y (in this case, the freight demand), where Y = f(X; a) and X is an explanatory variable or a vector of explanatory variables. Most freight demand models rely on a single variable (either the Employment or the Area of the establishment; Sanchez-Díaz et al., 2016), and five main functional forms are in general used (Lagorio et al., 2018): constant, linear, logarithmic, potential and exponential. Therefore, the present work focuses on functional forms with a single explanatory variable, mainly the five previously mentioned.
This work is positioned in the continuity of the recent developments of Gonzalez-Feliu (2024), which addressed various possibilities of aggregation and approximation of the first four functional forms using a single explanatory variable. That previous work presents two main limitations. The first is that the exponential function remained unstudied in terms of approximation capabilities, since it does not present a direct relationship to a quasi-arithmetic mean. The second is the need to examine in depth the possibility of using those approximations (which are estimated with actual data for the explanatory variable) on standard data, which presents categorial values (i.e., class centers), as happens with Employment and Area (Ambrosini et al., 2010).
As stated in seminal works on freight demand modelling (Pendyala et al., 2000; Daganzo, 2005; Holguin-Veras et al., 2014; Havenga & Simpson, 2018) and more generally in statistical modelling (Kneib & Tutz, 2010; Dobson, 2013; Connelly et al., 2016; Vencálek et al., 2020), the choice of the suitable functional form, as well as the data granularity, impacts the use conditions of the model. When data is lacking, two main approximations can be made: simplifying the functional form with constant or linear models (Holguin-Veras et al., 2011a) or using more suitable functional forms, but on approximations relying on arithmetic means of the explanatory variables (Gonzalez-Feliu, 2024). Therefore, it seems insightful, from both a research and a practice viewpoint, to determine how those approximations can be generalized to different types of variables, measurement issues and functional form logics.
This paper presents a set of considerations on the use of such approximations and completes recent developments with the inclusion of categorial data in the approximation analysis, and proposes a polynomial approximation of the exponential function via a Taylor series development. First, the main functional forms of freight demand are recalled from a generalized viewpoint, including both freight trip and commodity models. After that, the extension of Gonzalez-Feliu's (2024) analysis to categorial data is proposed and discussed. Then, an analysis of the exponential function and its possibilities of approximation via Taylor series is proposed and its accuracy is assessed. Finally, a generalization of the findings, methods and application issues is made to examine the implications of the proposed work for statistical modelling.
2. Considerations Related to Using Categorial Entities as Explanatory Variables
2.1. The Main Functional Forms in FTG and FG
Freight intensity models relate the number of trips (mainly weekly or daily) or the quantities generated by an establishment (in weight or volume) to a series of explanatory variables, the most representative being Employment and Area (Sanchez-Díaz et al., 2016). The first type is called Freight Trip Generation (FTG, trip-based) and the second Freight Generation (FG, commodity-based). Although different in nature and applications (Bastida & Holguin-Veras, 2009; Holguin-Veras et al., 2011a), both types of models see their accuracy strongly related first to the identification of the most suitable functional form and then to the granularity of the data used to calibrate them (Gonzalez-Feliu & Sanchez-Diaz, 2019). Although some models use more than one explanatory variable (Gonzalez-Feliu et al., 2020; Puente-Mejia et al., 2020), most works obtain stronger and more accurate models using a single explanatory variable, mainly Employment, although Area is sometimes used as well.
To generalize those models (both FG and FTG), we can write them as follows. Given an establishment i of category a, the set of establishments of category a being noted $V_a$, the FTG or FG rate $\hat{y}_i$ can be estimated as a function of an explanatory variable $x_i$ of establishment i, as follows:

$$\hat{y}_i = f(x_i; a) \qquad (1)$$

where $\hat{y}_i$ can be trips, commodity quantities in weight or commodity quantities in volume (either daily or weekly) and $x_i$ can be Employment or Area. As stated in Gonzalez-Feliu (2024), five functional forms can be defined at the level of the single establishment, summarized in Table 1.
The constant and linear forms have been widely studied and have many applications because of their additive properties. In a previous work (Gonzalez-Feliu, 2024), the synthetic forms for the logarithmic and potential functional forms were approximated by substituting the non-arithmetic mean with an arithmetic one. The accuracy of those approximations does not change strongly with respect to that of the exact formulations (the Mean Absolute Percentage Error, MAPE, increases by less than 1, which is negligible or contained as an increase, according to Gonzalez-Feliu, 2024).
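To make this point concrete, the substitution can be checked numerically. The sketch below uses synthetic data and hypothetical parameters (alpha, beta and the employment range are illustrative, not taken from the cited works): the geometric-mean form reproduces the establishment-level sum exactly, while substituting the arithmetic mean adds only a moderate relative error.

```python
import math
import random

random.seed(1)

# Hypothetical logarithmic FTG model: y_i = alpha * ln(x_i) + beta
alpha, beta = 2.0, 1.0
x = [random.randint(1, 50) for _ in range(200)]  # synthetic employment values
n = len(x)

# Establishment-level (disaggregated) zonal total
exact = sum(alpha * math.log(xi) + beta for xi in x)

geo_mean = math.exp(sum(math.log(xi) for xi in x) / n)
arith_mean = sum(x) / n

# Exact zonal form uses the geometric mean; the approximation substitutes
# the arithmetic mean for it (Gonzalez-Feliu, 2024)
synthetic_exact = n * (alpha * math.log(geo_mean) + beta)
synthetic_approx = n * (alpha * math.log(arith_mean) + beta)

print(abs(synthetic_exact - exact))            # numerically zero: exact form
print(abs(synthetic_approx - exact) / exact)   # relative bias of the substitution
```

The first printed value is numerically zero, while the second is the (contained) relative bias introduced by the arithmetic-mean substitution, which always overestimates here since the arithmetic mean dominates the geometric mean.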
Models presented above are defined and estimated for a quantitative variable. However, Employment is often available only as a categorial variable, e.g. a
Table 1. Functional forms of FTG and FG.

| Functional form | Formulation at the level of a single establishment i | Synthetic formulation for the zonal aggregation of the form |
|---|---|---|
| Constant | $\hat{y}_i = c_a$ | $\hat{Y}_a = n_a c_a$ |
| Linear | $\hat{y}_i = \alpha_a x_i + \beta_a$ | $\hat{Y}_a = \alpha_a n_a \bar{x} + n_a \beta_a$ |
| Logarithmic | $\hat{y}_i = \alpha_a \ln(x_i) + \beta_a$ | $\hat{Y}_a = n_a (\alpha_a \ln(\bar{x}_g) + \beta_a)$ |
| Potential | $\hat{y}_i = \alpha_a x_i^{b_a}$ | $\hat{Y}_a = n_a \alpha_a (M_{b_a}(x))^{b_a}$ |
| Exponential | $\hat{y}_i = \alpha_a e^{b_a x_i}$ | Not yet determined |

where $n_a$ is the number of establishments of category a, $\bar{x}$ is the arithmetic mean, $\bar{x}_g$ the geometric mean and $M_{b_a}(x)$ the quasi-arithmetic mean of order $b_a$.
range of employment (Gonzalez-Feliu & Sanchez-Diaz, 2019). Since models can be built using quantitative variables but then applied to categorial data for further forecasts, it seems suitable to examine the possibilities of applying the proposed approximations to categorial data instead of using a raw average as an approximation of the explanatory variable.
Figure 1. General framework for collecting data, modelling and using the resulting models for freight intensity (demand) estimation (source: own elaboration).
Figure 1 shows the possibilities of application of the different models, embedded in a more general data production and modelling application framework. It presents three main phases, which are the following:
1) Field observation and data production, related to the characterization of the field that is studied, the definition of the model's objectives, requirements, data needs and collection capabilities, and then the collection or production of data for modelling purposes.
2) Model estimation, calibration and validation, i.e., the analysis of the possibilities of deploying different functional forms and modelling procedures, the construction and calibration of the model using field data and the different validation procedures once the model and its parameters are defined and calibrated.
3) Application of the model to estimate freight intensity rates on standard data of the same nature, in application fields with individuals of the same natures and types as those considered in the model. Four possibilities are considered: having the individual data for a direct application of the model; applying the model to aggregated data using a constant or linear model instead of the non-linear one; applying the model to data using the arithmetic mean approximations; or extending this third case to apply the model to categorial data (i.e., not a single average per category of activity but, for each activity type, a set of Employment or Area classes with a mean defined for each class). The first two applications are those mainly used in previous literature, the third was first proposed by Gonzalez-Feliu (2024) and the fourth remains unexplored.
The two applications relying on approximated values of the explanatory variables with non-linear forms introduce a bias, partially examined (for the case where each category of activity is related to a unique arithmetic mean estimator) in Gonzalez-Feliu (2024) for four of the five functional forms (as said above, the exponential function remains unstudied in terms of approximations). However, this has not been done for the situation where, for each activity type, a set of subcategories (Employment or Area classes) is defined, each of them with an average value of the explanatory variable; this bias needs to be analyzed, since it should be lower than that of the approximation by a unique category for each activity type (i.e., without distinction in terms of Employment or Area size). In this section, we will address this bias, analytically estimate the error introduced by this approximation, and examine the suitability of developing a model with quantitative data and then applying it to categorical data.
2.2. The Main Characteristics of Freight Demand
Before entering into detail on the approximations and considerations of FTG and FG models, it seems important to examine the context and the field of freight transport demand, mainly related to urban generators (Holguin-Veras et al., 2019). To do that, a descriptive statistical analysis of data from France is proposed. More precisely, an analysis of the Ile-de-France survey on urban goods transport is made below, focusing on the characteristics of freight demand for a set of economic sectors. This survey (Enquête Marchandises en ville Région Ile-de-France, ETMV-RIF) was carried out in the Ile-de-France Region between 2011 and 2013, and was released in 2014. For an overview of the methodology, see Ambrosini et al. (2010); for an overview of the results, see Toilier et al. (2016). This survey has been used in various works, mainly for descriptive analyses concerning the quantification of the conurbation's flows (Dablanc, 2013; Toilier et al., 2016; Morin, 2020), but also for quantifying a specific demand sector (Koning & Conway, 2016; Béziat et al., 2017; Coulombel et al., 2018; Béziat, 2021).
The categorization of economic sectors for freight demand estimation has been addressed in various studies (Gonzalez-Feliu & Sanchez-Diaz, 2019; Pani & Sahu, 2019), most of them concluding that a contained set of macro-categories can be enough to identify and describe this demand (Holguin-Veras et al., 2014; Alho & de Abreu e Silva, 2017; Gonzalez-Feliu & Sanchez-Diaz, 2019; Gonzalez-Feliu et al., 2024). Therefore, we select the 5 macro-categories proposed by Gonzalez-Feliu et al. (2024). For each category, a set of variables is analyzed to characterize the sector of activity. From the available data of the survey, the following variables are selected for the analysis:
Potential explained variables:
- FTG rates (number of trips per week)
- FG rates in weight (kg per week)
- FG rates in volume (m³ per week)

Potential explanatory variables:
- Employment (real variable)
- Category of employment (categorial variable)
- Area, in m² (real variable)
- Category of area (categorial variable)
FTG rates are directly available in the survey, whereas FG rates are associated with each single delivery, so a weekly estimation has been made by estimating, for each single delivery, its weight and volume, weighting them by the delivery frequency and finally adjusting them using the corresponding representativeness coefficients. In Table 2, we present a synthesis of the Employment, FTG and Area variables.
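The weekly estimation procedure described above can be sketched as follows. The record fields, values and representativeness coefficient below are hypothetical placeholders for illustration, not the actual ETMV-RIF variables:

```python
# Each record describes one recurrent delivery observed at an establishment:
# weight_kg / volume_m3 per delivery, frequency in deliveries per week,
# coef = representativeness (expansion) coefficient of the observation.
deliveries = [
    {"weight_kg": 120.0, "volume_m3": 0.8, "freq_per_week": 2.0, "coef": 1.4},
    {"weight_kg": 35.0,  "volume_m3": 0.2, "freq_per_week": 5.0, "coef": 1.4},
    {"weight_kg": 400.0, "volume_m3": 2.5, "freq_per_week": 0.5, "coef": 1.4},
]

def weekly_fg_rates(deliveries):
    """Weekly FG rates: per-delivery quantities weighted by their weekly
    frequency, then adjusted by the representativeness coefficient."""
    weight = sum(d["weight_kg"] * d["freq_per_week"] * d["coef"] for d in deliveries)
    volume = sum(d["volume_m3"] * d["freq_per_week"] * d["coef"] for d in deliveries)
    return weight, volume

weight, volume = weekly_fg_rates(deliveries)
print(round(weight, 1), round(volume, 2))  # weekly kg and m3 for this establishment
```

The same aggregation, applied to all establishments of a sector, yields the FG rates summarized in the tables below.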
Table 2. Main characteristics of each category of economic activities in terms of Employment, Area and FTG Rates.

| Category of activity | # obs. | Employment: mean | Employment: std. error | FTG (trips/week): mean | FTG: std. error | Area (m²): mean | Area: std. error |
|---|---|---|---|---|---|---|---|
| Agriculture | 12 | 6.33 | 9.76 | 3.64 | 4.31 | 1578.50 | 2650.44 |
| Industry/Wholesaling | 199 | 87.36 | 243.28 | 14.11 | 51.23 | 2377.66 | 12,005.96 |
| Retailing | 351 | 46.14 | 236.65 | 39.63 | 120.46 | 3231.33 | 18,902.56 |
| Craftsmen/Tertiary/Services | 581 | 36.24 | 124.17 | 15.85 | 39.24 | 1225.28 | 4348.28 |
| Transport/Logistics | 45 | 46.84 | 74.37 | 221.95 | 367.72 | 48,209.51 | 245,783.82 |
In the selected dataset, only one category presents a low quantity of data (Agriculture), the other four having a sufficient number of observations to support the hypothesis of Normality necessary for the application of most regression-based techniques and statistical methods used in freight demand analysis. Concerning employment, mean values vary from 6 to nearly 90, with very different standard errors. Industry/Wholesaling is the sector with the highest employment (and highest standard error), whereas Transport/Logistics has the highest FTG rates and area. Agriculture, which has a small presence in urban areas, has the lowest means and standard errors for the three considered variables.
To illustrate the characterization of the establishments in categories of Employment and Area, Table 3 and Table 4 relate quantitative and categorial values of those two variables (Employment for Table 3 and Area for Table 4) for one of the most representative macro-categories in urban zones: retailing and stores. Those tables present the categories retained for the explanatory variable (aggregated to ensure the statistical significance of means and standard errors; Gonzalez-Feliu & Sanchez-Diaz, 2019) for the activity sector of retailing and stores. It is important to note that the owners of the activities, who are not counted as employees, are included in the workforce. We observe that, in the considered dataset, the average number of employees does not correspond to the midpoint between the minimum and the maximum values of the range, being higher in 5 out of the 8 categories, whereas in three of them (10 to 19, 20 to 49 and 100 to 499) it is lower.
Table 3. Characteristics of the retained categories of Employment regarding retailing and stores.

| Category of Employment | Number of observations | Average workforce | Standard error |
|---|---|---|---|
| No employees (only owner) | 95 | 1.83 | 2.65 |
| 1 to 2 | 147 | 2.06 | 0.76 |
| 3 to 5 | 133 | 4.11 | 1.04 |
| 6 to 9 | 61 | 7.60 | 1.51 |
| 10 to 19 | 39 | 13.15 | 2.32 |
| 20 to 49 | 43 | 29.58 | 8.24 |
| 50 to 99 | 14 | 79.57 | 16.00 |
| 100 to 499 | 35 | 184.20 | 90.34 |
Table 4. Characteristics of the retained categories of Area regarding retailing and stores.

| Category of Area | Number of observations | Average area (m²) | Standard error |
|---|---|---|---|
| Less than 300 m² | 224 | 102.22 | 70.47 |
| 300 to 400 m² | 14 | 356.58 | 31.49 |
| 400 to 2500 m² | 39 | 1113.80 | 923.65 |
| More than 2500 m² | 61 | 9318.96 | 15,359.07 |
Activities with more than 500 employees have not been considered here because their number of observations was lower than 6 (Gonzalez-Feliu & Sanchez-Diaz, 2019).
Concerning area, we observe that standard errors are lower than averages in 3 out of the four categories and higher for establishments with a surface above 2500 m². As for Employment, the mean values of the categories do not correspond to the midpoint between the maximum and minimum values of the range.
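As a quick check of this observation, the class midpoints can be compared with the observed means reported in Table 3 (the midpoint is simply taken here as (min + max)/2 of each range; the no-employee class is omitted since it is not a range):

```python
# (range_min, range_max, observed mean) for retailing, from Table 3
employment_classes = [
    (1, 2, 2.06), (3, 5, 4.11), (6, 9, 7.60), (10, 19, 13.15),
    (20, 49, 29.58), (50, 99, 79.57), (100, 499, 184.20),
]

for lo, hi, mean in employment_classes:
    midpoint = (lo + hi) / 2
    side = "above" if mean > midpoint else "below"
    print(f"{lo}-{hi}: midpoint {midpoint:.1f}, observed mean {mean:.2f} ({side})")
```

The observed mean falls below the midpoint exactly for the 10 to 19, 20 to 49 and 100 to 499 classes, confirming that class centers cannot simply be taken as range midpoints.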
2.3. Considerations on the Use of Categorial Explanatory Variables in FG and FTG for Constant and Linear Functional Forms
When dealing with categorial variables, each category defining a class center or average value (which is not necessarily, as seen above, the midpoint between the maximum and minimum values of the category), FTG and FG models can be written as follows:

$$\hat{Y}_z^a = \sum_{s} \sum_{i \in V_{as}^z} f(x_i; a) \qquad (2)$$

where z is the considered zone, s each category of employment and $V_{as}^z$ the set of establishments of zone z belonging to category s. If we want to define those functions for each functional form, the constant functional form can be written as follows:

$$\hat{Y}_z^a = n_a^z c_a \qquad (3)$$
This relationship can be improved by proposing different generation rates by category of employment (Le Nir & Routhier, 1995; Ambrosini et al., 2008; Gonzalez-Feliu et al., 2014). We can rewrite the relationship as follows:
$$\hat{Y}_z^a = \sum_{s} n_{as}^z c_{as} \qquad (4)$$

where $c_{as}$ is the constant generation rate value for subcategory of employment s of category of activity a. The linear functional form, because of its transitive properties, can be directly related to the total employment (or the average employment) of the category, as follows:
$$\hat{Y}_z^a = \sum_{s} \left( \alpha_a X_{as}^z + \beta_a n_{as}^z \right) \qquad (5)$$

where $X_{as}^z$ is the total employment (or area) of the establishments of category a and subcategory s within zone z. We observe that in both cases, no approximations are needed since the transitivity properties of the relationships allow a decomposition which ensures a total equivalence of formulations.
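This exact equivalence for the linear form can be verified numerically. The sketch below uses synthetic employment values and hypothetical parameters (the subcategory ranges, alpha and beta are illustrative):

```python
import random

random.seed(2)
alpha, beta = 0.8, 3.0  # hypothetical linear FTG parameters

# Establishments of one activity category in a zone, split into
# employment subcategories s (here: three arbitrary classes)
ranges = {"1-9": (1, 9), "10-49": (10, 49), "50-99": (50, 99)}
subcategories = {s: [random.randint(lo, hi) for _ in range(30)]
                 for s, (lo, hi) in ranges.items()}

# Establishment-level sum of the linear form ...
per_establishment = sum(alpha * x + beta
                        for xs in subcategories.values() for x in xs)

# ... equals the decomposition by subcategory totals: alpha * X_s + beta * n_s
by_totals = sum(alpha * sum(xs) + beta * len(xs)
                for xs in subcategories.values())

assert abs(per_establishment - by_totals) < 1e-9  # exact, no approximation
```

Because the linear form distributes over sums, the subcategory decomposition introduces no error, unlike the non-linear cases addressed next.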
2.4. Considerations on the Use of Categorial Explanatory Variables in FG and FTG for Logarithmic and Potential Functional Forms
The other functional forms do not have a direct transitive property, but an approximation whose error is acceptable can be defined to relate them to total values of the explanatory variable (Gonzalez-Feliu, 2024). Thus, the logarithmic relationship is written as follows:

$$\hat{Y}_z^a = \sum_{s} \sum_{i \in V_{as}^z} \left( \alpha_a \ln(x_i) + \beta_a \right) \qquad (6)$$

Approximating the geometric mean by the arithmetic mean, we can write:

$$\hat{Y}_z^a \approx \sum_{s} n_{as}^z \left( \alpha_a \ln(\bar{x}_{as}^z) + \beta_a \right) \qquad (7)$$
where $\bar{x}_{as}^z$ is the arithmetic mean of the variable x for establishments of subcategory s of category a within zone z. Finally, for the potential functional form, the relationship can be written as follows:

$$\hat{Y}_z^a \approx \sum_{s} n_{as}^z \, \alpha_a \left( \bar{x}_{as}^z \right)^{b_a} \qquad (8)$$

where $n_{as}^z$ is the number of establishments of subcategory s of category a within zone z.
Then, we can discuss what changes in terms of error if we use the second approximation (the categorial consideration) with respect to the first one (using overall employment), the latter being analyzed in depth in Gonzalez-Feliu (2024). As shown by that previous work, the largest MAPE possibly added by those approximations is 1, which does not change when considering the categorial organization of data. Indeed, for each single establishment, the APE is lower than or equal to 1 (Gonzalez-Feliu, 2024), and with the categorial configuration, the dispersion being more contained, it can even decrease. This ensures that the MAPE between the exact functional form and its categorial approximation remains lower than 1:
(9)
After that, we need to address the use of the arithmetic mean instead of the quasi-arithmetic mean of order k in the resulting polynomial relationship. To do that, we estimate for each single establishment i the percentage error as follows:
(10)
(11)
When calculating the limit of the MAPE function for higher and lower values of the explanatory variable (as in Gonzalez-Feliu, 2024), and assuming that the explanatory variable follows a normal distribution with mean $\mu$ and standard error $\sigma$, with a target of covering 95% of the statistical distribution, $x$ will be, in 99.7% of the cases, lower than $\mu + 3\sigma$ (Wonnacott & Wonnacott, 2015). When calculating the limit of the function as in Gonzalez-Feliu (2024), for high dispersions (i.e., high standard errors), we observe that this limit is mainly conditioned by the dominating term of the Taylor development (i.e., the term that gives the value closest to the exponential function of all the terms of the series). In this case, we can write the following equivalence:
(12)
where k is the position of the dominating term of the Taylor development. Following the results and demonstrations of Gonzalez-Feliu (2024), we can state that this limit is lower than or equal to 1, which ensures that the MAPE is also lower than or equal to 1. Therefore, we can state that the polynomial approximation of the exponential function results in MAPE estimates in line with the other functional form approximations, completing previous works with the exponential function, not yet studied in terms of approximation capabilities.
Those developments show that applying such approximations to categorial data, i.e., proposing models estimated with the actual values of the explanatory variables but then applying them to data having only average values per category, leads to an estimation with the same order of magnitude of accuracy as having actual and disaggregated data, so those approximations are suitable for both research and practical applications.
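As an illustration of this conclusion, the sketch below compares, on synthetic data with hypothetical logarithmic parameters, a disaggregated application of the model, a categorial (class-mean) application, and a coarser single-mean application; the class-based bias is the smaller of the two, consistent with the argument above.

```python
import math
import random

random.seed(3)
alpha, beta = 2.0, 1.0  # hypothetical logarithmic model parameters

# Synthetic establishments grouped into employment classes (illustrative ranges)
classes = {rng: [random.randint(*rng) for _ in range(100)]
           for rng in [(1, 9), (10, 49), (50, 99)]}
all_x = [x for xs in classes.values() for x in xs]

# Disaggregated application: the model on each actual employment value
exact = sum(alpha * math.log(x) + beta for x in all_x)

# Categorial application: the model on each class's arithmetic mean
by_class = sum(len(xs) * (alpha * math.log(sum(xs) / len(xs)) + beta)
               for xs in classes.values())

# Coarser application: one zone-wide arithmetic mean for all establishments
single = len(all_x) * (alpha * math.log(sum(all_x) / len(all_x)) + beta)

rel_class = abs(by_class - exact) / exact
rel_single = abs(single - exact) / exact
print(rel_class, rel_single)  # the class-based bias is the smaller one
```

Since the logarithm is concave, the single-mean estimate dominates the class-mean estimate, which in turn dominates the exact sum, so partitioning into classes can only reduce the bias.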
3. Considerations on the Relevance of Using Approximations of the Exponential Functional Form
In the previous work (Gonzalez-Feliu, 2024), the exponential functional form was not addressed since it is less common in FTG. However, in FG it is more common, being one of the most representative functional forms (Gonzalez-Feliu et al., 2024). The main issue when deploying mean-based approximations on non-linear functional forms is the existence of a non-arithmetic mean that can be used as an equivalent (or an approximation) of the functional form. This is the case of the geometric mean for logarithmic functional forms and of the quasi-arithmetic mean of order p for potential functional forms. However, the exponential functional form does not present a direct equivalence relating it to a non-arithmetic mean, so other ways of approximating it are needed. One of the most popular approaches to finding polynomial equivalences of non-polynomial functions is the use of approximation expansions, like Taylor series, Puiseux series or Lagrange approximations, among others (Abramowitz & Stegun, 2006).
In this section, we present an approximation of the exponential functional form via a Taylor series development, which allows us to transform it into a polynomial function that can be related to quasi-arithmetic means of order p. First, we present the Taylor development approximation and the main considerations of its use. Second, we address the order of magnitude of the mean absolute percentage error of this approximation, as an extension of the limits presented in Gonzalez-Feliu (2024) to the Taylor development of the exponential function. Third, we address the use of categorial employment for the exponential function.
First, let us consider the exponential functional form. This relationship can be written as follows:

$$\hat{y}_i = \alpha_a e^{b_a x_i} \qquad (13)$$
There is no direct transformation, using known mean values, that allows defining a transitive relationship to decompose the sum of terms and aggregate it. However, as the main function is an exponential one, it can be transformed, using a Taylor series development, into a polynomial function (Agmon, 1951), as already used in various analyses and approximations related to rural and regional economics (Tiwari et al., 2015) and to econometric relationships close in nature to those examined here (Jin & Wang, 2017; Abraham, 2018). The exponential function $e^{b_a x_i}$ can then be approximated as follows:

$$e^{b_a x_i} \approx \sum_{k=0}^{n} \frac{(b_a x_i)^k}{k!} \qquad (14)$$
For practical uses, we need to define the size of the polynomial function, i.e. the maximum index n that is retained. To do that, we target a percentage error lower than 1%, a very small error that leads to contained MAPE and RMSE values (Wonnacott & Wonnacott, 2015). The percentage error is chosen since it is the most common way to evaluate the accuracy of a Taylor series, and its mathematical behaviour is well known and studied. For the exponential function $e^{ax}$, the percentage error of the Taylor approximation takes the form $PE_n = \left( e^{ax} - \sum_{k=0}^{n} (ax)^k / k! \right) / e^{ax}$, so:

$$\left| 1 - e^{-ax} \sum_{k=0}^{n} \frac{(ax)^k}{k!} \right| < 0.01 \qquad (15)$$
This inequality cannot be solved analytically without knowing the values of a and x, so a numerical approach is needed. To find a suitable number of terms for the Taylor approximation (and thus the order of the polynomial approximation), we estimate, via iteration, the approximation error of the proposed Taylor series for increasing values of n until the error falls below this threshold, and this for a set of possible values of a and x. Knowing that the most common values of a in current works lie mainly between 0.5 and 2 (Holguin-Veras et al., 2011b, 2011c; Krisztin, 2017; Gonzalez-Feliu et al., 2024), and that Employment lies mainly between 0 and 50, we estimate the length of the polynomial Taylor transformation for a = {0.5; 1; 1.5; 2} and x = {1; 5; 10; 30; 50}. Those results are presented in Table 5.
Table 5. The number of non-constant terms of the Taylor approximation to reach a percentage error lower than 1%.

| x \ a | 0.5 | 1 | 1.5 | 2 |
|---|---|---|---|---|
| 1 | 4 | 5 | 6 | 7 |
| 5 | 7 | 11 | 15 | 18 |
| 10 | 11 | 18 | 24 | 30 |
| 30 | 24 | 42 | 58 | 75 |
| 50 | 36 | 64 | 91 | 117 |
We observe that the length of the polynomial transformation increases with x and a. Only for small values of x (less than 10, and for a up to 2) do the lengths remain contained and computable even with non-specialized tools (for example, Excel or similar general software). Therefore, the Taylor approximation seems applicable only for small values of x and contained values of a, which is the case for retailing (Ambrosini et al., 2010; Gonzalez-Feliu et al., 2024), the most studied category of activities and one of the main ones in freight trip and freight generation (Holguin-Veras et al., 2011c).
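The iterative search described above can be sketched as follows. This is a straightforward implementation of the stopping rule; depending on the exact error criterion adopted (strict relative error, remainder bounds, etc.), the counts it returns may differ slightly from those reported in Table 5:

```python
import math

def taylor_terms(a, x, tol=0.01):
    """Smallest number of non-constant Taylor terms n such that the relative
    error of sum_{k=0}^{n} (a*x)**k / k! with respect to exp(a*x) is below tol."""
    target = math.exp(a * x)
    partial, term = 1.0, 1.0  # k = 0 term of the series
    n = 0
    while abs(target - partial) / target >= tol:
        n += 1
        term *= a * x / n      # builds (a*x)**n / n! incrementally
        partial += term
    return n

for a in (0.5, 1, 1.5, 2):
    print(a, [taylor_terms(a, x) for x in (1, 5, 10, 30, 50)])
```

The counts grow roughly with the product a·x, which is why the approximation is practical only for small employment values and contained coefficients.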
Second, we can estimate the error of the proposed approximation. To do that, let us consider the exponential functional form $\hat{y}_i = \alpha_a e^{b_a x_i}$ for an establishment i belonging to an activity category a in a zone z, and its approximation using the Taylor development proposed in Equation (14), noted $\hat{y}_i^T$. If we aim to calculate the MAPE of this estimation with respect to the formal functional form, we obtain the following development:

$$MAPE = \frac{1}{n} \sum_{i} \frac{\left| \hat{y}_i - \hat{y}_i^T \right|}{\hat{y}_i} \qquad (16)$$
Since the approximation has been chosen to have an error lower than 0.01, each individual error (APE) will be lower than 0.01, which means that the total MAPE is also lower than 0.01:

$$MAPE = \frac{1}{n} \sum_{i} APE_i < 0.01 \qquad (17)$$
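This bound can be checked numerically. The sketch below uses synthetic employment values; the coefficient and the number of Taylor terms are illustrative choices consistent with the small-x range discussed around Table 5:

```python
import math
import random

random.seed(5)
a = 0.5                        # hypothetical exponential coefficient
x = [random.randint(1, 10) for _ in range(200)]  # small establishments: a*x <= 5
n_terms = 15                   # enough terms for the 1% target at a*x = 5

def taylor_exp(z, n):
    """Partial Taylor sum of exp(z) with terms 0..n."""
    return sum(z**k / math.factorial(k) for k in range(n + 1))

# Absolute percentage error of the Taylor approximation per establishment
apes = [abs(math.exp(a * xi) - taylor_exp(a * xi, n_terms)) / math.exp(a * xi)
        for xi in x]
mape = sum(apes) / len(apes)

print(max(apes) < 0.01, mape < 0.01)  # every APE below the target, so MAPE too
```

Since the MAPE is an average of per-establishment APEs, bounding each APE by 0.01 bounds the MAPE by the same value, exactly as stated in Equation (17).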
Since the Taylor development is a polynomial, and knowing the behaviour of the limits of polynomial functions, the estimation of the MAPE when approximating the value of x by an arithmetic mean is of the same nature as in the analysis for potential functions presented in Gonzalez-Feliu (2024). So, we can conclude that the MAPE of a suitable Taylor approximation of an exponential function leads to an error with a similar order of magnitude to that of the approximation of a potential function, which stays, according to Gonzalez-Feliu (2024), inside the accuracy range of the exact functional form.
Finally, we can address the main considerations of using the exponential function with categorial data for the explanatory variables. The application of Taylor approximations, although long, is similar to that of potential functions, with the difference that the potential functional form is a polynomial with a single power term and the Taylor approximation is a polynomial of higher degree. However, the considerations that apply to the potential function can be applied to each term of the Taylor approximation (see Section 2 for those considerations). So, we can conclude that the MAPE of applying a Taylor approximation is of the same nature as that of a potential function, which, according to Gonzalez-Feliu (2024), leads to an increase of MAPE of at most 1, which does not change the order of magnitude of the error.
4. Application and Generalization of the Proposed Considerations to Statistical Modelling
After presenting the various considerations and showing their mathematical soundness and approximation capabilities, we discuss the potential applications, limits and generalization issues of the whole set of equivalences and approximations that compose the two related works (Gonzalez-Feliu, 2024 and the present one).
Considering the categorial use of the proposed models, as shown in Gonzalez-Feliu and Sanchez-Diaz (2019), relying on constant generation rates with categorial data leads to a strong degradation of the approximation capabilities of a model. The proposed approximations instead rely, for each category, on the category's arithmetic mean (of Employment or Area) as an explanatory variable rather than on an average generation rate, producing models whose errors (in MAPE) remain of the same order of magnitude.
The Taylor approximation is here applied to exponential relationships since, in single-variable FTG and FG modelling, neither complex polynomial forms (i.e., with more than one constant plus one variable term) nor mixed forms (i.e., mixing polynomial, potential or exponential forms, including power or rational relationships) are observed (Sanchez-Díaz et al., 2016). However, when dealing with probabilistic generation (Gonzalez-Feliu et al., 2014; Lagorio et al., 2018), non-Normal distributions can be defined (such as Rayleigh or Gamma, which are pseudo-Normal, or Burr, Exponential and Weibull distributions). Those distributions can be related to rational, exponential or logarithmic forms, for which the Taylor series can give valid approximations. The deployment and validation of those approximations should then follow the same methodological patterns as those presented in Section 3.
Although the Taylor approximation seems accurate and able to contain the approximation errors, its implementation requires the deployment of long polynomial functions (with lengths of 5 to 20 terms for small values of x), which requires software tools (statistical software, Excel, Python, etc.). This means that those approximations cannot be used by non-experts in practice (a minimum notion of mathematical economics and of the needed software tools is required), but their deployment remains possible. In any case, exponential functional forms remain less used than the others, and would need further analyses mainly related to parameter definition and validation (Holguin-Veras et al., 2011b).
Those results and methods lead us to two main generalization issues. The first is the need to define an extended, general validity-checking methodology in statistical modelling. Although validation and sensitivity analysis have reached a standard level of maturity and are widely used and promoted in statistical modelling (Ljung & Guo, 1997; Donnelly et al., 2012; Gonzalez-Feliu & Routhier, 2012; Wonnacott & Wonnacott, 2015), the idea of the proposal below is to provide a framework to examine the application, validation and transferability capabilities of a model, and to set the applicability conditions relating it to the reality it represents (Ackoff, 2001). If we understand a model as a representation of a given reality as seen by the person who conceives the model, in an experimentalist (or social pragmatist) paradigm (Churchman, 1968; Jackson, 1982; Britton & McCallion, 1994; Ulrich, 2012), it is important to relate this model to the reality it represents to ensure the representation is suitable, in both ontological and empirical perspectives (Feliu, 2003), and matches the ends of the users, commanders and model-builders needing it (Ackoff, 2001; Gonzalez-Feliu & Gatica, 2022). This leads to the following framework, composed of 6 elements (Figure 2).
Figure 2. Framework for analyzing the validity conditions of a model (own elaboration).
The 6 elements can be defined as follows. Coherence, or consistency, is the ability to possess a structure in which all terms satisfy the axioms of the theory. In other words, a model is coherent if it has a structure such that all the axioms of the theory are true in this structure; coherence is therefore inherent in the internal structure of the model (Bonnafous, 1990). Relevance can be defined as the ability to represent the intended reality for the expected objectives. It is inherent to the objectives of the model-builder and deals with the representativeness of the modeled reality with respect to the relative Truth (Feliu, 2003), which can only be addressed in a cyclic and iterative way (Ackoff, 1977; Gonzalez-Feliu, 2019). Consistency and relevance are complementary and become the basis of a model's validity: the first relates the model to its theoretical background (de iure reality), the second to the observed reality and its capability to represent the Truth (de facto reality), following the experimentalist epistemological position (Britton & McCallion, 1994).
A third element of model validity is accuracy, seen as the ability to reproduce the observed reality (and not the inherent reality, or Truth) in terms of error with respect to measured data. It is measurable and quantifiable (Bonnafous et al., 2013), and various methods and indicators can be defined. The main difference between relevance and accuracy is that relevance sets the ability of the person who builds the model to represent the inherent reality, i.e., to define the “right” problem, whereas accuracy is the capability of the model to be close to that representation of the problem. In other words, using the terms popularized by Ackoff (1977), relevance concerns solving the right problem, and accuracy concerns choosing the most appropriate solution method for an already defined problem.
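Such accuracy indicators can be computed directly from observed and predicted values. As a minimal illustration (the choice of indicators and the small data sample are hypothetical, not taken from this work), the following Python sketch computes three common error measures:

```python
import numpy as np

def accuracy_indicators(y_obs, y_pred):
    """Compute common accuracy (error) indicators between observed and predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_obs
    return {
        "MAE": float(np.mean(np.abs(err))),           # mean absolute error
        "RMSE": float(np.sqrt(np.mean(err ** 2))),    # root mean squared error
        "MAPE": float(np.mean(np.abs(err / y_obs))),  # mean absolute percentage error
    }

# Hypothetical observed vs. modelled freight trips for three establishments
res = accuracy_indicators([10, 20, 30], [12, 18, 33])
print(res)
```

Any of these indicators can serve to quantify accuracy; which one is most informative depends on whether large deviations (RMSE) or relative deviations (MAPE) matter more for the modelled reality.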
Robustness is seen here as the solidity and invariability of the methods used in their ability to obtain results. In other words, given a reality, its representation, the theoretical foundations of the proposed model and its accuracy in reproducing the data collected to develop it, robustness measures the capability of the model to maintain this triple level of representativeness (consistency, relevance and accuracy) when used on different data of the same nature (Crainic & Semet, 2014). Robustness is measurable by statistical indicators and tests, mainly through validation procedures (defining a construction data set, used to develop the model, and a validation set, then performing error or sensitivity analysis on data not used to construct the model, or applying the model to new data sets and measuring its error). Those four elements constitute the basis of model validation and are often seen in the physical sciences (Carlesi et al., 2016).
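The construction/validation procedure described above can be sketched as follows. The logarithmic freight trip generation law, its parameters and the synthetic data are assumptions made purely for illustration, not estimates from this work:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: freight trips generated from employment via a
# hypothetical logarithmic law Y = a + b*ln(X), plus noise.
X = rng.uniform(5, 200, 300)
Y = 2.0 + 1.5 * np.log(X) + rng.normal(0.0, 0.3, 300)

# Split into a construction (training) set and a validation set.
idx = rng.permutation(len(X))
train, valid = idx[:200], idx[200:]

# Fit on the construction set only (ordinary least squares on ln X).
A = np.column_stack([np.ones(len(train)), np.log(X[train])])
coef, *_ = np.linalg.lstsq(A, Y[train], rcond=None)

# Robustness check: measure the error on data not used for construction.
pred = coef[0] + coef[1] * np.log(X[valid])
rmse = float(np.sqrt(np.mean((pred - Y[valid]) ** 2)))
print(f"coefficients: {coef}, validation RMSE: {rmse:.3f}")
```

A model is judged robust, in the sense above, when the validation-set error stays close to the construction-set error across repeated splits or across new data sets of the same nature.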
To that core, it is important to add two more elements related to the possibilities of using the model outside the context for which it was initially defined. Flexibility is the ability of the model to anticipate situations (inputs) different from those planned during its construction, for the same application (without changing context). Adaptability, or adaptiveness, is the ability of the model to be applied to new contexts. Flexibility examines the capability of a model to add new inputs (variables) or change the nature of the existing variables without questioning its coherence, relevance, accuracy and robustness (Erdoğan, 2017). Adaptability is the ease of extending the model to deal with other contexts without needing to build an entirely new model, thus reducing the data production phase, which is in general very costly (Holguin-Veras & Jaller, 2014).
This validity checking needs to be part of a more complex and systemic process, that of modelling, which can no longer be seen as linear but as a circular, iterative (and interactive) framework. We propose to formalize this modelling framework, starting from Ackoff et al.’s (2007) principles of problem solving (going from the problem to the solution) and of solution problems (going from the solution to the problem), as an extension of the problem-dealing cycle re-defined in Gonzalez-Feliu (2019) and completed, for problem solving (but not for statistical modelling), in Gonzalez-Feliu and Gatica (2022). It can be schematized as follows (Figure 3).
Figure 3. Iterative methodology for a reality-representing modelling framework (own elaboration, built from thinking and considerations from Ackoff, 1977; Gonzalez-Feliu, 2019; Gonzalez-Feliu & Gatica, 2022).
This cycle has three phases, which can be recalled iteratively. The first is the problem definition/instruction phase, where the reality is observed and a first model is defined using data (collected or generated by various production techniques) and a set of formulated hypotheses. The suitability of the model (even before formalizing it) can be assessed interactively with the field used to observe the reality. In the second phase, the model is formalized and a first solution (an operational model in terms of forecasting) is proposed and tested (verifying its consistency, accuracy and robustness). If those three elements are considered satisfactory, a confrontation with the observed reality (in terms of relevance, flexibility and adaptiveness) is made, often iteratively, to examine whether the solution represents the right problem, whether the right problem represents the right reality, and whether the final outcome (the operational model) can be used to deal with the needs of the field for which it has been constructed. This means that, if one or more of the six elements are not satisfactory, it is necessary to adjust, adapt, review or even re-build the model, and sometimes to change the representation of the observed reality. In those cases, a partial revision of phases 1 and 2 may sometimes be required. The modelling cycle stops once all six elements (consistency, relevance, accuracy, robustness, flexibility and adaptiveness) are considered satisfactory.
5. Conclusion
This paper proposed two sets of analyses regarding the approximations of functional forms in freight demand generation, aimed at using non-linear functional forms with aggregated data. The first is a set of considerations on the use of Gonzalez-Feliu’s (2024) approximations with categorial data. The second is a proposal for the polynomial transformation of the exponential function via Taylor series and its approximation possibilities. In both cases, the estimation errors made when replacing the actual values by arithmetic means (the only information available when dealing with categorial or aggregated data) are analyzed, and this work shows that they remain contained, with the same orders of magnitude as in Gonzalez-Feliu (2024).
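The magnitude of such errors can be illustrated numerically. In the following Python sketch (with hypothetical parameters and synthetic data, not the estimates of this work), the exact mean of an exponential function over actual values is compared with the mean-based approximation (evaluating the function at the arithmetic mean) and with a fourth-order Taylor polynomial averaged over the actual values:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
X = rng.uniform(1, 20, 500)  # e.g. employment within a zone (synthetic)
a, b = 0.8, 0.12             # hypothetical exponential parameters

exact = float(np.mean(a * np.exp(b * X)))        # mean of f over actual values
mean_based = float(a * np.exp(b * np.mean(X)))   # f evaluated at the arithmetic mean

def taylor_exp(z, order):
    """Truncated Taylor (polynomial) approximation of exp(z) around 0."""
    return sum(z ** k / factorial(k) for k in range(order + 1))

# Polynomial transformation: averaging the Taylor polynomial over actual values.
taylor_based = float(a * np.mean(taylor_exp(b * X, order=4)))

for name, val in [("exact", exact), ("f(mean X)", mean_based), ("Taylor", taylor_based)]:
    print(f"{name}: {val:.4f} (rel. err {abs(val - exact) / exact:.2%})")
```

Since the exponential is convex, Jensen’s inequality implies that the mean-based value underestimates the exact mean; the polynomial transformation, which retains higher-order terms, stays closer to it in this synthetic setting.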
Those results show the interest of the proposed analysis from both theoretical and practical viewpoints. From a theoretical viewpoint, the determination of approximations via mathematical deduction justifies the proposed estimators and their internal coherence, and the error estimation analysis then allows for their validation. From a practical viewpoint, those estimators can be used when aggregated data at a zonal level are available but individual data are not, giving the possibility of accounting for non-linear behaviours, when they can be proven, even if only non-disaggregated data are available. The work also confirms and completes Gonzalez-Feliu and Sanchez-Díaz’s (2019) conclusions, showing that the choice of the most suitable functional form allows, in all cases, a more accurate estimation of FTG, even when disaggregated data are unavailable, and confirming that constant and linear estimations are not suitable for representing non-linear behaviours and that a mean-based approximation remains more accurate than a linearization of the phenomena.