A Model for the Mass-Growth of Wild-Caught Fish

The paper searched for raw data about wild-caught fish, where a sigmoidal growth function described the mass growth significantly better than non-sigmoidal functions. Specifically, von Bertalanffy’s sigmoidal growth function (metabolic exponent-pair a = 2/3, b = 1) was compared with unbounded linear growth and with bounded exponential growth using the Akaike information criterion. Thereby the maximum likelihood fits were compared, assuming a lognormal distribution of mass (i.e. a higher variance for heavier animals). Starting from 70+ size-at-age data, the paper focused on 15 data coming from large datasets. Of them, six data with 400 20,000 data-points were suitable for sigmoidal growth modeling. For these, a custom-made optimization tool identified the best fitting growth function from the general von Bertalanffy-Pütter class of models. This class generalizes the well-known models of Verhulst (logistic growth), Gompertz and von Bertalanffy. Whereas the best-fitting models varied widely, their exponent-pairs displayed a remarkable pattern, as their difference was close to 1/3 (example: von Bertalanffy exponent-pair). This defined a new class of models, for which the paper provided a biological motivation that relates growth to food consumption.


Introduction 1.Growth Models
Size-at-age (length or mass) is an important metric about animals and so there is a large body of literature about which growth models fit best to given size-at-age data.The von Bertalanffy [1] and Pütter [2] differential Equation (1) provides a comprehensive framework for the most common models for mass-growth: It describes body mass m(t) > 0 as a function of age t, using the following five model parameters: The exponent-pair a < b ("metabolic scaling exponents") and the constants p and q are assumed to be non-negative; m 0 > 0 is an initial value, i.e. m(0) = m 0 .In the case of equal exponents, Equation ( 1) is replaced by the generalized Gompertz [3] differential Equation (2), the limiting case of Equation (1), when b approaches a: This equation uses four model parameters: a, p, q (non-negative) and m 0 > 0.
In general, the solutions of ( 1) and ( 2) are expressed in terms of non-elementary functions, namely hypergeometric functions [4] and exponential integrals [5], respectively.The present paper used numerical solutions [6] [7].For several "named models" these equations could be solved by elementary means, namely for the exponent-pairs a = 2/3 and b = 1 of von Bertalanffy [1], a = 3/4, b = 1 of West [8], a = 1, b = 2 for logistic growth of Verhulst [9], and a = b = 1 of Gompertz [3].There are also elementary solutions for Richards' [10] model (a = 1 and b > 1 is a free parameter) and for the generalized von Bertalanffy model (b = 1 and 0 ≤ a < 1 is a free parameter).The von Bertalanffy growth function for length (VBGF) fits to the present framework, too: It is a special case of Equation (1) using the exponent pair a = 0, b = 1 (bounded exponential growth).Other special cases are power-laws between mass and time (q = 0, p > 0), in particular linear growth (a = b = q = 0, p > 0, m 0 > 0).
Figure 1 plots the exponent-pairs for these named models.Several of these models were motivated by competing biological theories about the relation between growth and metabolism; whereby different authors proposed different exponent-pairs [11].However, these models "have only a loose connection to the biology behind the actual growth processes" [12].Therefore, no single growth model may be exactly correct for all species [13].For fish, in particular, the optimal parameters of the growth models may depend on environmental factors [14] [15].Where there is no information about other biological factors for growth, the models may nevertheless be used to extract relevant information from the data, as is current practice fisheries management [16].

Shape of the Growth Curves
The models differ in the assumed exponents and in the number of free parameters (these are optimized to obtain the best fit to given data): For the "named models" with given exponent-pairs the parameters p, q and m 0 are free; for the more general model classes in addition one or two exponents are free.The typical Open Journal of Modelling and Simulation The parameters a, b, p, q come from Equation (1) and Equation (2), respectively.For exceptional exponents and parameters, the growth curves may be non-sigmoidal (a = 0) or unbounded (p > 0, q = 0).For instance, the most common growth model in fisheries literature, VBGF for length (Google search for "VBGF, fish": ca.15,000 results), is not sigmoidal.However, it is equivalent to a sigmoidal model for mass-growth in the following sense [1]: If length-growth follows VBGF, and if mass is assumed to be proportional to a power L k of length L (k > 1), then mass-growth is described by the exponent-pair a = 1 − 1/k > 0, b = 1 (sigmoidal growth curve).
Summarizing, there seems to be a consensus in literature that the mass-growth of fish is bounded and sigmoidal.However, catch data for fish seem not to support this statement.The authors [17] investigated a set of 60 data for different species (37 for fish) and fitted Equation (1) with the exponent b = 1 (generalized von Bertalanffy model).They found that for 17 of 37 fish-data, but only for one non-fish species, any exponent a could be used to model mass-growth without affecting the fit to the data significantly (when the other free parameters p, q, m 0 were optimized).In particular, a fit by a non-sigmoidal growth curve (a = 0) was acceptable.Conversely, for 18 of 20 data, where a = 0 was acceptable, any other Open Journal of Modelling and Simulation exponent was acceptable, too.As the authors noted, such a high variability has implications for data-fitting, as standard search strategies may not always find the right direction towards the optimum parameters, if there is no significant difference in the fits.Indeed, fisheries management literature reported problems with convergence [18].The authors attributed the high variability of the exponents to differences in data quality, as most non-fish data were average values from controlled studies, where measurements were conducted repeatedly for the same group of animals, while most fish-data were about wild-caught fish, where each fish contributed only once in its lifetime to the measurements.Such data were also affected by gear-selectivity, where there may be few data about young (small) fish, as these could escape from the trawler nets, or as anglers preferred to harvest large fish, and few data about older fish, as most of them did not survive long enough to come close to their maximum size, resulting in many different possible shapes for the catch-curve that describes the age structure of the collected data [19].As a consequence, several data supported unbounded growth [20]; [21] reported this for about 1/3 of their fish data.This was considered as an indication for too few data about older animals.By similar considerations, where data supported bounded non-sigmoidal growth this was considered as an indication for too few data about young fish.

Problem of the Paper
The conclusion about the high variability for fish-data was drawn from data-fitting to average weights.The present paper therefore reconsidered this issue and asked, if data-fitting to raw data would change the picture.More specifically, the paper aimed at identifying data that could be fitted well by a bounded sigmoidal mass-growth curve and where this fit was clearly superior to the fit by typically non-sigmoidal model functions.These data were deemed insofar as suitable for the application of sigmoidal growth models, as the variability of the exponents was more restricted.
To achieve this goal, the paper compared the fit of three typical models: Linear growth, as it represents unbounded models, bounded exponential growth, as it represents bounded non-sigmoidal models, and the von Bertalanffy exponent-pair, as it represents sigmoidal growth.This comparison was applied to various data for wild-caught fish, whereby the model curves were fitted to the raw data from [22].Further, data-fitting took into account the typically heteroscedastic distributions of size (meaning a larger variance for a higher mass).The paper tested, if there was an acceptable fit by the sigmoidal model, but no acceptable fit by the other two models.The paper then explored, based on this test, how many of the considered catch data were suitable for the application of sigmoidal growth functions.For these data it determined the best fitting sigmoidal model amongst models described by Equation (1) and Equation ( 2) and searched for a pattern of the optimal exponent-pairs.The term "acceptable fit" needs a technical definition, explained in the Me-Open Journal of Modelling and Simulation thods.For instance, a visual inspection alone did not suffice to recognize that the fit of the bounded exponential model was acceptable for Figure 2, but not acceptable for Figure 3, when compared to the von Bertalanffy exponent-pair model.For, the plot did not reveal, for how many data-points a good/poor fit was achieved.

Materials
The authors conducted a literature search for growth data of wild-caught fish, referring to the following secondary sources: 39 data-sets came from a repository by Derek Ogle [22] listed under "growth" (Table 1 for the file names).The conversion of length-at-age data into mass-at-age was based on Fish Base [23].All data were processed in Mathematica 11.3 of Wolfram Research  ; Appendix explains the used code.As it is generally accepted in fisheries research that model comparison is a data-hungry exercise, only data for population-level studies with initially at least 500 data-points (different fish) were considered.Further, data were removed if after the elimination of incomplete data-points (e.g.size at an unknown age) and filtering (e.g. by sex) there remained fewer than about 100 data-points or less than six points of time (The former condition was not strict, but the latter was, as the paper compared models with up to 6 free parameters).

Data Fitting
Literature in fisheries management prefers to fit model curves directly to the raw data, assuming a heteroscedastic distribution of size.Most common is the assumption of a lognormal distribution [24], as by Equation ( 4) for a lognormal distribution the standard deviation stdev of size is proportional to the expected   #n is the numbering of the data that were retrieved from the selected dataset.
value ev of size with the shape parameter as (approximate) constant of proportionality: Here, m is the location parameter and s > 0 is the shape parameter of a lognormal distribution and the rightmost term in Equation ( 4) assumes s ≈ 0 (whereby O(•) refers to the Taylor series remainder).The present paper used this approach in order to conform to established practice in fisheries management literature (A different approach assumes a normal distribution for each age class, but an age dependent variance; e.g.[25]).
The hypothesis of lognormally distributed data was not always exactly correct: For the data of Figure 3 the Cramér-von Mises distribution fit test [26] identified significant deviations from a normal distribution.This was due to the large number of data points.However, for practical purposes (convenience for data-fitting) the hypothesis of a lognormal distribution could be retained, as a probability plot (Figure 4) confirmed that the lognormal distribution was a reasonable approximation to the true distribution (closeness to the diagonal).Further, the paper avoided to draw conclusions that would essentially depend on this distribution assumption.Open Journal of Modelling and Simulation

Optimization
There are several approaches for heteroscedastic data fitting, amongst them weighted least-squares for the mass-at-age data, using the reciprocals of the standard deviations (of mass at each age class) as weights.The present paper uses instead transformed data.For, technically data-fitting to lognormally distributed mass-at-age data (i.e.maximum likelihood estimation of the optimal distribution and growth function parameters) is equivalent to using a logarithmic transformation of mass and the method of least squares to fit the transformed growth function u(t) = ln(m(t)) to these transformed data (Thereby also the parameters for linear growth, m = m 0 + p⋅t, were determined from a nonlinear regression for u).
Further, (for the transformed data) the least squares method is equivalent to the minimization of the lack-of-fit sum of squares LFSS, where LFSS is the weighted sum of the deviations of the model curve from the averages at each age (the weight is the count of data at this age).The present paper used this reformulation of the least squares method to speed up computations (It has been used earlier in fisheries management, e.g.[27]).The use of LFSS has the following justification: The method of least squares assumes that the errors (deviations from the model curve) are random, meaning independent (normally distributed) random variables with expected value 0. Therefore, the sum of the squared errors, SSE, may be decomposed into SSE = LFSS + PESS, where PESS, the sum of squares of pure errors, is the sum of the squared deviations of the data at each age from the averages at each age.Curve fitting can only minimize LFSS, while PESS remains unchanged and can be ignored for the purpose of data-fitting.Open Journal of Modelling and Simulation Summarizing, data fitting was reduced to minimizing LFSS, a weighted sum of the squared differences between the average of the logarithmic weights at this age and the logarithm of the growth curve at this age.
The data fit to the three test models (linear, bounded exponential, and von Bertalanffy exponent-pair) used the Mathematica function NMinimize and simulated annealing [28] as optimization method.Further, the parameters of the bounded models were restricted so as to ensure an "empirically meaningful" asymptotic mass, defined from a comparison of the asymptotic mass m max of Equation ( 3) with the maximal (arithmetic) mean value of the mass at different ages.If m max was in the interval between half and twice that maximal average mass, as could be observed e.g. for the model curves in Figure 3, then the corresponding model curve was accepted as empirically meaningful.This constraint helped to avoid that optimization was trapped at clearly false growth functions (e.g.constant functions).Note that the paper did not simplify optimization by using literature values for the initial condition m 0 (e.g.natal mass) or for the asymptotic mass m max .
For the subsequent optimization of the general models ( 1) and ( 2), the paper used a custom-made variant of simulated annealing to minimize LFSS, but without constraints.This approach is explained below for the search of the parameters for the general von Bertalanffy-Pütter model (1).The following computations were repeated in a loop of 500,000 steps.Using starting values for the parameters a, b, m 0 , p, q (starting from the optimal parameters of the von Bertalanffy exponent-pair or, Equation (2), from random numbers), it multiplied them by random numbers between 1 ± dist, but close to 1 (e.g.dist = 0.01); this deviated from the usual simulated annealing, where small random numbers are added.If the altered parameters improved LFSS, they were accepted.Otherwise, the increase of ln(LFSS/N) was compared with an exponentially distributed random number (e.g.mean value exm = 1; N = number of data points); this function was motivated by Akaike's [29] index explained below.If the increase was below this random number, the altered parameters were accepted.Otherwise, the previous parameters were retained.The parameters dist and exm were set by the authors so as to obtain a reasonable convergence (several experments).
After each 10,000 steps, dist and exm were reset: They were reduced by 10% in order to avoid moving too fast too far away from a good candidate for the optimum and the computations were restarted with the best hitherto found parameters.For several data the loop was repeated.In case that the optimal exponents were close to the diagonal (distance below 0.1), the optimization was repeated for the generalized Gompertz Equation ( 2).For the present data this did not occur.For three data the custom-made optimization approach was needed also for the von Bertalanffy exponent-pair, as the general purpose procedures produced a stack-overflow.

Model Selection
For the definition of an acceptable fit, the paper defined a measure of fit that was motivated by the Akaike information criterion [29], namely the following pseudo-Akaike-weight pprob defined from a pseudo-Akaike index PAIC.In terms of this index, the most acceptable model has the least PAIC, defined below.The best fitting model (least LFSS) needs not be most acceptable (penalty for additional parameters).
The relation of formula (4) to the usual AIC is explained below.PAIC was defined from the lack-of-fit sum of squares for the model, LFSS (model), from the number N of data-points, from the number AC of age classes and from the number K of optimized parameters.Thereby, K = 4 for the test models (except for linear growth: K = 3), as m 0 , p, q and implicitly the shape parameter s of the lognormal distribution were optimized.Further, K = 5 and K = 6 for the generalized Gompertz model ( 2) and the von Bertalanffy-Pütter model ( 2), respectively.
The index pprob compares a given model with the most acceptable one.
( ) Here, ∆ = PAIC (model) -PAIC (most acceptable model) > 0. The paper defined a fit as acceptable, if pprob ≥ 2.5%.As the maximal value is pprob = 50%, inacceptable fits were linked to the lowest 5% of possible values of pprob.For example, in Figure 3 and Figure 2 the von Bertalanffy exponent-pair model had the best fit amongst the three compared models.For Figure 3, but not for Figure 2, the fit was insofar significantly better, as none (respectively both) of the non-sigmoidal models had an acceptable fit.Further, for both figures the best fitting model (amongst those considered) did not reduce LFSS enough to minimize also pprob.Thus, its predictive power was highest amongst the considered models, but the loss in simplicity may not have been justified by the gain in predictive power.
The following outline explains, why the authors decided to use pprob as a measure of the goodness of fit.PAIC is a modification of the Akaike index AIC c for small samples [30] [31]; for variants c.f. [32].
; in this formula the authors replaced N by AC and SSE by LFSS⋅AC/N.Thus, PAIC assessed the fit of the models in terms of their fit to the averages (of the logarithmically transformed data) at age, weighted by the percentage of data represented by this age.Therefore, pprob was the Akaike weight relative to this fit to averages.
Clearly, PAIC penalized complex models more, than AIC c did, because for the averages there were only few degrees of freedom.However, in the context of fitting growth models the degrees of freedom attainable for large sample sizes may be illusory, if data come from a few age classes.For instance, data for thousands of fish, but all at age 0, do not confer any information about growth.PAIC has the advantage that with a high number of data points a slightly better fitting  average value, although these values represented samples of different sizes.Some sources [33] supplemented such data with information about the count of data-points for each age class.As was explained for optimization, data-fitting using geometric means plus this information would result in the same outcome as data Open Journal of Modelling and Simulation fitting using the raw data.However, as averages (arithmetic means) may differ significantly from geometric means (Figure 6), data of the form average size and count at age were not considered, either.
Table 3 and Figure 1 summarize the results of data fitting.All data were plotted to check their plausibility (For one dataset, paste & copy changed age 10 to age 1 generating a U-shaped growth curve; this was then corrected).All data had between 30% -90% duplicates (Table 2).For two of the 15 selected data, the bounded linear growth had the best fit (least LFSS); for eight it had an acceptable fit (pprob ≥ 2.5%).Further, for eight data linear growth had an acceptable fit.
Thus, for six (ca.2/3) of data the fit by the representative for sigmoidal growth was not significantly more acceptable than the fit by the (simpler) non-sigmoidal models, whence it was considered to be futile to seek the best fitting sigmoidal model.Data quality appeared to matter insofar, as the percentage of data that were suitable for fitting sigmoidal growth models was higher for data with 16 or more ages (c.f.Table 2).This was insofar plausible, as many species of fish continue to grow over many years.Further, for the 15 selected data there was a significantly positive correlation coefficient (0.69) between the number of data-points and the number of ages (t-test at 99.5% significance), which was plausible, too.
Six data were identified as suitable for sigmoidal growth.For these, the optimal exponent-pairs (fit of the general von Bertalanffy-Pütter model) were computed (Table 4). .In comparison, the exponent-pairs seemed to be close to the generalized von Bertalanffy models (b = 1 and a < 1), but for the two species (Bass) the exponent a > 1 was incompatible with this model class.Also the distances between Figure 6.Mass-at-age data #3 (blue) about Freshwater Drum from Lake Erie, arithmetic mean (red) and geometric mean (green), using a conversion of length-at-age data.The "new class" refers to the Parks-1/3 model explained in the Discussion section.
the optimal and the von Bertalanffy exponent-pair were notable.This indicates that the exponent-pairs with a reasonable fit to the data (LFSS close to the mi-Open Journal of Modelling and Simulation nimal LFSS) may extend over a wide region; the authors [34] observed this also in a different context (least squares approximation to average size at age data).
Further, Figure 1 displayed a pattern for the approximate location of the optimal exponent-pairs.They were close to the dashed line b = a + 1/3, even for large values of the exponents.Thereby 1/3 was not the optimal difference between the exponents, but it was chosen, as it defines a new model class introduced in this paper.The Discussion will provide a biological interpretation for the difference d = ba of the exponents, based on an alternative explanation of Equation (1).

The Parks-1/3 Model Class
The traditional explanation of differential Equation (1) proposed that the rate of growth would be proportional to the difference between anabolism and catabolism, both of which would be proportional to a power of mass.Specific values of the exponents were then derived from biophysical arguments; e.g.: b = 1, as catabolism would be proportional to mass (number of sustained cells) and a = 2/3, as anabolism would be proportional to the gills' surface (oxygen consumption) and therefore to the 2/3 th power of mass [1].
The present alternative explanation of Equation ( 1) hypothesizes that body mass would be a function of the total food intake F(t) since birth (t = 0), whereby the (individual) asymptotic mass m max would be approached at a rate dependent on instantaneous food intake: ( ) In the formula, k 1 > 0 a constant of proportionality and d is a constant that explains the speed of growth: With the same total food intake F, with a larger d the asymptotic mass m max is approached faster.This model, using d = 1, was proposed by Parks ([35] at p. 26), who supported it by evidence from farmed animals.The food consumption, which cannot be observed for wild-caught fish, can be eliminated by the additional hypothesis that the instantaneous food intake dF/dt would be proportional to the energy needs E (i.e.dF/dt = k 2 •E), which in turn would be proportional to a power of body mass (i.e.E = k 3 •m a ).Substituting this into Equation ( 7), then after a multiplication and renaming of constants (p = k 1 •k 2 •k 3 •(m max ) d , q = k 1 •k 2 •k 3 ) this simplifies to Equation (8), which is just another parametrization of Equation ( 1 Figure 1 suggests that the Parks-1/3 model class may come close to providing the best-fitting description of the mass-growth of several species of fish. Figure 7 illustrates the good fit compared to the best-fitting and the von Bertalanffy models for the six considered data.The optimization used the custom-made approach, starting with a, m 0 , p, q of the optimal model (setting b = a + 1/3).Table

Figure 1 .
Figure 1.Exponent-pairs of named models and optimal fits.Red squares indicate named models, grey lines indicate more general model classes, and blue dots indicate best fitting models to the indicated data.The dashed grey line indicates a new model class of this paper.

Figure 4 .
Figure 4. Probability plot.The figure assesses the fit of a normal distribution (location parameter m = 0, shape parameter s = 0.241) with the differences of the logarithm of mass and the respective age-class averages of the logarithm of mass for the data of Figure 3.The dotted line is the diagonal and the solid line is the P-P-plot, which compares the normal distribution (x-axis) with the observed cumulative distribution function (whereby a good fit is indicated by closeness to the diagonal).

Figure 5 .
Figure 5. Catch curve for data #11 of Figure 3 (male Walleye from Lake Erie, USA) based on (a) all data and (b) data without duplicates (lower bars).The figure counts, how many fish were observed for each age-class.

Figure 1
plots the location of the optimal exponent-pairs in relation to the traditional named models with assumed exponent pairs.Apparently, the optimal exponent-pairs were unrelated to any model or model class (except for one Walleye close to the West model).All exponent-pairs were remote from the Gompertz-class (diagonal a = b), from the non-sigmoidal class (a = 0 and b > 0) and (except for Smallmouth Bass) from Richards' model (a = 1 and b > 1) to consider model classes that are defined by assuming a value of d.As the original motivation comes from Parks, this paper calls this the Parks-d model class.For instance, Parks' assumption would define the Parks-1 model class with bounded exponential growth (a = 0: constant energy needs) and logistic growth (a = 1: energy needs proportional to mass) as spe-Open Journal of Modelling and Simulation cial cases.As the von Bertalanffy exponent-pair is a special case of the Parks-1/3 model class, the present paper investigates this class.

Table 1 .
Type of the original sources and processing of 39 data-sets.

Table 2 .
Characteristics of the data.
a Locations from the USA; b F female, M male, U unsexed; c Size at age in years; d Conversion into gram.

Table 3 .
Comparison of three models to test the sigmoidal shape of the data.
a Y yes, N no.

Table 4 .
Comparison of three models for optimality.