Computer program of nonlinear, curved regression for ‘probacent’-probability equation in biomedicine

On the basis of experimental observations on animals, applications to clinical data on patients and theoretical statistical reasoning, the author developed a computer-assisted general mathematical model of the “probacent”-probability equation, Eq.1, and the death rate (mortality probability) equation, Eq.2, derivable from Eq.1, that may be applicable as a general approximation method to make useful predictions of probable outcomes in a variety of biomedical phenomena [1-4]. Eqs.1 and 2 contain a constant, γ and c, respectively. In the previous studies, the author used the least maximum-difference principle to determine the values of these constants that were expected to best fit reported data, minimizing the deviation. In this study, the author uses the method of computer-assisted least sum of squares to determine the constants γ and c in constructing the “probacent”-related formulas best fitting the NCHS-reported data on survival probabilities and death rates in the US total adult population for 2001. The results of this study suggest that the method of computer-assisted mathematical analysis with the least sum of squares is simpler, more accurate, more convenient and preferable to the previously used least maximum-difference principle, and that it better fits the NCHS-reported data on survival probabilities and death rates in the US total adult population. The computer program of curved regression for the “probacent”-probability and death rate equations may be helpful in research in biomedicine.


INTRODUCTION
On the basis of experimental observations on animals, clinical applications on patients and theoretical statistical reasoning, the author developed a general mathematical model of "probacent"-probability equation that may be applicable as a general approximation method to make useful predictions of probable outcomes in a variety of biomedical phenomena [1][2][3][4].
The model of the "probacent"-probability equation was constructed from experimental studies on animals to express survival probability in mice exposed to g-force in terms of magnitude of acceleration and exposure time [1,5]; and to express a relationship among intensity of stimulus or environmental agent (such as drug [1,2,6], heat [7], pH [8], electroshock [7,9] and radiation [4,10]), duration of exposure and biological response in animals.
The model has been applied to data in the literature to express carboxyhemoglobin levels of blood as a function of carbon monoxide concentration in air and duration of exposure [11,12]; to express a relationship among plasma acetaminophen concentration, time after ingestion and occurrence of hepatotoxicity in man [13,14]; to predict survival probability in patients with malignant melanoma [15][16][17]; to express survival probability in patients with heart transplantation [18,19]; to express a relationship among age, height and weight, and percentile in Saudi and US children of 6-16 years of age [20][21][22]; to predict the percentile of heart weight by body weight from birth to 19 years of age [23,24]; and to predict the percentile of serum cholesterol levels by age in adults [25][26][27].
The model was applied to the United States life tables, 1992 and 2001, reported by the National Center for Health Statistics (NCHS) to construct formulas expressing age-specific survival probability, death rate and life expectancy in US adults, men and women [3][28][29][30][31].
The formula of survival probability is expressed by the "probacent"-probability equation, Eq.1 (Eqs.1a and 1b), where T = time after biomedical insult, time after diagnosis of cancer, or age; P = "probacent" (abbreviation of probability percentage) = relative biological amount of 'reserve' for survival; and γ, A and B are constants. "Probacent" (P) values of 0, 50 and 100 correspond to mean − 5 SD, mean and mean + 5 SD, respectively; the unit of "probacent" is 0.1 SD. In addition, 0, 50 and 100 "probacents" seem to correspond to 0, 50 and 100 percent probability in mathematical prediction problems in terms of percentage. Therefore, it seems to the author that survival probabilities can be used to predict probabilities in general biomedical phenomena. "Probacent" (P) values are obtainable from a list of conversion of percent probability into "probacent" that was published by the author (Table 6 of Ref. [1]). If the value of γ becomes equal to one, Eq.1 represents a log-normal distribution. Eq.1 is considered to be fundamentally based on the Gaussian normal distribution.
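The correspondence between "probacent" and percent probability described above can be sketched in a few lines of Python. This is a minimal illustration, assuming only what the text states: P = 50 is the mean and one "probacent" unit is 0.1 SD, so the standard normal deviate is z = (P − 50)/10. The function names are hypothetical, and the code uses the error function from the standard library rather than the author's published conversion table.

```python
import math

def probacent_to_percent(p):
    """Convert a "probacent" value P to percent probability.

    Assumption: P = 50 is the mean and one unit of P is 0.1 SD,
    so the standard normal deviate is z = (P - 50) / 10.
    """
    z = (p - 50.0) / 10.0
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def percent_to_probacent(percent, lo=0.0, hi=100.0, tol=1e-10):
    """Invert probacent_to_percent by bisection (hypothetical helper)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if probacent_to_percent(mid) < percent:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for p in (0, 40, 50, 60, 100):
        print(p, probacent_to_percent(p))
```

Under this assumption, 0 and 100 "probacents" map to values extremely close to, but not exactly, 0 and 100 percent, which is consistent with the hedged wording "seem to correspond" in the text.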
In the death rate equation, Eq.2, D represents death rate in percentage (mortality probability); T is time or age; and c, a and b are constants: c represents a curvature (the shape of the curve) like γ in Eq.1a, a is an intercept and b is a slope.
If the value of constant c becomes equal to one, Eq.2 is essentially similar to the Weibull distribution [32].
Eq.2 was applied to express death rates in US adults [3,30,31]. It was found to express death rates in the US total elderly population better than the Gompertz, the exponential and the Weibull distributions [3].
Eq.2 has been successfully applied to predict mortality probability in total body irradiation without medical support in humans as a function of dose rate of radiation and duration of exposure [4], and to express mean survival time as a function of daily dose rate of total body irradiation in mice [33].
Mehta and Joshi [34] successfully applied the "probacent"-probability equations, Eqs.1 and 2, to use model-derived data as an input for radiation risk evaluation of the Indian adult population.
The author used a principle of least maximum-difference, |E − O|, in determining the γ and c values best fitting the observed data curve. Here E and O stand for formula-derived and NCHS-reported age-specific survival probability or death rate, respectively. In analysis of the least maximum-difference, different, randomly chosen integer and/or fractional values are substituted as γ and c values in Eq.1 or 2 to calculate survival probabilities (S) or death rates (D). This method was used in the author's previous publications to minimize the deviation. The least sum of squares of well-known linear regression in statistics [32,35,36] was not employed in the author's previous studies. However, to the author's knowledge, there seem to be no computer-program-assisted, nonlinear, curved regression models of the least sum of squares in the literature that determine the best-fitting constant, γ or c value, in the "probacent"-probability or death rate equation, Eq.1 or 2, minimizing the sum of deviations [37][38][39][40][41][42].

The purpose of this study is to design a computer program of nonlinear, curved regression of the least sum of squares for construction of the equations of "probacent"-probability and death rate, developed by the author, that best fit the NCHS-reported data [29].

MATERIALS AND METHODS
The National Center for Health Statistics reported the United States life tables, 2001, for the US total, male and female populations on the basis of 2001 mortality statistics, the 2000 decennial census and the data from the Medicare program (E. Arias, United States life tables, 2001, Natl. Vital Stat. Rep. 52 (2004) 1-40 [29]).
The author published computer-assisted predictive formulas expressing the NCHS-reported survival probabilities, death rates (mortality probabilities) and life expectancies in US adults, men and women, 2001, employing a model of the "probacent"-probability and death rate equations previously published by the author in the study [3]. The survival probability is the percent probability of surviving to the beginning of age T from birth. The death rate is the percent probability of dying between ages T and T + 1.
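The two definitions above correspond to standard life-table quantities. The following sketch shows how they would be computed from a column of survivor counts; the function names and the survivor counts are made up for illustration and are not NCHS data.

```python
def survival_percent(lx, age):
    """Percent probability of surviving from birth to the beginning of `age`."""
    return 100.0 * lx[age] / lx[0]

def death_rate_percent(lx, age):
    """Percent probability of dying between `age` and `age + 1`."""
    return 100.0 * (lx[age] - lx[age + 1]) / lx[age]

# Hypothetical survivor counts per 100,000 births (illustration only).
lx = {0: 100000, 60: 88000, 61: 87100}

if __name__ == "__main__":
    print(survival_percent(lx, 60))    # survival to the beginning of age 60
    print(death_rate_percent(lx, 60))  # death rate between ages 60 and 61
```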
The data are plotted on a log-log graph paper as illustrated in Figures 1 and 2.
In this study, the data on survival probabilities and death rates shown in the NCHS report [29] and in Ref. [3], as well as in Figures 1 and 2, are used to design computer programs of nonlinear, curved regression of the least sum of squares for the "probacent"-probability and death rate equations to minimize the sum of deviations, and to find the best-fitting constant values, γ and c.

Use of the Least-Maximum-Difference Principle in Analysis
In the author's previous studies, the least maximum-difference principle, least |E − O| (the absolute value of the difference), was used to minimize the deviation.
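As a rough illustration of this criterion, the sketch below picks, from a list of candidate constants, the one whose predictions have the smallest maximum absolute difference from the observed values. The names `predict`, `candidates` and the toy model are assumptions made for the example; they do not reproduce Eq.1 or the author's program.

```python
def max_abs_difference(expected, observed):
    """Largest absolute difference |E - O| over all data points."""
    return max(abs(e - o) for e, o in zip(expected, observed))

def best_by_least_max_difference(candidates, predict, observed):
    """Return the candidate constant that minimizes the maximum |E - O|.

    `predict(value)` is a hypothetical callable returning the
    formula-derived values E for a trial constant.
    """
    return min(candidates,
               key=lambda value: max_abs_difference(predict(value), observed))

if __name__ == "__main__":
    # Toy model for illustration only: E = value * t.
    toy_predict = lambda value: [value * t for t in (1.0, 2.0, 3.0)]
    toy_observed = [2.0, 4.0, 6.0]
    print(best_by_least_max_difference([1.0, 1.5, 2.0, 2.5],
                                       toy_predict, toy_observed))
```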

Formulas of Survival Probabilities (S)
A mathematical method to determine the constants γ, A and B in Eq.1 is described in the Appendix of Ref. [3]. Two sets of data on age (T) and survival probability (S) are used in each age group, 20-60, 60-85 or 85-100 years, to determine constants A and B as seen in Eqs.3a, 3b and 3c, respectively. The most appropriate and best-fitting γ values of Eq.1 for the age groups of 20-60, 60-85 and 85-100 years are determined using the least maximum-difference principle, comparing the maximum differences |E − O| calculated by substituting various semi-random and semi-selective values as the γ value in Eqs.3a, 3b and 3c.

 
The following Eqs.4, 5 and 6 are thus constructed to express survival probabilities of the three age groups:
The age group of 20-60 years: Eqs.4a and 4b.
The age group of 60-85 years: Eqs.5a and 5b.
The age group of 85-100 years: Eqs.6a and 6b.

Use of the Least Sum of Squares in Analysis
In this study, the least sum of squares is used.

Formulas of Survival Probabilities (S)
The method of least sum of squares, least ∑ (E-O)², is used to determine the best-fitting γ and c values of the "probacent"-probability and death rate equations to minimize the sum of deviations. Abridged five-year intervals are used for analysis to simplify the computer programs.
A close graphic inspection of the data points in Figure 1 suggests that the line connecting the data points in each age group of 20-60, 60-85 and 85-100 years bulges upward, revealing an upward convexity, so that the γ value is >1. If the line is straight, it indicates γ = 1. If the line reveals a downward convexity, like the line connecting the data points on death rates of the age group of 60-85 years in Figure 2, it would indicate 0 < γ < 1.
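The sum-of-squares criterion can be written compactly, and a small made-up example shows why it need not select the same constant as the maximum-difference criterion. The data values below are hypothetical and serve only to contrast the two measures.

```python
def sum_of_squares(expected, observed):
    """Sum of squared differences, sum of (E - O)^2, over all data points."""
    return sum((e - o) ** 2 for e, o in zip(expected, observed))

# Hypothetical observed values and two hypothetical fits (illustration only).
observed = [90.0, 80.0, 70.0]
fit_a = [90.5, 80.5, 70.5]   # moderate error at every point
fit_b = [90.6, 80.0, 70.0]   # one slightly larger error, otherwise exact

if __name__ == "__main__":
    for name, fit in (("fit_a", fit_a), ("fit_b", fit_b)):
        largest = max(abs(e - o) for e, o in zip(fit, observed))
        print(name, "max |E - O| =", largest,
              "sum of squares =", sum_of_squares(fit, observed))
```

In this toy case the first fit has the smaller maximum difference while the second has the smaller sum of squares, so the two criteria rank the fits differently.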
A three-step approach in analyzing data with help of the computer program is taken to find the best-fitting constant values, γ and c in Eqs. 1 and 2.
The first step of computer-assisted mathematical analysis: Enter integers, starting from 1 and increasing through 2, 3 and so on up to N, as the γ value in Eq.3a for the age group of 20-60 years in US adults. Sums of squares, Σ (E-O)², are calculated with the computer program shown in Figure 3. The computer-derived line representing Eq.3 with a specific γ value of 1 to N first approaches the NCHS-reported-data line from the starting straight line, and the sum of squares gradually decreases. When the computer-generated line touches the NCHS-reported-data line, the sum of squares reaches its minimum, the least sum, ideally zero. After the line passes the NCHS-data line, the sum of squares with increasing γ values suddenly begins to increase and continues to increase. These processes are shown in Table 1.
The second step of computer-assisted mathematical analysis: If the sum of squares suddenly starts increasing at the integer N + 1 after the preceding gradual decrease, then enter N - 0.1 and N + 0.1 as the γ value in Eq.3a. Calculate the sums of squares. Compare the sums at (N - 0.1) and (N + 0.1) with the sum at N.
The third step of computer-assisted mathematical analysis: If the sum at (N - 0.1) is smaller than the sum at N, then enter (N - 1) + 0.1, (N - 1) + 0.2, ..., (N - 1) + 0.9 as the γ value in Eq.3a. Compare the sums of squares and choose the number with the least sum of squares, which is determined to be the best-fitting γ value for Eq.3a. A very close agreement, the best obtained, is found between the computer-derived and NCHS-reported survival probabilities with the γ value of 12.8. Eqs.9a and 9b are finally derived to best represent the relationship between age and survival probability in US adults of 20-60 years of age.
If the sum at (N - 0.1) is larger than the sum at N and the sum at (N + 0.1) is smaller than the sum at N, then enter (N + 0.2), (N + 0.3) and so on as the γ value in Eq.3a.
Compare the sums of squares and choose the number with the least sum of squares, which is the γ value best fitting the data.
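The three steps can be sketched as a coarse-to-fine scan. This is only an illustration under stated assumptions: `sum_sq` is a hypothetical callable that returns Σ (E-O)² for a trial constant (standing in for the program of Figure 3), `max_integer` is an arbitrary safety bound, and the parabola used in the demonstration is a toy objective, not the real sum of squares of Eq.3a.

```python
def three_step_search(sum_sq, max_integer=50):
    """Coarse-to-fine scan for the constant that minimizes sum_sq(value)."""
    # Step 1: try integers 1, 2, 3, ... until the sum of squares starts to rise.
    n = 1
    while n < max_integer and sum_sq(n + 1) < sum_sq(n):
        n += 1
    # Step 2: probe one tenth on either side of the best integer N.
    below, at_n, above = sum_sq(n - 0.1), sum_sq(n), sum_sq(n + 0.1)
    # Step 3: scan the neighbouring unit interval in steps of 0.1.
    if below < at_n:
        candidates = [round(n - 1 + k / 10, 1) for k in range(1, 10)]
    elif above < at_n:
        candidates = [round(n + k / 10, 1) for k in range(1, 10)]
    else:
        return float(n)
    return min(candidates + [float(n)], key=sum_sq)

if __name__ == "__main__":
    # Toy objective with its minimum placed at 12.8 for illustration only.
    toy_sum_sq = lambda value: (value - 12.8) ** 2
    print(three_step_search(toy_sum_sq))
```

The scan assumes the sum of squares has a single minimum at a value of one or more; constants below one, such as a c value of 0.79, would need the fractional starting values mentioned for the death rate formulas.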
The equations of survival probabilities, Eqs.10 and 11, for the age groups of 60-85 and 85-100 years are likewise derived. The two methods of mathematical analysis, the least maximum-difference and the least sum of squares, give different γ values, 12.7 and 12.8, for the age group of 20-60 years. However, both methods give the same γ values, 4.8 and 4.8 for the age group of 60-85 years, and 2.3 and 2.3 for the age group of 85-100 years.

Formulas of Death Rates (D)
The constants c, a and b are likewise derived as explained above and as seen in Table 2. Fractional numbers are used to determine these constants. The following two formulas express death rates for the age groups of 60-85 and 85-100 years in the US total elderly population: the age group of 60-85 years: Eq.12; the age group of 85-100 years: Eq.13.
Both methods of mathematical analysis, the least maximum-difference and the least sum of squares give different c values, 0.82 and 0.79 for the age group of 60 -85 years, and 1.7 and 1.8 for the age group of 85 -100 years, respectively.

Description of the Computer Program
The programs were written in UBASIC for the IBM PC microcomputer and compatibles for Eqs.3-13. The computer program uses a formula of approximation instead of the integral of Eq.1b (and Eqs.4b, 5b, 6b, 9b, 10b and 11b) because the computer cannot evaluate the integral directly [2][43][44][45]. The mathematical transformation of the integral, Eq.1b, to the formula of approximation is described in detail in the author's book [45]. A representative computer program is illustrated in Figure 3 to calculate the sum of squares, Σ (E-O)², with the γ value of 12.8 in Eq.9a.
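The author's own approximation formula is given in Ref. [45] and is not reproduced here. As a stand-in, the sketch below shows how a normal probability integral can be replaced by a closed-form polynomial approximation, using the well-known formula 26.2.17 of Abramowitz and Stegun, and compares it with the error-function form. The function names are hypothetical.

```python
import math

def normal_cdf_approx(z):
    """Polynomial approximation to the standard normal CDF.

    Uses Abramowitz and Stegun formula 26.2.17, a stand-in for the
    approximation of Ref. [45], which is not shown in the text.
    """
    b = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)
    x = abs(z)
    t = 1.0 / (1.0 + 0.2316419 * x)
    poly = sum(coef * t ** (k + 1) for k, coef in enumerate(b))
    upper_tail = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) * poly
    return 1.0 - upper_tail if z >= 0 else upper_tail

def normal_cdf_erf(z):
    """Reference value of the standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

if __name__ == "__main__":
    for z in (-3.0, -1.0, 0.0, 0.5, 2.0):
        print(z, normal_cdf_approx(z), normal_cdf_erf(z))
```

Either function removes the need to look up a table of normal frequencies when converting a standardized value to a probability.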

Statistical Analysis
A χ² goodness-of-fit test (logrank test) [35] is used to test the fit of mathematical models to the NCHS-reported data [29].The differences are considered statistically significant when p < 0.05.
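For orientation, the sketch below computes a χ² goodness-of-fit statistic in its common Pearson form, Σ (O − E)²/E. Whether the author's program uses exactly this form is an assumption, the function name is hypothetical, and the numbers are made up. The p value would then be read from a χ² distribution with the appropriate degrees of freedom.

```python
def chi_square_statistic(expected, observed):
    """Pearson chi-square statistic, the sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for e, o in zip(expected, observed))

if __name__ == "__main__":
    # Hypothetical formula-derived (E) and reported (O) values.
    expected = [10.0, 20.0]
    observed = [12.0, 18.0]
    print(chi_square_statistic(expected, observed))
```

A statistic close to zero corresponds to a p value close to one, that is, to a close agreement between formula-derived and reported values.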

RESULTS
Tables 3 and 4 show a comparison of the least maximum-differences, |E − O|, the least sums of squares, ∑ (E-O)², and the χ²-test p values for the two analytical methods, the least maximum-difference and the least sum of squares, in age-specific survival probabilities and death rates for the US total adult population, calculated by computer programs such as the representative program in Figure 3.
The γ values in the survival probability equation differ between the two methods, 12.7 and 12.8 in Eqs.4a and 9a for the age group of 20-60 years, but are the same, 4.8 and 4.8 in Eqs.5a and 10a for the age group of 60-85 years, and 2.3 and 2.3 in Eqs.6a and 11a for the age group of 85-100 years. The c values in the death rate equation are all different between the two methods, 0.82 and 0.79 in Eqs.7 and 12, and 1.7 and 1.8 in Eqs.8 and 13, for the age groups of 60-85 and 85-100 years, respectively.
The two measures, the least maximum-difference and the least sum of squares, show slightly smaller values with the method of the least sum of squares than with the method of the least maximum-difference, but the same values in Eqs.5 and 10, and in Eqs.6 and 11, for the age groups of 60-85 and 85-100 years. The above results suggest that the regression curves of the least sum of squares are closer to the NCHS-data-connecting line than those of the least maximum-difference.
The χ²-test p values are all >0.995, suggesting a very close agreement between the computer-derived and NCHS-reported survival probabilities and death rates.
The above described results seem to indicate that the analytical method of the least sum of squares is simpler, more convenient and preferable, and gives more accurate values of the γ and c constants in the "probacent"-probability and death rate equations.

DISCUSSION
Comparison of the data shown in Tables 3 and 4 suggests a very close agreement between formula-derived and NCHS-reported data on survival probabilities and death rates in the US total adult population, because the χ²-test p values are >0.995 for each equation expressing them. However, the method of the least sum of squares, least ∑ (E-O)², gives more accurate and better-fitting values of the constants γ and c in these equations, which fit the NCHS-reported data better and lie closer to the line connecting the data points. The computer program of curved regression of the least sum of squares for the "probacent"-probability and death rate equations seems preferable to the method of the least maximum-difference, least |E − O|, to minimize the deviation.
The author feels that in a variety of biological phenomena, γ and c values are, if applicable, generally greater than one or less than one but not one, indicating a curved line when plotted on X-Y graph paper as seen in Figures 1 and 2. The γ and c values are relatively rarely one, which would indicate a straight line on a graph or a line appearing approximately straight. This phenomenon seems to be possibly analogous in physics to the path of light, which is actually curved when passing through a gravitational field of space but appears straight [46,47].
If the γ value becomes equal to one, Eq.1 represents a log-normal distribution. If the c value is one, Eq.2, which is derivable from Eq.1 [30], becomes essentially similar to the Weibull distribution [32]. The Weibull distribution is a generalized exponential distribution [32]. If the base of a logarithm is one, the lognormal distribution would become a normal distribution (log_1 1^n = n) [45,48]. If the logarithm with one as its base is taken for the X axis of time, the Gompertz distribution might be similar to the Weibull distribution. Therefore, it seems to the author that the Gompertz distribution might be a specific form of the "probacent"-probability equation. A normal distribution is likewise a specific form of the "probacent"-probability equation.
"Probacent" can be a dependent variable versus one independent variable such as time or age, as seen in survival probability, death rate and life expectancy in the US total adult population (NCHS) [3,29]. "Probacent" can also be a dependent variable versus two independent variables, such as intensity of stimulus or harmful agent and duration of exposure, for example dose rate of radiation and duration of exposure in total body irradiation [4], or dose of drug and time after administration [2,14]. In cases of two independent variables, Eq.1 can make a prediction of the probability of occurrence of a response in subjects in various biomedical phenomena. The original and ultimate purpose of the author's studies has been to find a general mathematical model, possibly a mathematical law hidden in nature, that might calculate the probability of safe survival in humans and other living organisms exposed to any harmful or adverse circumstances, overcoming the risk [1,45].
The "probacent"-probability does not predict a single definite result or response for an individual observation in biodynamic biological phenomena. Instead, if the same observations are made on a large number of similar subjects, each of whom had the same condition at the start, the model would predict the possible outcomes, the approximate biomedical events in quantities under observation, but it could not predict the occurrence of the specific event in an individual. Thus, the "probacent"-probability would introduce an unpredictability in biomedicine like the uncertainty principle of Werner Heisenberg in quantum mechanics [46,47].
The computer program represented by Figure 3 can easily calculate the survival probabilities that are required to determine the least sum of squares, by using an approximation instead of the integral in Eqs.4b, 5b, 6b, 9b, 10b and 11b. This enables users of the "probacent" model in mathematical analysis to eliminate the need to consult tables of normal frequency or percentile in books of statistics and mathematics.

CONCLUSIONS
In this study, a computer program of nonlinear, curved regression of the least sum of squares is designed to determine the constant values of γ in Eq.1 and c in Eq.2, which seem to be better fitting and more accurate than those obtained by the least maximum-difference principle, as suggested by the data shown in Tables 3 and 4. The regression curve obtained by this method of the least sum of squares is closer to the data-point-connecting line than that obtained by the least maximum-difference principle. The computer program of curved regression for the "probacent"-probability equation may be helpful in research in biomedicine. The computer program of curved regression of this study would need further improvement to enable users to readily find the best-fitting constant values in the equations of the "probacent"-probability and death rate.
The Constants, γ in Eq.1 of Survival Probability, and c in Eq.2 of Death Rate
If the constants γ in Eq.1 and c in Eq.2 are one, then both equations represent a straight line when data points are plotted against age on graph paper as illustrated in Figures 1 and 2. If the γ and c values are >1, the data-points-connecting curve would reveal an upward convexity by graphical inspection. If the γ and c values are <1, the data curve would reveal a downward convexity on the graph.

Figure 1. Relationship between age and percent survival probability in the US total adult population of age 20-100 years for 2001. The abscissa represents age in years (log scale) and the ordinate represents percent survival probability (S) (normal probability scale) on the right scale and "probacent" (P) on the left scale. Data points of open circles indicating survival probabilities at different ages appear to fall overall on a solid curved line. The solid line can be expressed by Eqs.4-6.

Figure 2. Relationship between age and death rate in the US total elderly population of 60-100 years for 2001. The abscissa represents age in years and the ordinate death rate (D) in percentages (log scale). Data points of closed circles indicate US national life table death rates reported by the National Center for Health Statistics (NCHS) for 2001. The dashed straight line represents death rates predicted by the Gompertz mortality model expressed by the equation D = 10^(-2.2674 + 0.03779T). The solid curved line represents death rates predicted by the "probacent"-probability model of death rate (D) expressed by Eqs.7 and 8. Data points of NCHS appear to fall overall on the solid death-rate line predicted by Eqs.7 and 8. The maximum predictive error of the "probacent" model is ±0.3% and that of the Gompertz model ±3.2%. Source: reference [3].
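The Gompertz line quoted in the caption of Figure 2 can be evaluated directly. The sketch below uses the two constants given in the caption; the function name is hypothetical.

```python
def gompertz_death_rate(age):
    """Death rate in percent from the Gompertz line in the Figure 2 caption,
    D = 10 ** (-2.2674 + 0.03779 * T), with T the age in years."""
    return 10.0 ** (-2.2674 + 0.03779 * age)

if __name__ == "__main__":
    for age in (60, 70, 85, 100):
        print(age, gompertz_death_rate(age))
```

With these constants the exponent is zero at age 60, so the Gompertz line passes through a death rate of about 1 percent at that age and rises as a straight line on the log scale thereafter.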

Figure 3. The computer program to calculate the sum of squares, Σ (E-O)², as a function of γ value and age (T) in the US total adult population. Results of execution of the program are shown in Tables 1 and 3. This program is for the γ value of 12.8 in Eq.9a for the age group of 20-60 years.

Formulas of Death Rates (D)
Constants c, a and b are determined likewise as described above (the author's note: see Appendix of Reference [33] if needed), and the following equations are constructed to express death rates of the two age groups: the age group of 60-85 years: Eq.7; the age group of 85-100 years: Eq.8.

Table 1. Sums of squares of differences, Σ (E-O)², in nonlinear, curved regression of the least sum of squares to determine a best-fitting γ value for the "probacent"-probability equation expressing age-specific survival probabilities (S) a in the US total adult population. Sums of squares of differences are calculated by computer programs. A representative program is illustrated in Figure 3.
a: survival probability is percent probability of surviving to the beginning of age T from birth; * N represents a number, integer or fractional number; ** D indicates that the sum, ∑ (E-O)², decreases below the preceding sum; *** I indicates that the sum, ∑ (E-O)², increases above the preceding sum; #, ##: Compare the sum with the sum at the last number (N) just before its sum starts increasing (see text).

Table 2. Sums of squares of differences, Σ (E-O)², in nonlinear, curved regression of the least sum of squares to determine a best-fitting c value for the death rate equation expressing age-specific death rates (D) a in the US total elderly population. Sums of squares of differences are calculated by computer programs. A program essentially similar to the program of Figure 3 is employed.
a: death rate is percent probability of dying between ages T and T + 1. * N represents a number, integer or fractional number. ** D indicates that the sum, ∑ (E-O)², decreases below the preceding sum. *** I indicates that the sum, ∑ (E-O)², increases above the preceding sum.

Table 3. Comparison of the least maximum-difference, |E − O|, the least sum of squares, Σ (E-O)², and the χ²-test p value in the two analytical methods of the least maximum-difference and the least sum of squares in age-specific survival probabilities for the US total adult population.
* γ value is obtained by the method of the least maximum-difference, |E − O|. ** γ value is obtained by the method of the least sum of squares of curved regression, Σ (E-O)². 'E' indicates the computer-derived value of survival probability. 'O' indicates the NCHS-reported value of survival probability [29] (see text).

Table 4. Comparison of the least maximum-difference, |E − O|, the least sum of squares, Σ (E-O)², and the χ²-test p value in the two analytical methods of the least maximum-difference and the least sum of squares in age-specific death rates for the US total elderly population.
* c value is obtained by the method of the least maximum-difference principle, |E − O|. ** c value is obtained by the method of the least sum of squares of curved regression, Σ (E-O)². 'E' indicates the computer-derived value of death rate. 'O' indicates the NCHS-reported value of death rate [29] (see text).