In this paper, survival data analysis is carried out by applying Generalized Entropy Optimization Methods (GEOM). It is known that all statistical distributions can be obtained as MaxEnt distributions by choosing the corresponding moment functions. However, Generalized Entropy Optimization Distributions (GEOD) in the form of MinMaxEnt and MaxMaxEnt distributions, which are obtained on the basis of the Shannon measure and a supplementary optimization with respect to characterizing moment functions, represent the given statistical data more exactly. For this reason, survival data analysis by GEOD acquires a new significance. In this research, the life table for engine failure data (1980) is examined. The performance of GEOD is established by the Chi-Square criterion, the Root Mean Square Error (RMSE) criterion, the Shannon entropy measure and the Kullback-Leibler measure. Comparison of the GEOD with each other in these different senses shows that, among these distributions, $(MinMaxEnt)_4$ is better in the senses of the Shannon measure and of the Kullback-Leibler measure. It is shown that $(MinMaxEnt)_3$ ($(MaxMaxEnt)_4$) is the most suitable for the statistical data among the $(MinMaxEnt)_m$ ($(MaxMaxEnt)_m$), $m = 1, 2, 3, 4$, distributions. Moreover, $(MinMaxEnt)_3$ is better for the statistical data than $(MaxMaxEnt)_4$ in the sense of the RMSE criterion. According to the obtained $(MinMaxEnt)_3$ and $(MaxMaxEnt)_4$ distributions, the estimators of the Probability Density Function $\hat{f}(t)$, Cumulative Distribution Function $\hat{F}(t)$, Survival Function $\hat{S}(t)$ and Hazard Rate $\hat{h}(t)$ are evaluated and graphically illustrated. The results are obtained using the statistical software MATLAB.

Keywords: Survival Function; Censored Observation; Generalized Entropy Optimization Methods; Distributions
1. Introduction

Entropy Optimization Methods (EOM) have important applications, especially in statistics, economics, engineering and so on. There are several examples in the literature in which known statistical distributions do not conform to the statistical data, whereas the entropy optimization distributions conform well. Generalized Entropy Optimization Methods (GEOM) suggest distributions in the form of MinMaxEnt, which is the closest to the statistical data, and MaxMaxEnt, which is the furthest from the mentioned data in the sense of information theory. For this reason, GEOM can be applied more successfully in survival data analysis.

Different aspects and methods of investigations of survival data analysis are considered in  -  .

In particular, several problems of hazard rate function estimation based on the maximum entropy principle have been investigated. The potential applications include developing several classes of maximum entropy distributions which can be used to model different data-generating distributions that satisfy certain information constraints on the hazard rate function.

In order to represent the results of our investigations, we give some auxiliary concepts and facts first.

2. Survival Analysis

Survival time can be defined broadly as the time to the occurrence of a given event. This event can be the development of a disease, response to a treatment, relapse or death  .

Censoring: The techniques for reducing experimental time are known as censoring. In survival analysis, the observations are lifetimes, which can be indefinitely long. So quite often the experiment is so designed that the time required for collecting the data is reduced to manageable levels.

Let $T$ be a continuous, non-negative random variable representing the lifetime of a unit. This is the time for which an individual (or unit) carries out its appointed task satisfactorily and then passes into the "failed" or "dead" state.

The probabilistic properties of the random variable $T$ are studied through its cumulative distribution function $F(t)$ or the other equivalent functions defined below:

Cumulative Distribution Function:

$$F(t) = P(T \le t).$$

Survival Function: This function, denoted by $S(t)$, is defined as the probability that an individual survives longer than $t$:

$$S(t) = P(T > t) = 1 - F(t).$$

Probability Density Function: Like any other continuous random variable, the survival time $T$ has a probability density function, defined as the limit of the probability that an individual fails in the short interval $(t, t + \Delta t)$ per unit width $\Delta t$, or simply the probability of failure in a small interval per unit time. It can be expressed as

$$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t}.$$

Hazard Rate: This function is defined as the probability of failure during a very small time interval, assuming that the individual has survived to the beginning of the interval, or as the limit of the probability that an individual fails in a very short interval, $(t, t + \Delta t)$, given that the individual has survived to time $t$:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}.$$
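The four functions above determine one another. As a brief reminder, and assuming that $F$ is differentiable and $S(t) > 0$, the standard identities for a continuous lifetime can be written as follows:

```latex
f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}, \qquad
h(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt}\ln S(t), \qquad
S(t) = \exp\!\left(-\int_{0}^{t} h(u)\,du\right).
```

In particular, once an estimate of the density is available, estimates of the cumulative distribution function, the survival function and the hazard rate follow from it.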

3. Generalized Entropy Optimization Methods (GEOM)

The Entropy Optimization Problem (EOP) and the Generalized Entropy Optimization Problem (GEOP) can be formulated in the following form.

EOP: Let $f^{(0)}(x)$ be a given probability density function (p.d.f.) of a random variable $X$, $H$ be an entropy optimization measure and $g$ be a given moment vector function generating $m$ moment constraints. It is required to obtain the distribution corresponding to $g$ which gives the extreme value to $H$.

GEOP: Let $f^{(0)}(x)$ be a given probability density function of a random variable $X$, $H$ be an entropy optimization measure and $K$ be a set of given moment vector functions. It is required to choose moment vector functions $g^{(1)}, g^{(2)} \in K$ such that $g^{(1)}$ defines the entropy optimization distribution $f^{(1)}(x)$ closest to $f^{(0)}(x)$ and $g^{(2)}$ defines the entropy optimization distribution $f^{(2)}(x)$ furthest from $f^{(0)}(x)$ with respect to the entropy optimization measure $H$. If $H$ is taken as the Shannon entropy measure, then $f^{(1)}(x)$ is called the MinMaxEnt distribution and $f^{(2)}(x)$ is called the MaxMaxEnt distribution.

The method of solving GEOP is called GEOM.

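The problem of maximizing the entropy function

$$H(p) = -\sum_{i=1}^{n} p_i \ln p_i \qquad (1)$$

subject to the constraints

$$\sum_{i=1}^{n} p_i\, g_j(x_i) = \mu_j, \quad j = 0, 1, \ldots, m, \qquad (2)$$

where $\mu_0 = 1$, $g_0(x) \equiv 1$ and $p_i \ge 0$, $i = 1, \ldots, n$,

has the solution

$$p_i = \exp\Big(-\sum_{j=0}^{m} \lambda_j\, g_j(x_i)\Big), \quad i = 1, \ldots, n, \qquad (3)$$

where $\lambda_j$, $j = 0, 1, \ldots, m$, are Lagrange multipliers. Finding the distribution which maximizes function (1) subject to the constraints generated by the equations in (2) is an optimization problem. In the literature, there have been numerous studies that have calculated these multipliers. In this study, we use the MATLAB program to calculate the Lagrange multipliers.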
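The calculation of the multipliers can be illustrated on the simplest case. The following is a minimal sketch in Python (not the MATLAB code used in this study), assuming a single moment function $g_1(x) = x$, support points $x_i = i$ for the ten yearly intervals, and the mean $526/103$ of the corrected probabilities in Table 2. The function name `maxent_one_moment` and the bisection bracket are illustrative choices.

```python
import math


def maxent_one_moment(xs, target_mean, tol=1e-12):
    """Return (lam0, lam1, p) with p_i = exp(-lam0 - lam1 * x_i).

    Assumption: only the normalization and one mean constraint are imposed.
    """

    def pmf(lam1):
        # Subtract the largest exponent before exponentiating, for stability.
        exponents = [-lam1 * x for x in xs]
        top = max(exponents)
        weights = [math.exp(e - top) for e in exponents]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(p):
        return sum(pi * x for pi, x in zip(p, xs))

    # The mean of p decreases as lam1 grows, so bisection applies.
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(pmf(mid)) > target_mean:
            lo = mid
        else:
            hi = mid
    lam1 = 0.5 * (lo + hi)
    p = pmf(lam1)
    # lam0 is the logarithm of the normalizing sum.
    lam0 = math.log(sum(math.exp(-lam1 * x) for x in xs))
    return lam0, lam1, p


xs = list(range(1, 11))   # assumed support: interval index 1..10
mu1 = 526 / 103           # mean of the corrected probabilities in Table 2
lam0, lam1, p = maxent_one_moment(xs, mu1)
entropy_nats = -sum(pi * math.log(pi) for pi in p)
```

Under these assumptions the resulting probabilities appear to agree with the first distribution listed for $m = 1$ (0.1229, 0.1171, ...), although the moment function used for that column is not recoverable from the text. With several moment functions, a multidimensional root finder or a convex minimization of the dual problem would replace the bisection.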

If (3) is substituted into (1), the maximum entropy value is obtained:

$$H_{\max} = \sum_{j=0}^{m} \lambda_j \mu_j.$$

If the distribution $p^{(0)}$ is calculated from the data, the moment vector value $\mu$ can be obtained for each moment vector function $g$. Thus, $H_{\max}$ is considered as a functional of $g$ and is called the $U(g)$ functional. Therefore, we use the notation $U(g)$ to denote the maximum value of $H$ corresponding to $g$.
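The substitution step can be written out explicitly. The short derivation below assumes the standard discrete forms: the Shannon entropy $H = -\sum_i p_i \ln p_i$, the moment constraints $\sum_i p_i g_j(x_i) = \mu_j$ with $g_0 \equiv 1$, $\mu_0 = 1$, and the exponential solution $p_i = \exp(-\sum_j \lambda_j g_j(x_i))$.

```latex
\begin{aligned}
H_{\max}
  &= -\sum_{i=1}^{n} p_i \ln p_i
   = \sum_{i=1}^{n} p_i \sum_{j=0}^{m} \lambda_j\, g_j(x_i) \\
  &= \sum_{j=0}^{m} \lambda_j \sum_{i=1}^{n} p_i\, g_j(x_i)
   = \sum_{j=0}^{m} \lambda_j \mu_j .
\end{aligned}
```

The maximum entropy value therefore depends on the data only through the moment values $\mu_j$ and the multipliers $\lambda_j$ determined by the chosen moment functions.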

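Let $K$ be the compact set of moment vector functions $g$, and let $U(g)$ denote the maximum entropy value corresponding to $g$. $U(g)$ reaches its least and greatest values in this compact set because of its continuity property. For this reason,

$$\min_{g \in K} U(g) = U(g^{(1)}), \qquad \max_{g \in K} U(g) = U(g^{(2)}).$$

Consequently,

$$U(g^{(1)}) \le U(g^{(2)}).$$

The distributions $p^{(1)}$ and $p^{(2)}$ corresponding to $g^{(1)}$ and $g^{(2)}$, respectively, are called the MinMaxEnt and MaxMaxEnt distributions.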

The MinMaxEnt and MaxMaxEnt methods for a finite set of characterizing moment functions can be defined in the following form.

Let $K_0 = \{g_1, \ldots, g_r\}$ be the set of characterizing moment functions, and let $K_{0,m}$ be the set of all combinations of the $r$ elements of $K_0$ taken $m$ elements at a time. We note that each element of $K_{0,m}$ is a vector with $m$ components.

Solving the MinMaxEnt and the MaxMaxEnt problems requires finding vector functions $(g_0, g^{(1)})$ and $(g_0, g^{(2)})$, where $g_0 \equiv 1$, $g^{(1)} \in K_{0,m}$ and $g^{(2)} \in K_{0,m}$, minimizing and maximizing $U(g)$, respectively, with respect to the Shannon entropy measure. It should be noted that $U(g)$ reaches its minimum value subject to the constraints generated by the function $g_0$ and all $m$-dimensional vector functions $g \in K_{0,m}$. In other words, the minimum value of $U(g)$ is the least of the values $U(g)$ corresponding to $g \in K_{0,m}$. If $g^{(1)}$ gives the minimum value to $U(g)$, then the distribution corresponding to $(g_0, g^{(1)})$ is called the $(MinMaxEnt)_m$ distribution. The MinMaxEnt method represents the probability distribution in the form of the $(MinMaxEnt)_m$ distribution. In a similar way, $U(g)$ reaches its maximum value subject to the constraints generated by the function $g_0$ and all $m$-dimensional vector functions $g \in K_{0,m}$. In other words, the maximum value of $U(g)$ is the greatest of the values $U(g)$ corresponding to $g \in K_{0,m}$. If $g^{(2)}$ gives the maximum value to $U(g)$, then the distribution corresponding to $(g_0, g^{(2)})$ is called the $(MaxMaxEnt)_m$ distribution. The MaxMaxEnt method represents the probability distribution in the form of the $(MaxMaxEnt)_m$ distribution. It should be noted that both distributions can be applied in solving the relevant problems in survival data analysis.
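The selection step over a finite set can be sketched as follows. This is a minimal Python illustration, assuming five characterizing moment functions (the count suggested by the number of candidates in Tables 3-6); the labels `g1`..`g5` are hypothetical placeholders, and the entropy values are the five values listed in Table 6 for $m = 4$, indexed by their position in that table.

```python
from itertools import combinations


def select_min_max(u_values):
    """Return the positions of the least and greatest maximum-entropy values."""
    indices = range(len(u_values))
    i_min = min(indices, key=u_values.__getitem__)
    i_max = max(indices, key=u_values.__getitem__)
    return i_min, i_max


# Hypothetical labels for the characterizing moment functions.
moment_labels = ["g1", "g2", "g3", "g4", "g5"]

# All combinations of the five functions taken m = 2 at a time.
candidates_m2 = list(combinations(moment_labels, 2))

# Maximum-entropy values of the five candidates for m = 4 (Table 6).
u_m4 = [3.1937, 3.1935, 3.1936, 3.1932, 3.1934]
i_min, i_max = select_min_max(u_m4)
```

Here `i_min` points to the candidate that defines the MinMaxEnt distribution for $m = 4$ and `i_max` to the one that defines the MaxMaxEnt distribution. In a full computation each entry of `u_m4` would be obtained by solving the corresponding maximum entropy problem.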

In the present research, the life table for engine failure data (1980) given in Table 1 is considered.

In our investigation, the experiment is planned for 200 units working at the beginning of the first interval; because of censoring, 97 of the planned units remain outside the experiment. This situation is taken into account in Table 2.

Table 1. The life table for engine failure data (1980)

| Survival time (year) | Working at the beginning of interval | Failed during the interval | Censored during the interval |
|---|---|---|---|
| 0 - 1 | 200 | 5 | 0 |
| 1 - 2 | 195 | 10 | 1 |
| 2 - 3 | 184 | 12 | 5 |
| 3 - 4 | 167 | 8 | 2 |
| 4 - 5 | 157 | 10 | 0 |
| 5 - 6 | 147 | 15 | 6 |
| 6 - 7 | 126 | 9 | 3 |
| 7 - 8 | 114 | 8 | 1 |
| 8 - 9 | 105 | 4 | 0 |
| 9 - 10 | 101 | 3 | 1 |

Table 2. Observed and corrected probabilities

| Survival time (year) | Working at the beginning of interval | Failed during the interval | Censored during the interval | Observed probabilities | Corrected probabilities |
|---|---|---|---|---|---|
| 0 - 1 | 200 | 5 | 0 | 0.0485 | 0.0485 |
| 1 - 2 | 195 | 10 | 1 | 0.0971 | 0.1068 |
| 2 - 3 | 184 | 12 | 5 | 0.1165 | 0.1650 |
| 3 - 4 | 167 | 8 | 2 | 0.0777 | 0.0971 |
| 4 - 5 | 157 | 10 | 0 | 0.0971 | 0.0971 |
| 5 - 6 | 147 | 15 | 6 | 0.1456 | 0.2039 |
| 6 - 7 | 126 | 9 | 3 | 0.0874 | 0.1165 |
| 7 - 8 | 114 | 8 | 1 | 0.0777 | 0.0874 |
| 8 - 9 | 105 | 4 | 0 | 0.0388 | 0.0388 |
| 9 - 10 | 101 | 3 | 1 | 0.0291 | 0.0388 |

It should be noted that the presence of censoring in the survival times leads to a situation where the sum of the observation probabilities is less than 1 for the survival data. For this reason, in solving many problems, it is required to supplement the sum of the observation probabilities up to 1. Since the sum of the observed probabilities in Table 2 is 0.8155, the supplementary probability 1 - 0.8155 = 0.1845 is distributed uniformly over the censored observations, according to the number of censored observations in each interval, and the corrected probabilities are obtained.
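The correction can be reproduced with a few lines of code. The sketch below is a Python illustration (not the MATLAB code of the study); it assumes that the probabilities are taken relative to the 103 units that either failed or were censored, which is the denominator that appears to reproduce the observed probabilities in Table 2 (for example 5/103 = 0.0485). The function name `corrected_probabilities` is illustrative.

```python
failed = [5, 10, 12, 8, 10, 15, 9, 8, 4, 3]
censored = [0, 1, 5, 2, 0, 6, 3, 1, 0, 1]


def corrected_probabilities(failed, censored):
    """Spread the missing probability mass uniformly over censored units."""
    # Assumed denominator: units that failed or were censored (103 here).
    n = sum(failed) + sum(censored)
    observed = [d / n for d in failed]
    # Probability mass that is missing because of censoring.
    supplement = 1.0 - sum(observed)
    # Each censored observation receives the same share of that mass.
    per_censored = supplement / sum(censored)
    corrected = [p + c * per_censored for p, c in zip(observed, censored)]
    return observed, corrected


observed, corrected = corrected_probabilities(failed, censored)
```

Rounded to four decimals, `observed` sums to 0.8155 and `corrected` matches the last column of Table 2.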

As noted above, the MinMaxEnt and MaxMaxEnt distributions can be applied in solving the relevant problems in survival data analysis. In our investigation, the components of the characterizing moment vector function are chosen from the characteristic moments that are most commonly used in statistics.

Consequently,. For example, if then

gives the least value to and

gives the greatest value to.

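The distributions corresponding to the candidate moment vector functions and their entropy values are shown in Tables 3-6. In these tables, the $(MinMaxEnt)_m$ and $(MaxMaxEnt)_m$ distributions, corresponding to the least and greatest entropy values, are highlighted. By virtue of these tables, the $(MinMaxEnt)_m$, $(MaxMaxEnt)_m$, $m = 1, 2, 3, 4$, distributions are also obtained; they are shown in Table 7 and Table 8.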

In order to assess the performance of the mentioned distributions, we use various criteria: Root Mean Square Error (RMSE), Chi-Square and the entropy values of the distributions. The results are shown in Table 9 and Table 10.

All distributions are acceptable for the survival data in the sense of the Chi-Square criterion.

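Table 3. Entropy values and corresponding distributions for $m = 1$

| Candidate | Entropy value | Distribution (years 1 to 10) |
|---|---|---|
| 1 | 3.3084 | 0.1229 0.1171 0.1116 0.1064 0.1015 0.0967 0.0922 0.0879 0.0838 0.0799 |
| 2, $(MinMaxEnt)_1$ | 3.2854 | 0.1269 0.1249 0.1210 0.1153 0.1081 0.0997 0.0905 0.0809 0.0711 0.0615 |
| 3, $(MaxMaxEnt)_1$ | 3.3219 | 0.0989 0.0995 0.0998 0.1000 0.1001 0.1002 0.1003 0.1004 0.1004 0.1005 |
| 4 | 3.3040 | 0.1202 0.1238 0.1161 0.1083 0.1014 0.0954 0.0901 0.0855 0.0814 0.0777 |
| 5 | 3.3204 | 0.1093 0.1058 0.1030 0.1010 0.0994 0.0981 0.0971 0.0962 0.0954 0.0947 |

Table 4. Entropy values and corresponding distributions for $m = 2$

| Candidate | Entropy value | Distribution (years 1 to 10) |
|---|---|---|
| 1 | 3.2042 | 0.0619 0.0921 0.1218 0.1434 0.1502 0.1400 0.1160 0.0856 0.0562 0.0328 |
| 2 | 3.2140 | 0.0429 0.1137 0.1451 0.1490 0.1377 0.1194 0.0994 0.0803 0.0634 0.0492 |
| 3, $(MaxMaxEnt)_2$ | 3.2921 | 0.0863 0.1449 0.1299 0.1126 0.0999 0.0914 0.0861 0.0833 0.0824 0.0832 |
| 4 | 3.2106 | 0.0543 0.0939 0.1351 0.1529 0.1480 0.1290 0.1046 0.0804 0.0593 0.0424 |
| 5, $(MinMaxEnt)_2$ | 3.2041 | 0.0483 0.1036 0.1363 0.1493 0.1456 0.1296 0.1068 0.0820 0.0589 0.0397 |
| 6 | 3.2408 | 0.1101 0.0825 0.1058 0.1285 0.1396 0.1344 0.1150 0.0877 0.0598 0.0366 |
| 7 | 3.2057 | 0.0579 0.0928 0.1281 0.1484 0.1499 0.1353 0.1108 0.0830 0.0573 0.0365 |
| 8 | 3.2305 | 0.0400 0.1294 0.1481 0.1403 0.1251 0.1091 0.0944 0.0816 0.0706 0.0613 |
| 9 | 3.2500 | 0.0423 0.1425 0.1432 0.1278 0.1134 0.1018 0.0924 0.0848 0.0785 0.0733 |
| 10 | 3.2237 | 0.0403 0.1205 0.1478 0.1454 0.1310 0.1133 0.0961 0.0809 0.0679 0.0570 |

Table 5. Entropy values and corresponding distributions for $m = 3$

| Candidate | Entropy value | Distribution (years 1 to 10) |
|---|---|---|
| 1 | 3.2024 | 0.0537 0.0972 0.1291 0.1471 0.1488 0.1355 0.1118 0.0838 0.0573 0.0357 |
| 2, $(MinMaxEnt)_3$ | 3.2000 | 0.0489 0.1034 0.1314 0.1458 0.1466 0.1342 0.1117 0.0844 0.0578 0.0358 |
| 3 | 3.2042 | 0.0625 0.0920 0.1211 0.1427 0.1501 0.1405 0.1167 0.0859 0.0561 0.0324 |
| 4 | 3.2083 | 0.0515 0.0974 0.1359 0.1519 0.1472 0.1290 0.1051 0.0808 0.0593 0.0420 |
| 5 | 3.2100 | 0.0503 0.0988 0.1382 0.1526 0.1458 0.1268 0.1033 0.0802 0.0601 0.0438 |
| 6 | 3.2104 | 0.0522 0.0964 0.1369 0.1530 0.1468 0.1277 0.1038 0.0802 0.0598 0.0433 |
| 7 | 3.2035 | 0.0518 0.0987 0.1322 0.1487 0.1480 0.1330 0.1093 0.0827 0.0579 0.0376 |
| 8 | 3.2039 | 0.0502 0.1007 0.1343 0.1493 0.1469 0.1313 0.1079 0.0822 0.0584 0.0388 |
| 9 | 3.2050 | 0.0533 0.0968 0.1324 0.1497 0.1483 0.1325 0.1085 0.0822 0.0580 0.0383 |
| 10, $(MaxMaxEnt)_3$ | 3.2155 | 0.0501 0.0986 0.1416 0.1543 0.1436 0.1227 0.0999 0.0792 0.0619 0.0480 |

Table 6. Entropy values and corresponding distributions for $m = 4$

| Candidate | Entropy value | Distribution (years 1 to 10) |
|---|---|---|
| 1, $(MaxMaxEnt)_4$ | 3.1937 | 0.0477 0.1198 0.1221 0.1306 0.1397 0.1395 0.1230 0.0921 0.0570 0.0284 |
| 2 | 3.1935 | 0.0476 0.1203 0.1216 0.1302 0.1402 0.1400 0.1230 0.0917 0.0568 0.0287 |
| 3 | 3.1936 | 0.0477 0.1201 0.1218 0.1303 0.1400 0.1398 0.1230 0.0919 0.0568 0.0286 |
| 4, $(MinMaxEnt)_4$ | 3.1932 | 0.0475 0.1217 0.1192 0.1292 0.1422 0.1421 0.1225 0.0898 0.0560 0.0298 |
| 5 | 3.1934 | 0.0476 0.1209 0.1207 0.1298 0.1408 0.1407 0.1229 0.0911 0.0565 0.0290 |

Table 7. Corrected probabilities and the $(MinMaxEnt)_m$, $m = 1, 2, 3, 4$, distributions

| Year | Failed | Censored | Corrected probabilities | $(MinMaxEnt)_1$ | $(MinMaxEnt)_2$ | $(MinMaxEnt)_3$ | $(MinMaxEnt)_4$ |
|---|---|---|---|---|---|---|---|
| 1 | 5 | 0 | 0.0485 | 0.1269 | 0.0483 | 0.0489 | 0.0475 |
| 2 | 10 | 1 | 0.1068 | 0.1249 | 0.1036 | 0.1034 | 0.1217 |
| 3 | 12 | 5 | 0.1650 | 0.1210 | 0.1363 | 0.1314 | 0.1192 |
| 4 | 8 | 2 | 0.0971 | 0.1153 | 0.1493 | 0.1458 | 0.1292 |
| 5 | 10 | 0 | 0.0971 | 0.1081 | 0.1456 | 0.1466 | 0.1422 |
| 6 | 15 | 6 | 0.2039 | 0.0997 | 0.1296 | 0.1342 | 0.1421 |
| 7 | 9 | 3 | 0.1165 | 0.0905 | 0.1068 | 0.1117 | 0.1225 |
| 8 | 8 | 1 | 0.0874 | 0.0809 | 0.0820 | 0.0844 | 0.0898 |
| 9 | 4 | 0 | 0.0388 | 0.0711 | 0.0589 | 0.0578 | 0.0560 |
| 10 | 3 | 1 | 0.0388 | 0.0615 | 0.0397 | 0.0358 | 0.0298 |

Table 8. Corrected probabilities and the $(MaxMaxEnt)_m$, $m = 1, 2, 3, 4$, distributions

| Year | Failed | Censored | Corrected probabilities | $(MaxMaxEnt)_1$ | $(MaxMaxEnt)_2$ | $(MaxMaxEnt)_3$ | $(MaxMaxEnt)_4$ |
|---|---|---|---|---|---|---|---|
| 1 | 5 | 0 | 0.0485 | 0.0989 | 0.0863 | 0.0501 | 0.0477 |
| 2 | 10 | 1 | 0.1068 | 0.0995 | 0.1449 | 0.0986 | 0.1198 |
| 3 | 12 | 5 | 0.1650 | 0.0998 | 0.1299 | 0.1416 | 0.1221 |
| 4 | 8 | 2 | 0.0971 | 0.1000 | 0.1126 | 0.1543 | 0.1306 |
| 5 | 10 | 0 | 0.0971 | 0.1001 | 0.0999 | 0.1436 | 0.1397 |
| 6 | 15 | 6 | 0.2039 | 0.1002 | 0.0914 | 0.1227 | 0.1395 |
| 7 | 9 | 3 | 0.1165 | 0.1003 | 0.0861 | 0.0999 | 0.1230 |
| 8 | 8 | 1 | 0.0874 | 0.1004 | 0.0833 | 0.0792 | 0.0921 |
| 9 | 4 | 0 | 0.0388 | 0.1004 | 0.0824 | 0.0619 | 0.0570 |
| 10 | 3 | 1 | 0.0388 | 0.1005 | 0.0832 | 0.0480 | 0.0284 |

In the sense of the RMSE criterion, each $(MinMaxEnt)_m$ distribution is better than the corresponding $(MaxMaxEnt)_m$ distribution. Moreover, $(MinMaxEnt)_3$ has the smallest RMSE among all the distributions considered, and each of $(MinMaxEnt)_2$, $(MinMaxEnt)_3$ and $(MinMaxEnt)_4$ is better than all of the $(MaxMaxEnt)_m$ distributions. From these results it follows that among the $(MinMaxEnt)_m$ distributions, the $(MinMaxEnt)_3$ distribution is the most suitable, and among the $(MaxMaxEnt)_m$ distributions, the $(MaxMaxEnt)_4$ distribution is the most suitable for the statistical data. These results are also corroborated by the graphical representation (see Figures 1-4). Consequently, we shall consider the Probability Density Function, Cumulative Distribution Function, Survival Function and Hazard Rate only for the $(MinMaxEnt)_3$ and $(MaxMaxEnt)_4$ distributions.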

Although the distribution with the largest number of moment functions tends to fit better, it should be noted that in some cases a set of moment functions with fewer elements is more informative than a different set of moment functions with more elements.

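Table 9. Performance of the $(MinMaxEnt)_m$ distributions

| Distribution | Entropy value | Calculated value of Chi-Square | Probability of Chi-Square value | Table value of Chi-Square | RMSE |
|---|---|---|---|---|---|
| $(MinMaxEnt)_1$ | 3.2854 | 4.4310 | 0.8163 | | 0.3158 |
| $(MinMaxEnt)_2$ | 3.2041 | 0.6512 | 0.9987 | | 0.1873 |
| $(MinMaxEnt)_3$ | 3.2000 | 1.7787 | 0.9389 | | 0.1799 |
| $(MinMaxEnt)_4$ | 3.1932 | 1.6161 | 0.8993 | | 0.1830 |

Table 10. Performance of the $(MaxMaxEnt)_m$ distributions

| Distribution | Entropy value | Calculated value of Chi-Square | Probability of Chi-Square value | Table value of Chi-Square | RMSE |
|---|---|---|---|---|---|
| $(MaxMaxEnt)_1$ | 3.3219 | 5.3820 | 0.7161 | | 0.3492 |
| $(MaxMaxEnt)_2$ | 3.2921 | 4.9233 | 0.6693 | | 0.3492 |
| $(MaxMaxEnt)_3$ | 3.2155 | 2.2804 | 0.8922 | | 0.2104 |
| $(MaxMaxEnt)_4$ | 3.1937 | 1.6383 | 0.8966 | | 0.1888 |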
4.2. Availability of GEOD to Survival Data in the Sense of Shannon Measure

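In order to establish the suitability of GEOD for survival data in the sense of the Shannon measure, it is required to consider the entropy values of GEOD.

From Table 3 it is seen that the $(MinMaxEnt)_1$ ($(MaxMaxEnt)_1$) distribution is realized by the vector function highlighted in the table, with entropy value 3.2854 (3.3219).

From Table 4 it is seen that the $(MinMaxEnt)_2$ ($(MaxMaxEnt)_2$) distribution is realized by the vector function highlighted in the table, with entropy value 3.2041 (3.2921).

From Table 5 it is seen that the $(MinMaxEnt)_3$ ($(MaxMaxEnt)_3$) distribution is realized by the vector function highlighted in the table, with entropy value 3.2000 (3.2155).

From Table 6 it is seen that the $(MinMaxEnt)_4$ ($(MaxMaxEnt)_4$) distribution is realized by the vector function highlighted in the table, with entropy value 3.1932 (3.1937).

Comparison of GEOD with each other in the sense of the Shannon measure shows that among these distributions $(MinMaxEnt)_4$ is better.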
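The entropy values quoted from Tables 3-6 can be checked directly from the tabulated probabilities. The following Python sketch assumes that the tabulated values use base-2 logarithms; this is an inference from the data, since a uniform distribution over ten intervals has entropy $\log_2 10 \approx 3.3219$, the value listed for the nearly uniform distribution in Table 3. The function name `shannon_entropy` is illustrative.

```python
import math


def shannon_entropy(p, base=2.0):
    """Shannon entropy of a discrete distribution; zero terms are skipped."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)


# First distribution listed in Table 3 (tabulated entropy value 3.3084).
table3_first = [0.1229, 0.1171, 0.1116, 0.1064, 0.1015,
                0.0967, 0.0922, 0.0879, 0.0838, 0.0799]

h_first = shannon_entropy(table3_first)
h_uniform = shannon_entropy([0.1] * 10)
```

Small differences in the last digit are to be expected, because the tabulated probabilities are rounded to four decimals.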

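The results of our investigation, obtained using the known characterizing moment vector functions, are summarised in the form of the following Corollary.

Corollary 1. If $(MinMaxEnt)_m$ ($(MaxMaxEnt)_m$) denotes the MinMaxEnt (MaxMaxEnt) distribution corresponding to $m$ moment conditions generated by $m$ moment functions, and $H$ denotes the Shannon entropy, then the inequality

$$H\big((MinMaxEnt)_m\big) > H\big((MinMaxEnt)_k\big) \qquad \Big(H\big((MaxMaxEnt)_m\big) > H\big((MaxMaxEnt)_k\big)\Big)$$

is fulfilled when $m < k$. In other words, the entropy value of the MinMaxEnt (MaxMaxEnt) distribution decreases as the number of moment conditions increases.

Moreover, for any $m$ the inequality

$$H\big((MinMaxEnt)_m\big) \le H\big((MaxMaxEnt)_m\big)$$

takes place.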

4.3. Availability of GEOD to Survival Data in the Sense of Kullback-Leibler Measure

Now, we calculate the distance between the observed distribution $p^{(0)}$ given in Table 2 and the $(MinMaxEnt)_m$, $(MaxMaxEnt)_m$, $m = 1, 2, 3, 4$, distributions given in Table 7 and Table 8, respectively.

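It is known that the Kullback-Leibler distance between the distributions $p = (p_1, \ldots, p_n)$ and $q = (q_1, \ldots, q_n)$ is obtained by the formula

$$D(p : q) = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i}.$$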
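A direct implementation of the Kullback-Leibler measure is short. The sketch below is a Python illustration assuming natural logarithms and the usual convention that terms with $p_i = 0$ contribute zero; the logarithm base and the normalization used for the tabulated values are not stated in the text, so the function is not claimed to reproduce them. The name `kullback_leibler` is illustrative.

```python
import math


def kullback_leibler(p, q):
    """Kullback-Leibler measure D(p : q) = sum_i p_i * ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Small worked example with two outcomes.
p_example = [0.5, 0.5]
q_example = [0.25, 0.75]
d_example = kullback_leibler(p_example, q_example)
```

For the example above, $D(p : q) = 0.5 \ln 2 + 0.5 \ln (2/3) \approx 0.1438$, and the measure is zero when the two distributions coincide.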

Starting from this formula, the Kullback-Leibler measures for the distance between the observed distribution and the $(MinMaxEnt)_m$, $(MaxMaxEnt)_m$, $m = 1, 2, 3, 4$, distributions are given in Table 11 and Table 12, respectively.

From Table 11 and Table 12 it follows that among the GEOD, $(MinMaxEnt)_4$ is better in the sense of the Kullback-Leibler measure.

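The results of our investigation, obtained using the known characterizing moment vector functions, are summarised in the form of the following Corollary.

Corollary 2. If $p^{(0)}$ is the observed distribution, $(MinMaxEnt)_m$ ($(MaxMaxEnt)_m$) denotes the MinMaxEnt (MaxMaxEnt) distribution corresponding to $m$ moment conditions generated by $m$ moment functions, and $D$ denotes the Kullback-Leibler measure, then the inequality

$$D\big(p^{(0)} : (MinMaxEnt)_m\big) > D\big(p^{(0)} : (MinMaxEnt)_k\big) \qquad \Big(D\big(p^{(0)} : (MaxMaxEnt)_m\big) > D\big(p^{(0)} : (MaxMaxEnt)_k\big)\Big)$$

is fulfilled when $m < k$. In other words, the Kullback-Leibler value of the MinMaxEnt (MaxMaxEnt) distribution decreases as the number of moment conditions increases.

Moreover, for any $m$ the inequality

$$D\big(p^{(0)} : (MinMaxEnt)_m\big) < D\big(p^{(0)} : (MaxMaxEnt)_m\big)$$

takes place.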

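Table 11. Kullback-Leibler measures for the $(MinMaxEnt)_m$ distributions

| Distribution | Kullback-Leibler measure |
|---|---|
| $(MinMaxEnt)_1$ | 0.3938 |
| $(MinMaxEnt)_2$ | 0.3348 |
| $(MinMaxEnt)_3$ | 0.3300 |
| $(MinMaxEnt)_4$ | 0.3193 |

Table 12. Kullback-Leibler measures for the $(MaxMaxEnt)_m$ distributions

| Distribution | Kullback-Leibler measure |
|---|---|
| $(MaxMaxEnt)_1$ | 0.4441 |
| $(MaxMaxEnt)_2$ | 0.4009 |
| $(MaxMaxEnt)_3$ | 0.3457 |
| $(MaxMaxEnt)_4$ | 0.3198 |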
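The orderings stated in Corollary 1 and Corollary 2 can be verified against the tabulated values. The following Python sketch uses the entropy values from Table 9 and Table 10 and the Kullback-Leibler values from Table 11 and Table 12; it assumes that Table 11 refers to the MinMaxEnt distributions and Table 12 to the MaxMaxEnt distributions, in the order $m = 1, 2, 3, 4$. The helper name `strictly_decreasing` is illustrative.

```python
def strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


# Values for m = 1, 2, 3, 4.
H_MIN = [3.2854, 3.2041, 3.2000, 3.1932]   # entropy, MinMaxEnt
H_MAX = [3.3219, 3.2921, 3.2155, 3.1937]   # entropy, MaxMaxEnt
KL_MIN = [0.3938, 0.3348, 0.3300, 0.3193]  # Kullback-Leibler, MinMaxEnt
KL_MAX = [0.4441, 0.4009, 0.3457, 0.3198]  # Kullback-Leibler, MaxMaxEnt

corollary_1 = (strictly_decreasing(H_MIN) and strictly_decreasing(H_MAX)
               and all(lo <= hi for lo, hi in zip(H_MIN, H_MAX)))
corollary_2 = (strictly_decreasing(KL_MIN) and strictly_decreasing(KL_MAX)
               and all(lo < hi for lo, hi in zip(KL_MIN, KL_MAX)))
```

Both flags evaluate to `True` for these data, which is a numerical check on one data set rather than a proof of the corollaries.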

In this section, survival data analysis is conducted using the $(MinMaxEnt)_3$ and $(MaxMaxEnt)_4$ distributions, since by the investigations above $(MinMaxEnt)_3$ ($(MaxMaxEnt)_4$) is the most suitable for the survival data among the $(MinMaxEnt)_m$ ($(MaxMaxEnt)_m$), $m = 1, 2, 3, 4$, distributions.

The $(MinMaxEnt)_3$ and $(MaxMaxEnt)_4$ estimates of the Probability Density Function, Cumulative Distribution Function, Survival Function and Hazard Rate are given in Table 13 and Table 14, respectively.

On the basis of the results given in Table 13 and Table 14, the graphs of the estimated functions are shown in Figures 5(a)-(c) and Figures 6(a)-(c).

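Table 13. $(MinMaxEnt)_3$ estimates of the survival functions

| Year | Working at the beginning of interval | Failed | Censored | $\hat{f}(t)$ | $\hat{F}(t)$ | $\hat{S}(t)$ | $\hat{h}(t)$ |
|---|---|---|---|---|---|---|---|
| 1 | 200 | 5 | 0 | 0.0489 | 0.0489 | 0.9511 | 0.0514 |
| 2 | 195 | 10 | 1 | 0.1034 | 0.1523 | 0.8477 | 0.1220 |
| 3 | 184 | 12 | 5 | 0.1314 | 0.2837 | 0.7163 | 0.1834 |
| 4 | 167 | 8 | 2 | 0.1458 | 0.4295 | 0.5705 | 0.2556 |
| 5 | 157 | 10 | 0 | 0.1466 | 0.5761 | 0.4239 | 0.3458 |
| 6 | 147 | 15 | 6 | 0.1342 | 0.7103 | 0.2897 | 0.4632 |
| 7 | 126 | 9 | 3 | 0.1117 | 0.8220 | 0.1780 | 0.6275 |
| 8 | 114 | 8 | 1 | 0.0844 | 0.9064 | 0.0936 | 0.9017 |
| 9 | 105 | 4 | 0 | 0.0578 | 0.9642 | 0.0358 | 1.6145 |
| 10 | 101 | 3 | 1 | 0.0358 | 1.0000 | 0.0000 | -- |

Table 14. $(MaxMaxEnt)_4$ estimates of the survival functions

| Year | Working at the beginning of interval | Failed | Censored | $\hat{f}(t)$ | $\hat{F}(t)$ | $\hat{S}(t)$ | $\hat{h}(t)$ |
|---|---|---|---|---|---|---|---|
| 1 | 200 | 5 | 0 | 0.0477 | 0.0477 | 0.9523 | 0.0501 |
| 2 | 195 | 10 | 1 | 0.1198 | 0.1675 | 0.8325 | 0.1439 |
| 3 | 184 | 12 | 5 | 0.1221 | 0.2896 | 0.7104 | 0.1719 |
| 4 | 167 | 8 | 2 | 0.1306 | 0.4202 | 0.5798 | 0.2253 |
| 5 | 157 | 10 | 0 | 0.1397 | 0.5599 | 0.4401 | 0.3174 |
| 6 | 147 | 15 | 6 | 0.1395 | 0.6994 | 0.3006 | 0.4641 |
| 7 | 126 | 9 | 3 | 0.1230 | 0.8224 | 0.1776 | 0.6926 |
| 8 | 114 | 8 | 1 | 0.0921 | 0.9145 | 0.0855 | 1.0772 |
| 9 | 105 | 4 | 0 | 0.0570 | 0.9715 | 0.0285 | 2.0000 |
| 10 | 101 | 3 | 1 | 0.0284 | 0.9999 | 0.0001 | -- |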
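The columns of Table 13 can be derived from the estimated density alone. The sketch below is a Python illustration (not the MATLAB code of the study); it assumes that $\hat{F}$ is the running sum of $\hat{f}$, that $\hat{S} = 1 - \hat{F}$ is taken at the end of each interval, and that $\hat{h} = \hat{f} / \hat{S}$, a convention that appears to reproduce the tabulated values. The function name `survival_table` and the rounding to four decimals are illustrative choices.

```python
def survival_table(pdf, ndigits=4):
    """Return rows (f, F, S, h) built from a discrete density estimate."""
    rows = []
    cumulative = 0.0
    for f in pdf:
        cumulative += f
        F = round(cumulative, ndigits)
        S = round(1.0 - F, ndigits)
        # The hazard rate is left undefined when no survival mass remains.
        h = round(f / S, ndigits) if S > 0 else None
        rows.append((f, F, S, h))
    return rows


# Density column of Table 13.
f_hat = [0.0489, 0.1034, 0.1314, 0.1458, 0.1466,
         0.1342, 0.1117, 0.0844, 0.0578, 0.0358]
rows = survival_table(f_hat)
```

The last interval has no remaining survival probability, so its hazard rate is returned as `None`, which corresponds to the "--" entry in the table.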
5. Conclusion

In this study, it is established that survival data analysis can be realized by applying Generalized Entropy Optimization Methods (GEOM). Generalized Entropy Optimization Distributions (GEOD) in the form of MinMaxEnt and MaxMaxEnt distributions, which are obtained on the basis of the Shannon measure and a supplementary optimization with respect to characterizing moment functions, represent the given statistical data more exactly. For this reason, survival data analysis by GEOD acquires a new significance. The performance of GEOD is established by the Chi-Square criterion, the Root Mean Square Error (RMSE) criterion, the Shannon entropy measure and the Kullback-Leibler measure. Comparison of GEOD with each other in these different senses shows that among these distributions $(MinMaxEnt)_4$ is better in the senses of the Shannon measure and of the Kullback-Leibler measure. It