Bayesian Estimation of Population Size via Capture-Recapture Model with Time Variation and Behavioral Response

We consider the problem of population estimation using capture-recapture data in which capture probabilities vary with sampling occasion and with behavioural response. The original model is not identifiable without further restrictions. The novelty of this article is to expand current research practice by developing a hierarchical Bayesian approach under the assumption that the odds of recapture bears a constant relationship to the odds of initial capture. A real-data example on a deer mouse population illustrates the proposed method. Three simulation studies examine the performance of the proposed Bayesian estimates. Compared with the maximum likelihood estimates discussed in Chao et al. (2000), the hierarchical Bayesian estimate provides reasonably better population estimation with smaller mean squared error; moreover, it is robust to the underlying relationship between the initial and recapture probabilities. The sensitivity study shows that the proposed Bayesian approach is robust to the choice of hyper-parameters. The third simulation study reveals that both the relative bias and the relative RMSE approach zero as the population size increases. An R package is developed and used in both the data example and the simulations.


Introduction
Capture-recapture sampling methods were originally designed to estimate wildlife populations. They are now widely applied in numerous areas, such as biology and ecology for estimating animal and plant populations, sociology and demography for estimating the size of populations at risk, official statistics for census undercount procedures, and economics and finance for credit scoring and fraud detection. A variety of methods have been developed to analyse these models, such as maximum likelihood [1]-[6], moment methods based on sample coverage [7]-[9], martingales [10]-[12], and Bayesian methods [13]-[23]. With the rapid growth in both methodology and application, the literature on capture-recapture analysis is vast. For comprehensive literature reviews on capture-recapture models, see [24]-[29].
In this paper, we focus on inference for the closed-population capture-recapture experiment in which capture probabilities are assumed to vary with time (or sampling occasion) and with behavioural response. This model was originally proposed by Pollock in his doctoral dissertation (1974) and is referred to as model M_tb in the literature, where the subscript t denotes time (or sampling occasion) and b denotes the behavioural response; see Otis et al. [5]. The behavioural response means that animals change their behaviour after their initial capture. Animals might become "trap-happy" if being caught is rewarding (getting food from a baited trap, for example) or "trap-shy" if being caught is traumatic (having a plastic tag stapled through the ear, for example). Model M_tb assumes that on a given sampling occasion all unmarked animals have one probability of capture and all marked animals have another, and that these probabilities vary from one sampling occasion to another.
Model M_tb is important in practice because animals often exhibit a behavioural response to capture. This model has been selected as the most likely model in some population-estimation studies obw78, pnbh90. The original M_tb model is not identifiable without further restrictions on the parameters, because the number of unknown parameters exceeds the dimension of the minimal sufficient statistic. To make the parameters estimable, one may assume that at least two of the capture probabilities are equal; however, there is no biological justification for such an assumption [5]. Tanaka [30] [31] suggested a regression procedure, which had serious problems of interpretation [32]. Another assumption, considered by Otis et al. [5], is that the ratio of the recapture probability to the initial capture probability is a constant. Chao et al. [2] derived the unconditional MLE (UMLE), the conditional MLE (CMLE) and the maximum quasi-likelihood estimator (MQLE) under this assumption, without imposing a minimal constraint on the constant ratio to ensure that neither the initial nor the recapture probabilities exceed 1.0. Under the same constant-ratio assumption, and with the minimal constraint on the constant ratio embedded in the MCMC, Lee and Chen [20] and Lee et al. [21] derived Bayes estimates by applying a Beta prior to the initial capture probability and a Uniform prior to the constant ratio. Their results show that the performance of the estimate often depends on the choice of the hyper-parameters in the prior distributions. [19] proposed a Bayesian analysis of all eight closed-population capture-recapture models within a logistic regression framework. The method was illustrated through two real data sets, and the results were compared with those from MARK, free software for analysing capture-recapture data with the classical MLE approach. Unfortunately, neither data set was determined to be suitable for model M_tb, so the merit of the method for model M_tb remains undiscovered. Also, there is no discussion of the performance of the Bayes estimates under different possible capture probabilities and/or different population sizes. Regarding the prior specification, [19] performs a limited prior sensitivity analysis in the second example and concludes that the different prior specifications have relatively little influence on the final estimate. However, in the same example, it also points out that "the prior specified on the individual variance component appeared to be somewhat restrictive in this case, with values supported by the data having relatively small prior mass". Therefore, the choice of the hyper-parameters remains ambiguous. We are thus motivated to explore more reasonable restrictions to deal with the non-identifiability, and to find a more robust population estimate for model M_tb in the Bayesian framework with a more objective prior distribution; the results are compared with the state-of-the-art MLEs proposed in [2].
R is a free programming language and software environment for statistical computing and graphics, widely used among statisticians and data miners for developing statistical software and analysing data. In this paper, an R package named sbmtb is created and used in the data analysis and the simulation studies.
In this paper, we assume that the odds ratio of the recapture probability to the initial capture probability is a constant, as initially proposed by Wang [22] in her doctoral dissertation; hierarchical prior distributions with vague information are assigned to the unknown parameters after a logistic transformation; and Markov chain Monte Carlo (MCMC) methods are used to carry out the Bayesian computation. The MCMC methods allow us to construct an ergodic Markov chain whose stationary distribution is the posterior and, furthermore, to obtain the posterior quantities of the parameters of interest. Full conditional distributions are used in the MCMC methods to approximate the posterior distributions and the Bayes estimates of the posterior quantities of the unknown parameters.
Otis et al. [5] provide an interesting data set from a deer mouse study that was also used by Chao et al. [2]. Altogether, 110 distinct mice were caught out of 283 captures over 5 consecutive days. This data set is used here to illustrate the proposed Bayesian approach to population estimation.
Simulation studies are conducted to confirm the findings in the example and to examine the performance of the proposed Bayesian approach. Compared with the state-of-the-art maximum likelihood estimates derived by Chao et al. [2], the proposed Bayesian estimates provide comparable population estimates with smaller standard error and mean squared error, especially when the overall capture probability is low. In addition, the Bayesian estimate depends little on the underlying relationship between the initial and recapture probabilities. An intensive prior sensitivity study shows that the proposed Bayesian inference remains robust across different values of the prior hyper-parameters. The third simulation study shows some asymptotic properties of the proposed Bayes estimate: both the relative bias and the relative mean squared error fade away as the population size increases.
In Section 2.1, we briefly review the problem of non-identifiability and propose a novel restriction, in which the odds ratio of the recapture probability to the initial capture probability is a constant. In Section 2.2, we apply hierarchical priors to the unknown parameters in the model and put forward an objective approach to determining the hyper-parameters in the prior distributions. In Section 2.3, we derive the full conditional posterior distributions of the unknown parameters for the MCMC algorithm. In Section 3, we use a real data example to illustrate the proposed hierarchical Bayesian estimation approach. In Section 4, we conduct simulation studies to examine the performance of the proposed Bayesian approaches. In Section 5, we conclude and give remarks.

Likelihood Function
We assume that the population size is N and that there are k sampling occasions. On the j-th sampling occasion, let p_{0j} denote the initial capture probability, i.e., the probability that an unmarked animal is captured on the j-th sampling occasion, and let p_{1j} be the recapture probability, i.e., the probability that a marked animal is captured. The likelihood proposed by [5] for the model is

L(N, p_0, p_1) = [N! / ((N − M_{k+1})! ∏_{j=1}^{k} u_j!)] ∏_{j=1}^{k} p_{0j}^{u_j} (1 − p_{0j})^{N − M_j − u_j} p_{1j}^{m_j} (1 − p_{1j})^{M_j − m_j},    (1)

where u_j is the number of unmarked animals captured on the j-th sampling occasion, m_j is the number of marked animals recaptured on the j-th sampling occasion, M_j = ∑_{i=1}^{j−1} u_i is the total number of marked animals in the population when the j-th sample is taken, and M_{k+1} is the number of distinct animals captured during the experiment. In this model, N, the p_{0j} and the p_{1j} are the 2k + 1 unknown parameters. Likelihood (1) is over-parameterized: there are 2k + 1 unknown parameters, but the minimal sufficient statistic is (u_1, …, u_k, m_2, …, m_k). Therefore, the model is non-identifiable, with more parameters than data values, unless some restrictions are placed on the parameters.
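As a concrete illustration of these summary statistics, the following sketch computes u_j, m_j, M_j, and M_{k+1} from a 0/1 capture-history matrix. It is not part of the paper's sbmtb package (which is written in R); the function and variable names are our own.

```python
import numpy as np

def summary_stats(X):
    """Summary statistics for model M_tb from a capture-history matrix.

    X is an (n x k) 0/1 array with one row per distinct animal ever caught
    and one column per sampling occasion (so every row has at least one 1).
    Returns u (first captures per occasion), m (recaptures per occasion),
    M (marked animals just before each occasion, with M_1 = 0), and M_{k+1}.
    """
    X = np.asarray(X)
    k = X.shape[1]
    first = X.argmax(axis=1)  # occasion of first capture for each animal
    u = np.array([int((first == j).sum()) for j in range(k)])
    m = np.array([int(((X[:, j] == 1) & (first < j)).sum()) for j in range(k)])
    M = np.concatenate(([0], np.cumsum(u)))  # M_j = u_1 + ... + u_{j-1}
    return u, m, M[:k], int(M[k])
```

For example, three animals observed over three occasions with histories 101, 011, and 010 give u = (1, 2, 0), m = (0, 0, 2), and M_{k+1} = 3.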
We propose a restriction based on the odds of initial capture and recapture, as initially discussed in [22] (unpublished doctoral dissertation). The odds in favour of an event is the ratio of the probability that the event happens to the probability that it does not. It is an expression of relative probabilities, which other research studies have found more convenient to work with than probabilities themselves. We assume that the odds of recapture bears a constant relationship to the odds of initial capture, i.e.,

p_{1j} / (1 − p_{1j}) = r · p_{0j} / (1 − p_{0j}),  j = 1, …, k.    (2)

This assumption says that on a given sampling occasion the odds of recapture is proportional to the odds of initial capture. In a "trap-happy" situation the proportion r is greater than 1.0, and in a "trap-shy" situation r is less than 1.0. When there is no behavioural response, r = 1.0. Comparing the proposed constraint with the constant-ratio constraint, which presumes p_{1j} = θ p_{0j}, we find the proposed constant-odds-ratio assumption more convenient and practical, because no extra layer of constraint on the constant ratio is needed to ensure that neither the initial capture nor the recapture probabilities exceed 1.0. The constant odds ratio r can naturally range from 0 to ∞. Applying logistic transformations to (2), let η_{0j} = log[p_{0j}/(1 − p_{0j})] and c = log r, so that

log[p_{1j}/(1 − p_{1j})] = η_{0j} + c,  with c ∈ (−∞, ∞).    (3)

In a "trap-happy" situation c > 0; on the contrary, in a "trap-shy" situation c < 0. When there is no behavioural response, c = 0. In addition, all parameters now range from −∞ to ∞. It is worth mentioning that Equation (3) is similar to the model M_tb in [19], but the parameters here bear a more meaningful practical interpretation. Let M_1 = 0 and m_1 = 0; then likelihood (1) can be written as

L(N, c, η) = [N! / ((N − M_{k+1})! ∏_{j=1}^{k} u_j!)] ∏_{j=1}^{k} exp(η_{0j} n_j + c m_j) / [(1 + e^{η_{0j}})^{N − M_j} (1 + e^{η_{0j} + c})^{M_j}],    (4)

where n_j = u_j + m_j is the number of animals caught on the j-th sampling occasion. Denote η = (η_{01}, …, η_{0k})′; then (4) is the likelihood function of (N, c, η). In this likelihood function there are k + 2 unknown parameters, while the minimal sufficient statistic has dimension 2k − 1, which is at least k + 2 whenever k ≥ 3; all unknown parameters are therefore estimable from the capture-history data.
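For concreteness, the log of likelihood (4), up to additive constants not involving the parameters, can be coded directly from the capture counts. This is an illustrative sketch under our reconstruction of (4); the names are ours, not the sbmtb API.

```python
import math

def log_lik(N, c, eta0, u, m):
    """Log of likelihood (4) for model M_tb under the constant-odds-ratio
    constraint, dropping additive constants that depend only on the data.

    u[j], m[j]: unmarked captures and recaptures on occasion j (0-based);
    eta0[j]: logit of the initial capture probability on occasion j.
    """
    k = len(u)
    M = 0                      # marked animals available before occasion j
    Mk1 = sum(u)               # total distinct animals caught
    if N < Mk1:
        return float("-inf")   # impossible: fewer animals than observed
    ll = math.lgamma(N + 1) - math.lgamma(N - Mk1 + 1)
    for j in range(k):
        n_j = u[j] + m[j]
        ll += eta0[j] * n_j + c * m[j]
        ll -= (N - M) * math.log1p(math.exp(eta0[j]))
        ll -= M * math.log1p(math.exp(eta0[j] + c))
        M += u[j]
    return ll
```

The function returns −∞ whenever N is smaller than the number of distinct animals observed, which keeps MCMC proposals in the admissible range.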

Priors
We apply a Bayesian approach to the re-parameterized likelihood (4) to estimate the population size N, using hierarchical priors. For the η_{0j}, a Normal prior is applied in the first stage, i.e.,

η_{0j} | μ_0, δ_0 ~ i.i.d. N(μ_0, δ_0),  j = 1, …, k.
In the second stage, another Normal prior is applied to μ_0, and an Inverse Gamma prior is used for δ_0, i.e.,

μ_0 ~ N(ν_0, τ_0) and δ_0 ~ IG(a_0, a_0),

where ν_0, τ_0, and a_0 are predetermined hyper-parameters. The prior distribution of c is assumed to be Normal, i.e.,

c | μ_1, δ_1 ~ N(μ_1, δ_1),

where μ_1 and δ_1 are further assumed to have the following distributions.
μ_1 ~ N(ν_1, τ_1) and δ_1 ~ IG(a_1, a_1),

where ν_1, τ_1, and a_1 are predetermined hyper-parameters. The hyper-parameters are chosen using vague information as follows. If the initial capture probability is assumed to lie between 10^{−l} and 1 − 10^{−l}, with l ≥ 1, then the η_{0j} will range roughly between −l and l. By the 3σ (empirical) rule, the expected standard deviation of the η_{0j} is around l/3; that is, the expected value of δ_0 is about l²/9. Note that δ_0 is assumed to follow an IG(a_0, a_0) distribution, whose mean is a_0/(a_0 − 1). It is adequate in practice to presume that p_{0j} ranges between 0.0001 and 0.9999, which lets η_{0j} vary from −4 to +4. Thus it is sufficient to assume that the mean μ_0 has a standard normal distribution, with ν_0 = 0 and τ_0 = 1, and the expected standard deviation is about 1.33, which indicates that a_0 is about 2. By similar arguments, we set ν_1 = 0, τ_1 = 1, and a_1 = 2.
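The arithmetic behind this hyper-parameter choice can be sketched as a back-of-the-envelope check, with l = 4 as in the text. The helper name is ours, and the matching of the IG(a, a) mean to l²/9 follows the paper's heuristic rather than a formal rule.

```python
def hyper_for_range(l):
    """Given a presumed logit range (-l, l), return the implied prior
    standard deviation of eta (3-sigma rule: range/6 = l/3), the target
    prior mean for delta_0, and the IG(a, a) shape whose mean matches it."""
    sd = l / 3.0
    var = sd ** 2                 # target prior mean for delta_0
    # IG(a, a) has mean a / (a - 1); solving a / (a - 1) = var gives
    # a = var / (var - 1), which only exists when var > 1.
    a = var / (var - 1.0) if var > 1.0 else float("inf")
    return sd, var, a
```

With l = 4, the implied standard deviation is about 1.33, the target variance is 16/9, and the matched shape is 16/7 ≈ 2.29, which rounds to the a_0 = 2 used in the paper.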
For the prior distribution of N, the Poisson-Gamma type of hierarchical prior is a popular choice in the literature [33]-[37]. In the first stage, the unknown population size N is assumed to have a Poisson distribution with mean λ, i.e., N | λ ~ Poisson(λ). In the second stage, λ is assumed to have a Gamma distribution with hyper-parameters α_1 and β_1, i.e., λ ~ Gamma(α_1, β_1). The marginal prior distribution of N, obtained by integrating over λ, is then of negative-binomial form,

Pr(N = n) = [Γ(n + α_1) / (n! Γ(α_1))] [β_1/(1 + β_1)]^{α_1} [1/(1 + β_1)]^{n},  n = 0, 1, 2, ….

In the situation where little information about λ is available, a vague prior distribution is desired. When α_1 → 0 and β_1 → 0, the prior distribution of λ approaches the scale-invariant prior π(λ) ∝ 1/λ. The propriety of the posterior under this prior needs to be studied.

Bayesian Computation
The population size N is the parameter of interest. Under squared error loss, the posterior mean is the Bayes estimator. However, a closed-form expression for the posterior distribution of N is not easy to obtain. Therefore, we use Markov chain Monte Carlo (MCMC) methods to obtain numerical solutions. To implement the MCMC procedure, it is necessary to have the full conditional posterior distributions, i.e., the conditional distribution of each parameter given all the other unknown parameters and the data.
The full conditional posterior distributions of (N, c, μ_0, δ_0, μ_1, δ_1, λ, η) given the capture history X are as follows:
• λ | N, … ~ Gamma(α_1 + N, β_1 + 1);
• N − M_{k+1} | λ, c, η, … ~ Poisson(λ ∏_{j=1}^{k} (1 − p_{0j})), where p_{0j} = e^{η_{0j}}/(1 + e^{η_{0j}});
• μ_0 and μ_1 have Normal full conditionals, and δ_0 and δ_1 have Inverse Gamma full conditionals, by conjugacy;
• the full conditional densities of the η_{0j} and c,

π(η_{0j} | …) ∝ exp(η_{0j} n_j) [(1 + e^{η_{0j}})^{N − M_j} (1 + e^{η_{0j} + c})^{M_j}]^{−1} exp[−(η_{0j} − μ_0)²/(2δ_0)],

π(c | …) ∝ exp(c ∑_{j} m_j) [∏_{j} (1 + e^{η_{0j} + c})^{M_j}]^{−1} exp[−(c − μ_1)²/(2δ_1)],

do not have common statistical distribution forms. Gilks [38] proposed an adaptive rejection method for sampling from a log-concave density known up to a normalizing constant. It can be shown that these conditional density functions are all log-concave, so the adaptive rejection sampler can be used to draw c and the η_{0j}.
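The conjugate updates for λ and N admit direct draws. The following sketch implements those two Gibbs steps under the conditionals above (a simplified illustration rather than the sbmtb implementation; the Gamma rate parameterization is an assumption, and c and the η_{0j} would be drawn by adaptive rejection sampling).

```python
import numpy as np

def draw_lambda(rng, N, alpha1, beta1):
    """lambda | N ~ Gamma(shape = alpha1 + N, rate = beta1 + 1)."""
    return rng.gamma(shape=alpha1 + N, scale=1.0 / (beta1 + 1.0))

def draw_N(rng, lam, p0, Mk1):
    """N | rest: the count of never-caught animals, N - M_{k+1}, is
    Poisson(lambda * prod_j (1 - p0_j)); shift it by the M_{k+1} observed."""
    p_never = float(np.prod(1.0 - np.asarray(p0, dtype=float)))
    return Mk1 + int(rng.poisson(lam * p_never))
```

As a sanity check, if every p_{0j} equals 1 then no animal can escape capture and the draw returns exactly M_{k+1}.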

Real Data Example
Otis et al. [5] provide an interesting data set on deer mice. The data were collected by S. Hoffman (pers. comm.) in mid-July 1974. Trapping took place on 5 consecutive mornings, and 110 distinct mice were caught. It is assumed that the capture probabilities exhibit both time variation and a behavioural response. The summary statistics are presented in Table 1. This data set is also used in Chao et al. [2], Lee and Chen [20], and Lee et al. [21]. We use a Gibbs sampler in the R package to implement the Bayesian computation. The burn-in stage is 15,000 iterations; samples from the next 10,000 iterations are used to construct the approximate posterior distribution of N. The Bayesian estimate of the population size N is N̂_B = 160, with a standard error of 34.20. The results are compared with the three types of maximum likelihood estimates in Chao et al. [2]. Lee and Chen [20] and Lee et al. [21] obtained results similar to those of Chao et al. [2] by adjusting the values of the hyper-parameters with a trial-and-error method.
The Bayesian estimate is very similar to the UMLE with a smaller standard error; it is smaller than the CMLE, again with a smaller standard error, and greater than the MQLE with a slightly greater standard error. The Bayesian estimates of the initial capture probabilities are also computed, along with ĉ = 1.7917. The initial capture probabilities are between 0.20 and 0.25, with an overall catchability (the probability that a subject is captured at least once) of around 0.73. As concluded by Chao et al. [2], when the overall catchability is high the UMLE is preferable to the other two MLEs. Our Bayesian estimate of the deer mouse population size is closest to the UMLE and has a smaller standard error; hence, it is more desirable than the latter.
It is worth noting that the estimated behavioural effect ĉ = 1.7917 is greater than 0, which implies that the animals have a trap-happy response. This is consistent with the data, since there are more recaptures than initial captures on most sampling occasions.

Simulation Studies
Chao et al. [2] report a simulation study with a true population size of 400 and 8 different sets of initial capture probabilities, with the recapture probabilities calculated as p_{1j} = 1.5 p_{0j}. Here, we use the same true population size and the same initial capture and recapture probabilities as in Chao et al. [2]. Together with the overall catchability, the values of the initial capture probabilities and recapture probabilities are listed in Table 2.
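Capture histories under this constant-ratio scheme can be generated as follows. This is an illustrative sketch with our own function names; we cap p_{1j} = 1.5 p_{0j} at 1, an assumption for initial probabilities large enough that the ratio would exceed 1.

```python
import numpy as np

def simulate_mtb(N, p0, ratio=1.5, seed=None):
    """Simulate one M_tb capture history with recapture probabilities
    p1_j = ratio * p0_j (constant-ratio scheme), capped at 1.
    Returns the 0/1 history matrix of animals caught at least once."""
    rng = np.random.default_rng(seed)
    p0 = np.asarray(p0, dtype=float)
    p1 = np.minimum(ratio * p0, 1.0)
    k = len(p0)
    X = np.zeros((N, k), dtype=int)
    marked = np.zeros(N, dtype=bool)
    for j in range(k):
        p = np.where(marked, p1[j], p0[j])   # behavioural response to marking
        X[:, j] = rng.random(N) < p
        marked |= X[:, j].astype(bool)
    return X[X.any(axis=1)]                  # drop never-caught animals
```

Repeating such draws 1000 times per parameter set, as in the study, yields the Monte Carlo replicates from which bias and RMSE are computed.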
Several frequentist quantities are considered: the bias, the relative bias (the ratio of the bias to the true population size), the root mean squared error (RMSE), and the relative RMSE (the ratio of the RMSE to the true population size) of the Bayesian estimates of the population size. Suppose N̂_B is a Bayesian estimate of the population size N. The frequentist bias is the expected deviation of the estimate from the true value, i.e., Bias(N̂_B) = E(N̂_B) − N. Though unbiasedness is an attractive property, it does not take into account the variation of the estimator. The MSE is the frequentist expected squared deviation of the estimate from the true value, i.e., MSE(N̂_B) = E(N̂_B − N)², and it decomposes into the sum of the squared bias and the variance of the estimator. Ideally, we would like an estimator with small bias and small variance. In practice, however, it is common to trade off some increase in bias for a larger decrease in variance, and vice versa, so that a relatively smaller MSE can be achieved.
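These quantities are computed from the Monte Carlo replicates in the obvious way; a small sketch (our own helper, shown for clarity):

```python
import math

def frequentist_summary(estimates, N_true):
    """Bias, relative bias, RMSE, and relative RMSE of a list of
    population-size estimates against the true population size."""
    n = len(estimates)
    bias = sum(estimates) / n - N_true
    mse = sum((e - N_true) ** 2 for e in estimates) / n
    rmse = math.sqrt(mse)
    return {"bias": bias, "rel_bias": bias / N_true,
            "rmse": rmse, "rel_rmse": rmse / N_true}
```

For instance, two replicates 390 and 410 against a true size of 400 give zero bias but an RMSE of 10 (relative RMSE 0.025), illustrating how the RMSE captures variability that the bias alone hides.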
In the simulation, we generate capture histories 1000 times under each set of true parameters. The bias, relative bias (in parentheses), RMSE, and relative RMSE (in parentheses) are listed in Table 3. When the mean catchability is relatively low (cases 1-4), in terms of bias and relative bias the Bayesian estimates are very close to the two preferred MLEs (CMLE and MQLE) of Chao et al. [2], while the Bayesian estimates consistently have smaller RMSEs than all three MLEs. In case 3, the bias of the Bayesian estimate is slightly greater than that of the MQLE, with about a 4% difference; however, there is about a 10% drop in RMSE when using the Bayesian estimate. As previously mentioned, in practice it is not uncommon to trade off some increase in bias for a larger decrease in variance. Therefore, we would prefer the Bayesian estimate to the MLEs in these cases. When there is a sufficient number of captures (cases 5-8), according to Chao et al. [2], the UMLE is preferable to the other two estimators. Compared with the UMLE in these cases, the Bayesian approach yields comparable estimates in cases 5 and 8 with smaller RMSE. In case 7, the Bayesian estimate has a slightly larger bias than the UMLE, with a difference of less than 3%, but the RMSE drops by more than 3%. In case 6, the Bayesian estimate has a slightly larger bias and a similar RMSE compared with the UMLE. Given that the Bayesian estimates always provide a smaller RMSE than the three MLEs with comparable or only slightly larger bias, we conclude that the proposed Bayesian approach provides reasonably better estimation than the three MLEs.
It is worth mentioning that the simulated data were based on the constant ratio of initial and recapture probabilities, as in Chao et al. [2], rather than on the constant odds ratio that we propose. That the Bayesian estimates still outperform the MLEs indicates, to a certain extent, that the Bayesian estimate is robust to the assumed relationship between the initial and recapture probabilities.
A usual difficulty of Bayesian analysis is the need to specify prior distributions when little information is available for guidance. Here, we examine how the proposed Bayesian inference is influenced by the choice of the hyper-parameters, varying the presumed range of the initial capture probability (that is, the value of l) and setting the hyper-parameters correspondingly. The resulting Bayesian estimates are listed in Table 4, which shows that the proposed Bayesian estimate is robust to the choice of the presumed range of the capture probability. When the initial capture probability is low (case 1), the discrepancy among Bayesian population estimates under the various hyper-parameters is less than 2.5%. The difference decreases as the initial capture probability rises; that is, the impact of the prior declines as data information accumulates. When the initial capture probability is high (case 8), the Bayesian estimates of the population size under different hyper-parameters are almost identical.
The third simulation study examines the performance of the Bayesian estimation for population sizes ranging from 100 to 1000. We use the initial capture probabilities of cases 1, 2, and 8 in Table 2, representing low, medium-low, and high capture probabilities, respectively. The behavioural effect parameter c is set to −1 for a trap-shy situation and to 1 for a trap-happy one. In each simulation case, we generate the capture histories 1000 times. The resulting relative biases and relative RMSEs are shown in Figure 1. The plot indicates that the relative bias of the Bayesian population estimate declines as the initial catchability and/or the population size increases. When the initial capture probabilities are low, the Bayesian approach produces population estimates with negative biases. When the initial capture probabilities are medium-low, the Bayesian approach underestimates the population, but the relative biases approach zero very quickly as the population size increases. When the initial capture probabilities are high, the Bayesian estimate slightly overestimates the population size, and the relative biases are very close to zero even for small population sizes. Also, the Bayesian approach produces smaller biases in trap-happy situations than in trap-shy ones. It is worth pointing out that, as listed in Table 2, the overall initial catchability in case 2 is only a little higher than that of case 1. Therefore, we would say that, in most cases, the relative bias of the Bayesian estimate approaches zero as the population size increases. Figure 1 also shows that the relative RMSE drops off as the population size and/or catchability increases.

Conclusions and Remarks
In this paper, we propose a hierarchical Bayesian approach to estimating the population size in model M_tb with a new relationship, based on the odds (in favour of capture), between the recapture probability and the initial capture probability. The capture-recapture model M_tb has wide application in practical population-estimation studies. However, the original model proposed by Otis et al. [5] is over-parameterized and suffers an identifiability problem without further constraints on the recapture probabilities. One common assumption in the literature is that the ratio of the recapture probability to the initial capture probability is a constant. Under this condition, Chao et al. [2] proposed three types of MLEs. The major problem with these MLEs is that no constraint is applied to the constant ratio to ensure that the resulting estimated initial capture and recapture probabilities remain between 0 and 1. Lee et al. [21] proposed a Bayesian approach that constrained the constant ratio within the MCMC so that neither the initial capture nor the recapture probabilities exceed 1.0; however, no systematic method was discussed for the choice of the prior distribution other than trial and error, and the estimate is sensitive to the values of the hyper-parameters. [19] proposed a Bayesian analysis of all eight closed-population capture-recapture models within a logistic regression framework, but the usefulness of the method for model M_tb remains unknown. We were thus inspired to propose a new constraint, in which the odds ratio of the recapture probability to the initial capture probability remains constant, and to analyse model M_tb using a hierarchical Bayesian methodology with vague prior information. The benefit of the constant-odds-ratio assumption is that it is handier and more true to life, because no extra constraint on the constant ratio is needed to ensure that both the initial capture and recapture probabilities stay below 1.0. The constant odds ratio can naturally vary from 0 to ∞. Moreover, the simulation results show that the proposed Bayesian method provides comparatively better population estimation than the MLEs and is robust to the assumed constraint between the capture probabilities. The sensitivity study shows that the proposed Bayesian estimate is robust to the choice of the hyper-parameters of the prior distribution. Also, the relative bias and relative RMSE of the proposed Bayesian population estimate approach zero as the true population size grows in the usual course of application.

Note that the findings here are mostly based on simulation results. It would be interesting to examine the theoretical bias and MSE of the Bayesian estimate under the hierarchical prior distribution studied in this paper. We will continue studying more complex models such as M_bh.


Figure 1. Relative bias and relative RMSE of the Bayesian population estimates.

Table 1. Summary data of the deer mice example.

Table 2. Capture probabilities and behavioural effect parameter in simulation study I.

Table 3. Comparison of the Bayesian estimates with the MLEs.

Table 4. Sensitivity of the Bayes estimate to the specification of the prior distributions.