Binary Logistic Models of Home Ownership in Wukari Nigeria

Joseph Uchenna Okeke1*, Evelyn Nkiruka Okeke2, Yeshak Victor Dakhin2

1 Department of Mathematical Sciences, Taraba State University, Jalingo, Nigeria
2 Department of Mathematics & Statistics, Federal University, Wukari, Nigeria

Open Journal of Statistics (OJS), ISSN 2161-718X, Scientific Research Publishing. Vol. 10, No. 1, pp. 64-73. DOI: 10.4236/ojs.2020.101005

Received: 23 October 2019; Accepted: 16 January 2020; Published: 19 January 2020

© Copyright 2014 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/

This study fits binary logistic models of home ownership among civil servants in Wukari, Nigeria. The data are primary data collected by questionnaire. The full multicollinear data, as well as the data reduced by principal component analysis and by stepwise regression, were examined to determine the factors that chiefly account for home ownership. Four components were selected out of six, namely grade level of respondent, cadre of institution of service of respondent, family size of respondent and age of respondent. The four components selected accounted for 87.78 percent of the variation, and four variables were selected from them. The logit model for home ownership status is obtained from the selected variables. The adequacy of the model was tested using the count $R^2$, which indicates how useful the explanatory variables are in predicting the response variable and can be referred to as a measure of effect size. In testing the significance of each factor, only age of respondent is significant in determining variability in the home ownership model.

Keywords: Binary Logistic Model; Multicollinear Data; Principal Component Analysis; Stepwise Regression; Logit Model; Home Ownership
1. Introduction

The logistic model is a probability model for a process characterized by a qualitative response variable, which may be binary (dichotomous), ordinal or nominal (Gujarati, 2003).

The binary response variable can also be modeled using the linear probability approach. Given

$$Y_i = \alpha_1 + \alpha_2 X_i + \mu_i \tag{1}$$

then

$$E(Y_i = 1 \mid X_i) = \alpha_1 + \alpha_2 X_i = \pi_i \;\Rightarrow\; P(Y = 1) = \pi \tag{2}$$

so that $\pi_i$ is the probability of possessing the desired attribute and $\mu_i = Y_i - \alpha_1 - \alpha_2 X_i$, subject to the following restrictions: the error term is not normally distributed, although it can be assumed to be approximately normal in large samples (normality is not needed if interest is in point estimation); $E(\mu_i) = 0$ and $\operatorname{Cov}(\mu_i, \mu_j) = 0 \;\forall\, i \neq j$, but $\operatorname{var}(\mu_i) = \sigma_i^2 = \pi_i(1 - \pi_i)$, which is heteroscedastic because it arises from a Bernoulli process; the condition $0 \le E(Y_i = 1 \mid X_i) \le 1$ is not guaranteed to hold; and the coefficient of determination, $R^2$, is generally low, making it of little use as a goodness-of-fit measure.
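The range restriction can be illustrated with a minimal sketch. The data below are hypothetical (they are not the survey data), and the names `x`, `y`, `a1` and `a2` are chosen only for illustration. The sketch fits the linear probability model by ordinary least squares and collects the fitted values that fall outside the unit interval.

```python
# Hypothetical data: x is an illustrative regressor, y a 0/1 outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates for the linear probability model Y = a1 + a2 * X + u.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
a2 = s_xy / s_xx
a1 = y_bar - a2 * x_bar

# Fitted "probabilities"; nothing constrains them to lie in [0, 1].
fitted = [a1 + a2 * xi for xi in x]
out_of_range = [p for p in fitted if p < 0 or p > 1]

# Bernoulli variance p * (1 - p) changes with x (heteroscedasticity);
# it is only computed where the fitted value is a valid probability.
variances = [p * (1 - p) for p in fitted if 0 <= p <= 1]

print("fitted values outside [0, 1]:", out_of_range)
```

With these illustrative numbers the smallest and largest fitted values fall below 0 and above 1, which is the weakness of the linear probability model described above.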

Weighted least squares has been advanced as a remedy for the heteroscedasticity of the model, where the weight is calculated as

$$W_i = \pi_i (1 - \pi_i) \tag{3}$$

and applied as

$$\frac{Y_i}{\sqrt{W_i}} = \frac{\alpha_1}{\sqrt{W_i}} + \alpha_2 \frac{X_i}{\sqrt{W_i}} + \frac{\mu_i}{\sqrt{W_i}} \tag{4}$$

This creates a problem for interpolation and extrapolation, since $E(Y_i \mid X_i)$, and hence the weight, may be unknown for a new outcome.
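A minimal sketch of the weighting step follows, assuming the usual form of weighted least squares in which every term of the model is divided by the square root of $W_i = \pi_i(1 - \pi_i)$. The function name `wls_transform` and the choice to skip observations whose estimated probability lies outside (0, 1) are assumptions made for illustration.

```python
import math

def wls_transform(y, x, p_hat):
    """Divide each observation by sqrt(W_i), where W_i = p_i * (1 - p_i).

    Returns (y / sqrt(W), 1 / sqrt(W), x / sqrt(W)) for each usable row.
    Rows whose estimated probability is not strictly inside (0, 1) are
    skipped, because the weight is undefined or non-positive there.
    """
    rows = []
    for yi, xi, pi in zip(y, x, p_hat):
        if not 0 < pi < 1:
            continue
        root_w = math.sqrt(pi * (1 - pi))
        rows.append((yi / root_w, 1 / root_w, xi / root_w))
    return rows

# Hypothetical values: the second estimated probability (1.2) is unusable.
print(wls_transform([1, 0], [2, 3], [0.5, 1.2]))
```

The skipped row shows the practical difficulty: the weights depend on estimated probabilities, and these may be invalid or unavailable for a new observation.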

Suppose that $Y$ is the realization of $N$ individual outcomes that are independently distributed with $P(Y_i = 1) = \pi_i$ and $P(Y_i = 0) = 1 - \pi_i$, for $i = 1, 2, 3, \cdots, N$. Each $Y_i$ then has a Bernoulli distribution with probability mass function

$$f(y_i \mid \pi_i) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i} = \pi_i^{y_i} (1 - \pi_i)(1 - \pi_i)^{-y_i} \tag{5}$$

$$= (1 - \pi_i) \left( \frac{\pi_i}{1 - \pi_i} \right)^{y_i} = (1 - \pi_i) \exp\left( y_i \ln\left( \frac{\pi_i}{1 - \pi_i} \right) \right) \tag{6}$$

where $\frac{\pi_i}{1 - \pi_i}$ is the odds ratio, which gives the odds in favour of the response variable possessing the required attribute ($Y_i = 1$), and the natural parameter is

$$Q(\pi_i) = \ln\left( \frac{\pi_i}{1 - \pi_i} \right) \tag{7}$$

$Q(\pi_i)$ is called the logit link, as in Gujarati (2003), Festschrift et al. (2000) and Menart (2002), so that

$$l_i = \ln\left( \frac{\pi_i}{1 - \pi_i} \right) = \alpha_1 + \alpha_2 X_i \tag{8}$$

Equation (8) is the logit model.

$X_i$ and the $\alpha$'s have a linear relationship with $l_i$ in (8). For data on individual levels:

if $\pi_i = 0$, $l_i = \ln\left(\frac{0}{1}\right) = \ln 0$, which is undefined; if $\pi_i = 1$, $l_i = \ln\left(\frac{1}{0}\right)$, which is also undefined. As a remedy, we introduce a corrected logit defined by $\ln\left( \frac{\pi_i + \frac{1}{2}}{1 - \pi_i + \frac{1}{2}} \right)$, as in Rao and Toutenburg (1995), and/or consider grouped data, so that for $Y = 1$ in $n_j$ out of a population of $N_j$ in group $j$, $\pi_j = \frac{n_j}{N_j}$ and $\frac{N_j - n_j}{N_j} = 1 - \pi_j$. If $N_j$ is fairly large and each observation in a given group is distributed as a binomial variable, we have that

$\mu_j \sim N\left(0, \frac{1}{N_j \pi_j (1 - \pi_j)}\right)$ and $\hat{\sigma}^2 = \frac{1}{N_j \pi_j (1 - \pi_j)}$, following the account of Cox (1970). Also, software packages such as Stata, EViews and Minitab can obtain the logit model for data on individual levels after a certain number of iterations.
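The behaviour of the logit at the boundaries, the corrected logit and the grouped-data variance can be sketched as follows. The function names are hypothetical, and `corrected_logit` simply transcribes the correction as written in the text.

```python
import math

def logit(p):
    """Log odds ln(p / (1 - p)); undefined at p = 0 and p = 1."""
    if p <= 0 or p >= 1:
        raise ValueError("logit is undefined for p <= 0 or p >= 1")
    return math.log(p / (1 - p))

def corrected_logit(p):
    """Corrected logit ln((p + 1/2) / (1 - p + 1/2)), finite at p = 0 and p = 1."""
    return math.log((p + 0.5) / (1 - p + 0.5))

def grouped_logit(n_j, N_j):
    """Logit of the group proportion n_j / N_j and its approximate variance
    1 / (N_j * p * (1 - p))."""
    p = n_j / N_j
    return logit(p), 1 / (N_j * p * (1 - p))

print(corrected_logit(0.0))   # finite, unlike logit(0.0)
print(grouped_logit(20, 40))  # hypothetical group: 20 owners out of 40
```

The grouped form avoids the boundary problem as long as each group contains both outcomes, while the corrected form keeps the logit finite even at the boundaries.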

Now suppose that $P(Y = 1)$ is a function of $X = \{x_1, x_2, \cdots, x_p\}$, so that

$$P(Y = 1) = \pi(x) \quad \text{and} \quad P(Y = 0) = 1 - \pi(x) \tag{9}$$

and

$$E(y) = 1 \cdot \pi(x) + 0 \cdot (1 - \pi(x)) = \pi(x), \qquad E(y^2) = 1^2 \cdot \pi(x) + 0^2 \cdot (1 - \pi(x)) = \pi(x) \tag{10}$$

such that

$$\operatorname{var}(y) = E(y^2) - (E(y))^2 = \pi(x) - (\pi(x))^2 = \pi(x)(1 - \pi(x)) \tag{11}$$

Then $E(y_i) = P(y = 1) = \pi(x) = \alpha_1 + \alpha_2 X$. Here $0 \le P(Y = 1) \le 1$, while $\alpha_1 + \alpha_2 X$ can take any value between $-\infty$ and $+\infty$. Hence, the model $\pi(x) = \alpha_1 + \alpha_2 X$ can be valid only at specific values of $x$ within a given range. Moreover, $\operatorname{var}(y) = \pi(x)(1 - \pi(x))$ is a function of $x$ and hence heteroscedastic, so the ordinary least squares (OLS) estimate is not optimal, as remarked by Gujarati (2003). The maximum likelihood method of estimation (MLE) can therefore be applied to obtain the estimates $\hat{\alpha}_1, \hat{\alpha}_2$ that maximize

$$l(\alpha_1, \alpha_2) = \prod_{i=1}^{N} \pi_i^{y_i} (1 - \pi_i)^{n_i - y_i} \tag{12}$$

$$= \prod_{i=1}^{N} \frac{\exp\{ y_i (\alpha_1 + \alpha_2 x_i) \}}{\left( 1 + \exp\{ \alpha_1 + \alpha_2 x_i \} \right)^{n_i}} \tag{13}$$

where $n_i = 1$ for data on individual levels.

The MLE can generally be obtained using iterative algorithms such as the Newton-Raphson (NR) method or iteratively re-weighted least squares (IRWLS), which are implemented in the software packages listed above.
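The Newton-Raphson iteration for the two-parameter logit model can be sketched as below. This is an illustrative implementation under stated assumptions, not the routine used by any of the packages mentioned: the data are hypothetical, the starting values are zero, and the function name `fit_logit_newton` is invented for this example.

```python
import math

def fit_logit_newton(x, y, iterations=25, tol=1e-10):
    """Maximum likelihood estimates (a1, a2) of logit(p) = a1 + a2 * x
    for 0/1 data, computed by Newton-Raphson."""
    a1, a2 = 0.0, 0.0
    for _ in range(iterations):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a1 + a2 * xi)))
            w = p * (1.0 - p)
            # Score (gradient of the log-likelihood).
            g1 += yi - p
            g2 += (yi - p) * xi
            # Information matrix (negative Hessian).
            h11 += w
            h12 += w * xi
            h22 += w * xi * xi
        det = h11 * h22 - h12 * h12
        # Newton step: solve the 2x2 system information * step = score.
        d1 = (h22 * g1 - h12 * g2) / det
        d2 = (h11 * g2 - h12 * g1) / det
        a1 += d1
        a2 += d2
        if abs(d1) + abs(d2) < tol:
            break
    return a1, a2

# Hypothetical, non-separable data used only to exercise the routine.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
a1_hat, a2_hat = fit_logit_newton(x, y)
print(a1_hat, a2_hat)
```

At convergence the score equations hold, that is, the residuals $y_i - \hat{\pi}_i$ sum to zero and are orthogonal to $x$. If the data were perfectly separable the estimates would diverge, which this sketch does not guard against.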

The effect of $x$ on the logit in (8) is linear, whereas its effect on the probability is not. Hence the logistic regression, which ensures a monotone S-shaped curve for $\pi(x)$ so that $0 \le \pi(x) \le 1$, is proffered for the logit model, such that

$$\pi(x) = \frac{\exp(\alpha_1 + \alpha_2 X)}{1 + \exp(\alpha_1 + \alpha_2 X)} \tag{14}$$

The logit link makes the logistic regression a generalized linear model (Rodriguez, 2007; Gujarati, 2003), so that for

$$Q(\pi) = \ln\left( \frac{\pi}{1 - \pi} \right) = \alpha_1 + \alpha_2 X = \text{log odds ratio} \tag{15}$$

we then have

$$\frac{\pi}{1 - \pi} = \exp(\alpha_1 + \alpha_2 X) = e^{Q(\pi)} \tag{16}$$

and

$$P(Y_i = 1 \mid X) = \pi(x) = \frac{1}{1 + e^{-Q(\pi)}} = \frac{e^{Q(\pi)}}{1 + e^{Q(\pi)}} \tag{17}$$

Equation (17) is the cumulative logistic distribution function, which gives the probability of the outcome of interest, $P(Y = 1)$ or $\pi(x)$, given the effects of some independent variables, say $X_1, X_2, \cdots, X_p$.
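The relationship between the logit, the odds and the probability can be checked numerically with a short sketch. The function names are illustrative, and the branch on the sign of `q` is a standard device to avoid overflow in the exponential rather than something taken from the text.

```python
import math

def odds(q):
    """Odds pi / (1 - pi) = exp(q) for a given logit q."""
    return math.exp(q)

def logistic(q):
    """Probability pi = 1 / (1 + exp(-q)) = exp(q) / (1 + exp(q))."""
    if q >= 0:
        return 1.0 / (1.0 + math.exp(-q))
    e = math.exp(q)  # safe for large negative q
    return e / (1.0 + e)

for q in (-2.0, 0.0, 2.0):
    p = logistic(q)
    # The probability always lies in [0, 1], and the log odds recover q.
    print(q, p, math.log(p / (1 - p)))
```

The two algebraic forms of the logistic function agree, and applying the logit to the resulting probability recovers the linear predictor.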

From (17), with several explanatory variables,

$$\ln\left( \frac{\pi(x)}{1 - \pi(x)} \right) = l_i = \alpha_1 + \alpha_2 X_2 + \alpha_3 X_3 + \cdots + \alpha_p X_p \tag{18}$$

where $l_i$ is linear in $X$ as well as linear in the parameters.

It is pertinent to point out that: the logit, $l_i$, is linear in $X$ but the probabilities $\pi(x)$ are not; the logit is unbounded, since $l_i$ goes from $-\infty$ to $+\infty$ as $\pi(x)$ goes from 0 to 1; a negative $l_i$ implies that the odds in favour of $Y = 1$ decrease as $X$ increases, if and only if a single $X$ is considered; and the effects of more than one explanatory variable can be studied, as outlined by Gujarati (2003), Greene (2003) and Gareth et al. (2013).

The logistic model is also a good classification model and can serve as an alternative to Fisher's linear discriminant analysis. However, unlike discriminant analysis, the logistic model does not require the multivariate normality assumption, as asserted by Rodriguez (2007).

In the presence of more than one explanatory variable, multicollinearity may arise. Home ownership models exhibit some form of multicollinearity among the explanatory variables (Gujarati, 2003). The multicollinearity is perfect if $\phi_1 X_1 + \phi_2 X_2 + \cdots + \phi_k X_k = 0$ holds for constants $\phi_1, \phi_2, \cdots, \phi_k$ that are not all simultaneously equal to zero, in which case the coefficients of the explanatory variables are indeterminate (Cox, 1970). Thus, given that

$$\hat{Y}_i = \hat{\alpha}_0 + \hat{\alpha}_1 X_{1i} + \hat{\alpha}_2 X_{2i} \tag{19}$$

the normal equations are:

$$\sum Y_i = n \hat{\alpha}_0 + \hat{\alpha}_1 \sum X_{1i} + \hat{\alpha}_2 \sum X_{2i} \tag{20}$$

$$\sum Y_i X_{1i} = \hat{\alpha}_0 \sum X_{1i} + \hat{\alpha}_1 \sum X_{1i}^2 + \hat{\alpha}_2 \sum X_{1i} X_{2i} \tag{21}$$

$$\sum Y_i X_{2i} = \hat{\alpha}_0 \sum X_{2i} + \hat{\alpha}_1 \sum X_{2i} X_{1i} + \hat{\alpha}_2 \sum X_{2i}^2 \tag{22}$$

Then from (20)

$$\hat{\alpha}_0 = \bar{Y} - \hat{\alpha}_1 \bar{X}_1 - \hat{\alpha}_2 \bar{X}_2 \tag{23}$$

Also, solving (20), (21) and (22) simultaneously, we have

$$\hat{\alpha}_1 = \frac{\left( \sum y_i x_{1i} \right)\left( \sum x_{2i}^2 \right) - \left( \sum y_i x_{2i} \right)\left( \sum x_{1i} x_{2i} \right)}{\left( \sum x_{1i}^2 \right)\left( \sum x_{2i}^2 \right) - \left( \sum x_{1i} x_{2i} \right)^2} \tag{24}$$

$$\hat{\alpha}_2 = \frac{\left( \sum y_i x_{2i} \right)\left( \sum x_{1i}^2 \right) - \left( \sum y_i x_{1i} \right)\left( \sum x_{1i} x_{2i} \right)}{\left( \sum x_{1i}^2 \right)\left( \sum x_{2i}^2 \right) - \left( \sum x_{1i} x_{2i} \right)^2} \tag{25}$$

where $y$ and $x$ are in deviation form, such that $y = (Y - \bar{Y})$ and $x = (X - \bar{X})$.

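In the presence of perfect multicollinearity, $x_{2i} = \lambda x_{1i}$ for a non-zero constant $\lambda$. Substituting for $x_{2i}$ in (24), we have

$$\hat{\alpha}_1 = \frac{\left( \sum y_i x_{1i} \right)\left( \lambda^2 \sum x_{1i}^2 \right) - \left( \lambda \sum y_i x_{1i} \right)\left( \lambda \sum x_{1i}^2 \right)}{\left( \sum x_{1i}^2 \right)\left( \lambda^2 \sum x_{1i}^2 \right) - \lambda^2 \left( \sum x_{1i}^2 \right)^2} = \frac{0}{0} = \hat{\alpha}_2 \quad (\text{indeterminate}) \tag{26}$$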

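However, for non-perfect but high multicollinearity, such as $x_{2i} = \lambda x_{1i} + v_i$ with $\lambda \neq 0$ and $\sum x_{1i} v_i = 0$, we have

$$\hat{\alpha}_1 = \frac{\left( \sum y_i x_{1i} \right)\left( \lambda^2 \sum x_{1i}^2 + \sum v_i^2 \right) - \left( \lambda \sum y_i x_{1i} + \sum y_i v_i \right)\left( \lambda \sum x_{1i}^2 \right)}{\left( \sum x_{1i}^2 \right)\left( \lambda^2 \sum x_{1i}^2 + \sum v_i^2 \right) - \left( \lambda \sum x_{1i}^2 \right)^2} \tag{27}$$

Because $\sum x_{1i} v_i = 0$, $\hat{\alpha}_1$ is finite; but as $v_i \to 0$, $\hat{\alpha}_1$ becomes indeterminate as in (26).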
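The indeterminacy can be reproduced numerically. The sketch below evaluates the numerator and denominator of the slope formula (24) on small hypothetical data sets, first with $x_2 = 2 x_1$ exactly and then with a disturbance $v$ that is orthogonal to $x_1$ in deviation form. The function name `slope_terms` and all numbers are illustrative assumptions.

```python
def slope_terms(y, x1, x2):
    """Numerator and denominator of the OLS slope on x1 in a two-regressor
    model, using deviations from the means as in equation (24)."""
    n = len(y)
    y_bar, x1_bar, x2_bar = sum(y) / n, sum(x1) / n, sum(x2) / n
    dy = [v - y_bar for v in y]
    d1 = [v - x1_bar for v in x1]
    d2 = [v - x2_bar for v in x2]
    s_y1 = sum(a * b for a, b in zip(dy, d1))
    s_y2 = sum(a * b for a, b in zip(dy, d2))
    s_11 = sum(a * a for a in d1)
    s_22 = sum(a * a for a in d2)
    s_12 = sum(a * b for a, b in zip(d1, d2))
    return s_y1 * s_22 - s_y2 * s_12, s_11 * s_22 - s_12 ** 2

y = [2, 1, 4, 3, 5]
x1 = [1, 2, 3, 4, 5]

# Perfect collinearity: x2 = 2 * x1, so both terms vanish (0 / 0).
perfect = slope_terms(y, x1, [2 * v for v in x1])

# High but imperfect collinearity: x2 = 2 * x1 + v with sum(x1_dev * v) = 0.
v = [1, -2, 2, -2, 1]
imperfect = slope_terms(y, x1, [2 * a + b for a, b in zip(x1, v)])

print(perfect)    # numerator and denominator are both zero
print(imperfect)  # the denominator is non-zero, so the slope is defined
```

In the second case the denominator equals $\sum x_{1i}^2 \sum v_i^2$, so it shrinks towards zero as the disturbance shrinks, which is the sense in which the estimate approaches the indeterminate form.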

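Another consequence of severe multicollinearity is that the variances of the ordinary least squares (OLS) estimates become very large, and infinite in the limit of perfect collinearity. From the normal equations we can obtain

$$\operatorname{Var}(\hat{\alpha}_0) = \left[ \frac{1}{n} + \frac{\bar{X}_1^2 \sum x_{2i}^2 + \bar{X}_2^2 \sum x_{1i}^2 - 2 \bar{X}_1 \bar{X}_2 \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left( \sum x_{1i} x_{2i} \right)^2} \right] \hat{\sigma}^2 \tag{28}$$

$$\operatorname{Var}(\hat{\alpha}_1) = \frac{\hat{\sigma}^2}{\sum x_{1i}^2 \left( 1 - r_{12}^2 \right)} \tag{29}$$

$$\operatorname{Var}(\hat{\alpha}_2) = \frac{\hat{\sigma}^2}{\sum x_{2i}^2 \left( 1 - r_{12}^2 \right)} \tag{30}$$

where $r_{12}^2 = \frac{\left( \sum x_{1i} x_{2i} \right)^2}{\sum x_{1i}^2 \sum x_{2i}^2}$ and $\hat{\sigma}^2 = \frac{\sum \hat{\mu}_i^2}{n - k}$, with $k$ the number of parameters in the model. $\frac{1}{1 - r_{12}^2}$ is the variance inflation factor (VIF), and $\operatorname{Var}(\hat{\alpha}_j) = \frac{\hat{\sigma}^2}{\sum x_{ji}^2} VIF_j$. If $r_{12}^2 \to 1$ then $\operatorname{Var}(\hat{\alpha}_j) \to \infty$.

Also, $\frac{1}{1 - r_{12}^2} = \frac{1}{1 - R_j^2}$, where $R_j^2$ is the coefficient of determination from regressing $X_j$ on the other explanatory variable.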
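A short sketch of the variance inflation factor for the two-regressor case follows. The function names and data are hypothetical, and returning infinity for perfectly collinear inputs is a convenience chosen for this example.

```python
import math

def r_squared_12(x1, x2):
    """Squared correlation between x1 and x2, computed from deviations."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    return s12 ** 2 / (s11 * s22)

def vif(x1, x2):
    """Variance inflation factor 1 / (1 - r12^2); infinite when r12^2 = 1."""
    r2 = r_squared_12(x1, x2)
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)

x1 = [1, 2, 3, 4, 5]
print(vif(x1, [3, 2, 8, 6, 11]))   # correlated regressors: VIF above 1
print(vif(x1, [2, 4, 6, 8, 10]))   # x2 = 2 * x1: VIF is infinite
print(vif([-1, 0, 1], [1, -2, 1])) # uncorrelated regressors: VIF equals 1
```

A VIF of 1 means the variance of the coefficient is not inflated at all, while larger values show by what factor collinearity multiplies it.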

Application of the Binary Logistic Model to Home Ownership in Wukari

Wukari is a town in Wukari Local Government Area of Taraba State in Nigeria.

Based on the 2006 national census figure, Wukari has a population of 234,546, and the town is divided into three wards: Avyi, Puje and Hospital. Agricultural produce such as yam and fish is abundant in Wukari town because the people of Taraba are predominantly farmers. Wukari presently houses the Wukari local government secretariat; the Taraba state office of Land and Survey; Federal University Wukari, established in 2012; the National Open University; and Kwararafa University (a privately/community owned university). This makes Wukari ideal for the study because of the expected rapid growth in population and the corresponding anticipated growth in housing development, especially on an owner-occupier basis, due to pressure on existing structures and the exorbitant rent charged on them. Moreover, Wukari is one of the flash points of ethno-religious and political conflict in Nigeria. Conflict could constitute a risk factor in housing development and could be a determinant of location in the home ownership decision.

Conflicts have an adverse effect on economic growth through the destruction of human and physical capital, shifts in public spending and private investment, and the disruption of economic activities and social life, as asserted by Okeke et al. (2018). The specific impacts depend on each conflict's singular characteristics. It is not just the type of conflict, but also its intensity, duration and geographical spread that shape its economic consequences.

Housing is not a luxury, as asserted by Geoffrey (2005). Housing represents one of the most basic human needs. As a unit of the environment, it has a profound influence on the health, efficiency, social behavior, satisfaction and general welfare of the community. To most groups housing means shelter, but to others it means more, as it serves as one of the best indicators of a person's standard of living and his or her place in society. It is a prerequisite for the attainment of living standards, and it is important to all individuals, whether in rural or urban areas.

According to Hood et al. (2014), the factors in the home ownership decision include race, gender, educational attainment, age, marital status and family size; some factors, such as net family income and parental home ownership, affect both benefits and costs.

Also, integrated households are more likely to own a house than separated or marginalized ones. Hence, the probable determinants of home ownership may include employment status, income, education, marital status, family composition, access to home financing and discrimination (Lauridsen and Skak, 2007).

It is pertinent to point out that these expositions did not take into cognizance the influence of the risk factor, notably, conflict, in home ownership decision. However, this study will take that into perspective in explaining the result of the logistic model.

2. Methodology

Data Collection

The data used for the study are primary data obtained from questionnaires administered to three hundred (300) respondents (civil servants) working in various cadres of government institutions in Wukari, namely local government, state and federal.

In the questionnaire, a total of twenty-three questions were asked, from which the responses were extracted for the purpose of this study. The questions were simple and clear to avoid ambiguity, and they bordered on: monthly income of respondent ($X_{i1}$), grade level of respondent ($X_{i2}$), years in service ($X_{i3}$), cadre of institution of service of respondent ($X_{i4}$) (i.e. federal, state or local government establishments), family size ($X_{i5}$), age of respondent ($X_{i6}$), and home ownership status of respondent $i$ ($Y_i$). It is pertinent to point out that state and local government workers are recruited from the locality, while the federal workers, who earn higher salaries, are drawn from across the federation. We also have to bear in mind that monthly income is more all-encompassing than monthly salary, which is determined by the grade level.

A pilot survey was conducted to determine the content validity of the questionnaires, to enable adjustment of the questions for the research, and to fine-tune the content to make it clear, precise and unambiguous so that respondents could give meaningful responses, in line with Okafor (2002).

A total of 300 questionnaires were issued to civil servants in federal, state and local government agencies. We were able to retrieve 250 questionnaires, out of which 200 were valid and put into use. The retrieved questionnaires were used to extract the data used for the analysis.

Data Analysis

Data extracted were arranged for analysis. The qualitative, dichotomous response variable (Y) was coded as a dummy variable taking the value 1 if the respondent owns a house and 0 if the respondent does not. Some of the explanatory variables, i.e. factors of home ownership, were quantitative, while others were qualitative and were assigned appropriate dummy variables.
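The coding step can be sketched as follows. The text does not state how the qualitative explanatory variables were coded, so the baseline-category scheme below, the category labels and the function names are all assumptions made for illustration.

```python
def code_ownership(owns_house):
    """Response dummy: 1 if the respondent owns a house, 0 otherwise."""
    return 1 if owns_house else 0

def dummy_code(value, categories, baseline):
    """One 0/1 indicator per category, omitting the baseline category so
    that the indicators are not perfectly collinear with the intercept."""
    return {c: int(value == c) for c in categories if c != baseline}

# Hypothetical labels for cadre of institution of service.
cadres = ["local", "state", "federal"]
print(code_ownership(True))                   # 1
print(dummy_code("state", cadres, "local"))   # {'state': 1, 'federal': 0}
print(dummy_code("local", cadres, "local"))   # {'state': 0, 'federal': 0}
```

Omitting one category matters here because including an indicator for every category alongside an intercept would itself produce perfect multicollinearity.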

The data were analyzed using the binary logistic regression model. The data were also reduced using principal component analysis as an input tool, as in Abdi and Williams (2010) and Okeke and Okeke (2015). In like manner, stepwise regression was applied, and the approaches were compared using the probability of misclassification. The Statistical Package for the Social Sciences (SPSS) version 21 was employed for the analysis.

The maximum likelihood estimates are asymptotically normal under general conditions, and the significance of the effect of $X_i$ on $\pi_i$ is tantamount to the significance of $\alpha_2$ (the regression coefficient) or of the $\alpha$'s (the partial regression coefficients), as the case may be (Gujarati, 2003). Therefore, the hypothesis that a coefficient is zero is tested against the alternative that it is not, using the Wald test statistic.
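The form of the Wald statistic is not reproduced in the text. A common form, assumed here, is the squared ratio of an estimate to its standard error, referred to a chi-square distribution with one degree of freedom. The sketch below uses that form; the function name and the example numbers are hypothetical.

```python
import math

def wald_test(estimate, std_error):
    """Wald statistic (estimate / std_error)^2 and its p-value under a
    chi-square distribution with 1 degree of freedom.

    For 1 degree of freedom, P(chi2 > w) = erfc(sqrt(w / 2)).
    """
    w = (estimate / std_error) ** 2
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

# Hypothetical coefficient and standard error.
w, p = wald_test(0.05, 0.02)
print(round(w, 2), round(p, 4))
```

A small p-value leads to rejecting the hypothesis that the coefficient is zero, which is how the significance of each factor would be judged.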

We use the simple count $R^2$ to determine the adequacy of the logistic model in the presence of violations of the OLS assumptions, under which the OLS estimates are still unbiased but inefficient.

Other tests for the adequacy of the logistic (logit) model are the McFadden $R^2$, pseudo $R^2$, Cox and Snell $R^2$, Nagelkerke $R^2$, etc. These $R^2$ statistics indicate how useful the explanatory variables are in predicting the response variable and can be referred to as measures of effect size.

However, the count $R^2$ is simple and is a more reliable tool for showing the predictive power of the model (Gujarati, 2003).
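As usually defined (for example in Gujarati, 2003), the count $R^2$ is the number of correct predictions divided by the total number of observations, where an observation is predicted as 1 when its fitted probability exceeds 0.5. A minimal sketch under that definition follows; the function name, the `cutoff` parameter and the example values are illustrative.

```python
def count_r2(y_true, p_hat, cutoff=0.5):
    """Share of observations whose predicted class matches the observed one.

    An observation is classified as 1 when its fitted probability is
    greater than the cutoff, and as 0 otherwise.
    """
    predicted = [1 if p > cutoff else 0 for p in p_hat]
    correct = sum(1 for yt, yp in zip(y_true, predicted) if yt == yp)
    return correct / len(y_true)

# Hypothetical observed outcomes and fitted probabilities.
y_obs = [1, 0, 1, 0]
p_fit = [0.9, 0.2, 0.4, 0.6]
print(count_r2(y_obs, p_fit))       # 0.5
print(1 - count_r2(y_obs, p_fit))   # share misclassified
```

One minus the count $R^2$ is the share of misclassified observations, which gives one way to compare competing models on the same data.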

3. Results and Discussion

Using principal component analysis (PCA) by the correlation matrix approach, we selected the variables $X_2$ (grade level of respondent), $X_4$ (cadre of institution of service of respondent), $X_5$ (family size) and $X_6$ (age of respondent), while the stepwise regression approach selected $X_1$ (monthly income of respondent) and $X_6$ (age of respondent).

The analysis using the multicollinear data, PCA and stepwise regression, respectively, yields the logit models (31), (32) and (33).

The odds in favour of a civil servant owning a home in Wukari, in the presence of the intervening variables, are obtained from the respective logit models.

The logistic models for determining the probabilities of owning a home by civil servants in Wukari are likewise obtained from the respective logit models.

The Wald test for the significance of the model coefficients showed that in (31) and (33), $X_1$ (monthly income of respondent) and $X_6$ (age of respondent) are significant, while in (32), though $X_2$, $X_4$, $X_5$ and $X_6$ account for 87.78% of the variation, only $X_6$ (age of respondent) is significant, as shown in Table 1.

The binary logistic model from PCA is more adequate in predictive power than those from the multicollinear data and stepwise regression, with a count $R^2$ of 0.70, followed by the logistic model from stepwise regression with a count $R^2$ of 0.63, while the logistic model from the multicollinear data has a count $R^2$ of 0.53 (Table 1).

An interesting feature of the three models is that income ($X_1$) and age ($X_6$) of respondents have a positive effect on home ownership, while cadre of institution of service ($X_4$) of respondent has a negative effect. The negative effect of cadre of institution of service (i.e. federal, state or local government establishments) could be attributed to the risk factor associated with building in a conflict area, of which Wukari is one of the most volatile in Taraba State of Nigeria. The preferred model is the binary logistic model of PCA.

Table 1. Standard error and p-value of Wald test of significance of model coefficients. Entries are standard error (p-value); a dash marks a variable not included in that model.

| Variables | PCA | Stepwise regression | All variables |
|---|---|---|---|
| X1 | - | 0.00000402 (0.0380) | 0.0000053 (0.018) |
| X2 | 0.04 (0.56) | - | 0.07 (0.24) |
| X3 | - | - | 0.04 (0.11) |
| X4 | 0.14 (0.35) | - | 0.16 (0.38) |
| X5 | 0.06 (0.80) | - | 0.06 (0.72) |
| X6 | 0.02 (0.00) | 0.02 (0.00) | 0.02 (0.01) |
| Count R2 | 0.70 | 0.63 | 0.53 |

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

Cite this paper

Okeke, J.U., Okeke, E.N. and Dakhin, Y.V. (2020) Binary Logistic Models of Home Ownership in Wukari Nigeria. Open Journal of Statistics, 10, 64-73. https://doi.org/10.4236/ojs.2020.101005

References

Gujarati, D.N. (2003) Basic Econometrics. 4th Edition, McGraw-Hill Co Inc., New York, 595-607.
Festschrift, J.N., Hosmer, D. and Lemeshow, S. (2000) Applied Logistic Regression. 2nd Edition, John Wiley & Sons, Inc., New York.
Menart, S.W. (2002) Applied Logistic Regression. 2nd Edition, SAGE, New York.
Rao, C.R. and Toutenburg, H. (1995) Linear Models: Least Squares and Alternatives. 2nd Edition, Springer-Verlag Inc., New York.
Cox, D.R. (1970) Analysis of Binary Data. Methuen, London, 33.
Rodriguez, G. (2007) Lecture Notes on Generalized Linear Models. 45.
Greene, W.N. (2003) Econometric Analysis. 5th Edition, Prentice Hall, New York.
Gareth, J., Daniel, W., Trevor, H. and Robert, T. (2013) An Introduction to Statistical Learning. Springer, New York, 6.
Ishaku, T.H., et al. (2010) Planning for Sustainable Water Supply through Partnership Approach in Wukari Town, Taraba State of Nigeria. Journal of Water Resource and Protection, 2, 916-922. https://doi.org/10.4236/jwarp.2010.210109
Okeke, J.U., Okeke, E.N., Onoja, P.N. and Yawe, A. (2018) Peace and Conflict Resolution: A Contingency Table Analysis of Jos-Plateau Crisis. Global Journal of Engineering Science and Research Management, 5, 43-49.
Geoffrey, N. (2005) The Urban Informal Sector in Nigeria. Towards Economic Development, Environmental Health and Social Harmony. Global Urban Development Magazine, 1, 1.
Nubi, O.T. (2008) Affordable Housing Delivery in Nigeria. The South African Foundation International Conference and Exhibition, Cape Town.
Hood, J.K., et al. (2014) The Determinant of Home Ownership: An Application of Human Capital Investment Theory to the Home Ownership Decision. The Park Place Economics, 3, 40-50.
Lauridsen, J. and Skak, M. (2007) Determinant of Home Ownership in Denmark. Institute of Business Economics, University of Southern Denmark, Odense.
Okafor, F.C. (2002) Sample Survey Theory and Application. Afro-Orbis Publications Ltd., Nigeria.
Abdi, H. and Williams, L.J. (2010) Principal Component Analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459. https://doi.org/10.1002/wics.101
Okeke, E.N. and Okeke, J.U. (2015) Clustering Using Principal Component Analysis as an Input Tool. Journal of Multidisciplinary Engineering Science and Technology, 2, 2694-2699.