Classification of Stateless People through a Robust Nonparametric Kernel Discriminant Function
1. Introduction
Nationality is the link between a citizen and the international system through domestic laws. It traces its roots to the history of the human race, in which people have had a sense of belonging to a nation or country, and the nationality to which an individual belongs guarantees that individual the rights of a citizen. Although every person has the right to a nationality, this right has not been realised for every individual in the world. As a result, some individuals are stateless in their host country [1].
A stateless person is someone who, under national laws, does not enjoy citizenship (the legal bond between a government and an individual) in any country. Statelessness is a global anomaly, and many persons who are stateless have never crossed an international border [2] [3]. Two United Nations Conventions established the international legal framework for the protection of stateless persons and the prevention and reduction of statelessness. The 1954 Convention Relating to the Status of Stateless Persons defines stateless persons and provides important minimum standards of treatment for them. It defines a stateless person as a person who is not considered as a national by any state under the operation of its law. The 1961 Convention on the Reduction of Statelessness sets out guidelines for the prevention of statelessness.
Kenya has a few groups who remain in protracted situations of statelessness. These include the Pemba, the Shona, people of Burundian and Rwandan descent, and children born in Kenya to British Overseas Citizens after 1983 [4]. These persons are not only undocumented but also often overlooked and excluded from national administrative registers and databases. Many stateless persons and persons of undetermined nationality are counted in the de facto population and housing censuses but often go unrecognized by nationality or ethnic affiliation.
Although the number of stateless persons in Kenya is unclear, an estimate of 18,500 has been in use since the registration of the Makonde [5]. Despite various amendments to the provisions on the right to a nationality, many of Kenya’s domestic nationality laws are discriminatory and infringe greatly on fundamental human rights. They may therefore increase the number of people who become stateless or leave those who are already stateless in that condition indefinitely. Kenya has to date not ratified the 1954 Convention Relating to the Status of Stateless Persons or the 1961 Convention on the Reduction of Statelessness. Nevertheless, the discriminatory nationality laws and their administration have repeatedly been brought to the attention of the international human rights community, on the grounds that Kenya’s national laws are inconsistent with its international human rights obligations. To assess Kenya’s national laws adequately, it should be noted that the causes of statelessness in Kenya fall into two broad categories, administrative and legal, which illustrates the gap between law and practice.
The administrative causes of statelessness in Kenya, such as the faulty operation or under-regulated nature of administrative practices concerning citizenship, put individuals, especially children, at risk of becoming stateless [6]. This is because there are no adequate regulations to guide the vetting process to which certain ethnic groups in Kenya are subjected. For example, registration offices retain discretion to request documentary proof from individuals before issuing documents, including birth certificates, and to require various additional documents, which means repeated trips to government offices, additional travel costs and a prolonged, intimidating process.
In Kenya, the known stateless groups are the Galjeel, Pare and Pemba [7]. This was the case for stateless persons and persons of undetermined nationality during the 2009 population and housing census of Kenya. That census did not specifically categorize resident persons of unknown nationality in Kenya at the time, so the size of the stateless population was not clear. However, studies by the United Nations High Commissioner for Refugees estimate the stateless population in Kenya at between 18,500 and 20,000 [3].
Despite attempts to improve the coverage of stateless persons in the 2019 Census, identifying the specific groups remained elusive because the codes or options did not provide for the finer details. Further, the census established a much smaller population of 6272. Many of these groups would hide their identities for fear of imagined victimization. Broadly speaking, the Global Action Plan includes actions to resolve existing situations of statelessness, prevent new cases of statelessness from emerging, and better identify and protect stateless persons. The Global Plan to End Statelessness in 10 years requires all states to improve quantitative and qualitative data on stateless populations. The goal specifically requires that quantitative data on stateless populations be publicly available for 150 States and that qualitative analysis of this group be publicly available for at least 120 States [6].
This study focuses on stateless persons in Kenya, specifically the Pemba community, which is estimated to have a population of about 4000 in Kenya [8]. It looks into how the Pemba community can be integrated into one of the local communities. It is therefore important to understand the characteristics of the Pemba community and to find out whether there are similarities with the surrounding communities, using attributes generated from the 2009 Kenya Population and Housing Census, with the aim of seeing which local community fits best if the Pemba are to be absorbed. To achieve this, a nonparametric kernel discriminant function is developed and used to classify the Pemba community into the neighboring local communities. The characteristics (auxiliary information) considered here are education level and employment status. To determine whether the Pemba community is correctly classified in a particular community, misclassification rates are computed and compared with those of other existing classification models.
2. Review of Classification through Discriminant Analysis
Discriminant analysis has attracted interest in various fields, including social science, economics, education, finance and engineering. For instance, in routine banking or commercial finance, an officer or analyst may wish to classify loan applicants as low or high credit risks on the basis of elements of certain accounting statements [2]. The author of [9] viewed the problem of discriminant analysis as that of assigning an unknown observation to a group with a low error rate. The function or functions used for the assignment may be identical to those used in the multivariate analysis of variance. Similarly, [10] defined discriminant analysis and classification as multivariate techniques concerned with separating distinct sets of objects or observations, and with allocating new objects (observations) to previously defined groups. For instance, in personnel selection, the acceptance or rejection of an applicant is frequently based on a number of test scores obtained by the applicant. In all these problems it is assumed that there are two populations, say P1 and P2, one representing the population of individuals fit, and the other the population of individuals unfit, for the purpose under consideration. The problem is that of classifying an individual into one of the populations P1 and P2 on the basis of the individual’s test scores. Usually, some statistical data from past experience are available which can be utilized in making the classification.
A large body of literature discusses classification problems and their applications. For instance, discriminant analysis has been applied to the classification of students on the basis of their academic performance [11]. The authors used the cumulative results of PRE-ND students of the Accountancy and Business Administration departments in the five courses they took in the 2004/2005 academic session. Based on their scores, 78 students were reassigned from Business Administration to Accountancy, and 37 students from Accountancy to Business Administration. In the field of risk analysis, [12] applied discriminant analysis to identify students who might be “At risk” (AR) and “Not At Risk” (NAR). The first group consists of students who are in danger of graduating with a poor class of degree, and the second of those who will graduate with a better class of degree, as judged within their first two years of study. The analysis successfully classified or predicted the class of degree of 87.5 percent of the graduating students. In the education sector, [13] applied discriminant analysis to compare the performance of students who gained admission into the university system through the pre-degree programme with that of students who passed through the University Matriculation Examination (UME). No difference in average performance between UME and pre-degree students was observed at the 5% level of significance.
The gap in the literature cited here is that the researchers relied upon parametric discriminant methods for the classification problems. Although these methods are conceptually simple and have been used in many application areas, their reliance on the normality assumption limits their performance and application. Furthermore, they are not capable of capturing nonlinearly clustered structures in the data. There is also little or no literature that discusses the application of classification to the problem of statelessness that exists globally.
To reduce the failures of the parametric techniques discussed above, this study develops a robust nonparametric kernel discriminant function that will be a better choice whenever a nonlinear classification model is needed. This is because nonparametric estimators are more robust and are especially useful when there is auxiliary information on finite population parameters, which is often used to increase the precision of estimators of those parameters [14].
3. Discrimination and Classification
Consider a set of v populations or groups that correspond to density functions $f_1, f_2, \ldots, f_v$. Also consider assigning all the points x from the sample space to one of these groups or densities. The weighted heights of the density functions are used to obtain the Bayes discriminant rule: allocate x to group $j_0$, where

$$j_0 = \arg\max_{j \in \{1, \ldots, v\}} \pi_j f_j(x). \tag{1}$$

Here $\pi_j$ is the prior probability of drawing from density $f_j$. Enumerating for all x from the sample space, a partition $P = \{P_1, P_2, \ldots, P_v\}$ of the sample space is produced using $x \in P_j$ if x is allocated to group j.
The discriminant rule, Equation (1), contains the unknown density functions and the (possibly) unknown prior probabilities. When data is collected, this abstract rule can be modified into a practical one.
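The abstract rule can be illustrated before any estimation takes place. The sketch below is a minimal example, assuming two hypothetical groups whose densities and priors are fully known (univariate normals with illustrative parameters that do not come from the study); the function names are likewise illustrative.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def bayes_allocate(x, priors, densities):
    """Return the index j that maximises priors[j] * densities[j](x)."""
    scores = [p * f(x) for p, f in zip(priors, densities)]
    return max(range(len(scores)), key=scores.__getitem__)

# Two hypothetical groups with known densities and priors (illustrative values).
densities = [lambda x: normal_pdf(x, 0.0, 1.0), lambda x: normal_pdf(x, 3.0, 1.0)]
priors = [0.5, 0.5]

# With equal priors and equal spreads the boundary lies midway between the means.
labels = [bayes_allocate(x, priors, densities) for x in (-1.0, 1.0, 2.0, 4.0)]
```

Raising the prior of one group moves the boundary away from that group's mean, which is why the priors matter when group sizes differ widely.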
The training data $X_j = \{X_{j1}, X_{j2}, \ldots, X_{jn_j}\}$ are collected, drawn from $f_j$, for $j = 1, 2, \ldots, v$. (The sample sizes $n_j$ are known and non-random.)
A priori there is a class structure in the population since it’s known which data points are drawn from which density function. From these training data, a practical discriminant rule and subsequent partition can be developed.
Using this discriminant rule/partition, the test data $Y_1, Y_2, \ldots, Y_m$, drawn from $f = \sum_{j=1}^{v} \pi_j f_j$, can be classified.
It’s not clear here which populations generated which data points.
The usual approach is to estimate these density functions (and the prior probabilities if needed) and substitute the estimates into the discriminant rule. The well-known and widely used parametric approaches are the linear and quadratic discriminant techniques. However, these suffer from the restrictive assumption of normality. With nonparametric discriminant analysis this assumption can be relaxed, making it possible to tackle more complex cases. The study focuses on kernel methods for discriminant analysis. The monographs [15] [16] [17] (Chapter 7) contain summaries of kernel discriminant analysis, while [18] contains a more detailed and lengthy exposition of the subject.
3.1. Classification of Stateless Persons through Kernel Discriminant Function
Kernel density estimation [15] [19] is a popular method for nonparametric density estimation, and it has one well-known application in kernel discriminant analysis (KDA) [20]. Consider a J-class classification problem (J = v in the notation of Section 3). Given a training sample $\{x_{ji} : i = 1, \ldots, n_j;\ j = 1, \ldots, J\}$ of $n = \sum_{j=1}^{J} n_j$ observations, the kernel estimate for the density function $f_j$ can be expressed as

$$\hat{f}_j(x) = \frac{1}{n_j h^d} \sum_{i=1}^{n_j} K\!\left(\frac{x - x_{ji}}{h}\right), \tag{2}$$

where $n_j$ is the number of observations from the jth class, K is a d-dimensional density function symmetric around 0, and h is the associated smoothing parameter known as the bandwidth. These kernel density estimates are then used to construct the proposed kernel discriminant rule (KDR), the proposed classification rule for the stateless persons: allocate x to group $j_0$, where

$$j_0 = \arg\max_{j \in \{1, \ldots, J\}} \pi_j \hat{f}_j(x). \tag{3}$$

Here $\hat{f}_j$ is the kernel density estimate corresponding to the jth group and $\pi_j$ is the prior probability of the jth group. If these priors are not known, one usually estimates them using the training sample proportions $\hat{\pi}_j = n_j / n$ of the different groups. Many choices for the kernel function K are available in the literature [15] [19]. Because kernel density estimators are being used for discriminant analysis, the selection of appropriate bandwidths is crucial. One can attempt to find bandwidths that are optimal for the individual kernel density estimates or, alternatively, bandwidths that directly optimise the misclassification rate (MR), as [20] attempt for the two-class case.
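Equations (2) and (3) translate almost directly into code. The following is a minimal sketch, assuming a standard Gaussian kernel for K and a single scalar bandwidth h chosen by hand; the training points and the function names (`kde`, `kdr`) are hypothetical and serve only to show the mechanics.

```python
import math

def gaussian_kernel(u):
    """Standard d-dimensional Gaussian density evaluated at the vector u."""
    d = len(u)
    return math.exp(-0.5 * sum(c * c for c in u)) / (2.0 * math.pi) ** (d / 2.0)

def kde(x, sample, h):
    """Kernel density estimate at x with scalar bandwidth h, as in Equation (2)."""
    d = len(x)
    total = sum(gaussian_kernel([(a - b) / h for a, b in zip(x, xi)]) for xi in sample)
    return total / (len(sample) * h ** d)

def kdr(x, samples, h, priors=None):
    """Kernel discriminant rule: index j maximising prior_j * kde_j(x)."""
    n = sum(len(s) for s in samples)
    if priors is None:
        # Unknown priors are replaced by the training sample proportions n_j / n.
        priors = [len(s) / n for s in samples]
    scores = [p * kde(x, s, h) for p, s in zip(priors, samples)]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical two-dimensional training samples for J = 2 groups.
group_a = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2)]
group_b = [(2.0, 2.1), (1.9, 1.8), (2.2, 2.0), (2.1, 1.9)]
queries = [(0.0, 0.0), (2.0, 2.0), (0.3, 0.2)]
predicted = [kdr(q, [group_a, group_b], h=0.5) for q in queries]
```

A fixed h keeps the example short; in practice the bandwidth would come from a data-driven selector, which is the subject of the discussion above.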
3.2. Misclassification Rate (MR)
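This rate is the proportion of points that are assigned to an incorrect group based on a discriminant rule. Then we have

$$\mathrm{MR} = 1 - P(Y \text{ is classified correctly}) = 1 - E_Y\big[\mathbf{1}\{Y \text{ is classified correctly}\}\big], \qquad \mathrm{MR}_{\mathrm{KDR}} = 1 - E_X\Big[E_Y\big[\mathbf{1}\{Y \text{ is classified correctly using KDR}\}\big]\Big], \tag{4}$$

where $E_Y$ is expectation with respect to Y or $\sum_{j=1}^{v} \pi_j f_j$, and $E_X$ is expectation with respect to $X_1, X_2, \ldots, X_v$ or $f_1, f_2, \ldots, f_v$.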
True positive (TP): Observation is predicted positive and is actually positive.
False positive (FP): Observation is predicted positive and is actually negative.
True negative (TN): Observation is predicted negative and is actually negative.
False negative (FN): Observation is predicted negative and is actually positive.
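For a two-group problem, these four counts tie directly to the misclassification rate. The identity below is the standard one and is stated here as a reading aid, assuming all counts are taken over the same set of classified observations:

$$\mathrm{MR} = \frac{FP + FN}{TP + FP + TN + FN} = 1 - \frac{TP + TN}{TP + FP + TN + FN}.$$

With more than two groups, the same counts are obtained group by group by treating one group as positive and all the others as negative.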
For bandwidth selection, [18] recommends the former approach, that of optimising the individual density estimates, for three reasons. First, accurate estimates of the individual density functions are useful in their own right; second, accurate density estimates can be used in other, more complex discriminant problems which look at measures other than the misclassification rate; and third, direct optimisation with respect to a misclassification rate poses many difficult mathematical obstacles.
Whilst we will not use the misclassification rate to select bandwidths, we will still use it as our performance measure of a discriminant rule, so we need to estimate it. The most appropriate estimate depends on whether we have test data or not. If we do, as is the usual case for simulated data, then a simple estimate is obtained by counting the number of $Y_k$ that are assigned to an incorrect group and dividing by the total number of data points m. On the other hand, if we do not have test data, as is the usual case for real data, then we use the cross validation estimate of MR, as recommended by [15] [18]. This involves leaving out each $X_{ji}$ and constructing a corresponding leave-one-out density estimate and subsequent discriminant rule. We then compare the label assigned to $X_{ji}$ based on the leave-one-out discriminant rule to its correct group label. The misclassified points are then counted and the count is divided by n.
3.3. Algorithm for Kernel Discriminant Classification Rule
The algorithm for the proposed kernel discriminant analysis is given below. The algorithms for linear and quadratic discriminant analysis are similar except that any kernel methods are replaced by the appropriate parametric methods. We put these algorithms into practice with both simulated and real data.
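1) For each training sample $X_j = \{X_{j1}, X_{j2}, \ldots, X_{jn_j}\}$, $j = 1, 2, \ldots, v$, compute a kernel density estimate

$$\hat{f}_j(x; H_j) = n_j^{-1} \sum_{i=1}^{n_j} K_{H_j}(x - X_{ji}). \tag{5}$$

Here $H_j$ is the bandwidth matrix for group j and $K_H(x) = |H|^{-1/2} K(H^{-1/2} x)$. We can use any sensible bandwidth selector for $H_j$.
2) If prior probabilities are available, then use them. Otherwise estimate them using the training sample proportions $\hat{\pi}_j = n_j / n$.
3)
a) Allocate test data points $Y_1, Y_2, \ldots, Y_m$ according to KDR/Equation (3) or,
b) Allocate all points x from the sample space according to KDR/Equation (3).
4)
a) If we have test data then the estimate of the misclassification rate is

$$\widehat{\mathrm{MR}} = 1 - m^{-1} \sum_{k=1}^{m} \mathbf{1}\{Y_k \text{ is classified correctly using KDR}\}. \tag{6}$$

b) If we do not have test data the cross validation estimate of the misclassification rate is

$$\widehat{\mathrm{MR}}_{\mathrm{CV}} = 1 - n^{-1} \sum_{j=1}^{v} \sum_{i=1}^{n_j} \mathbf{1}\{X_{ji} \text{ is classified correctly using } \mathrm{KDR}_{ji}\}, \tag{7}$$

where $\mathrm{KDR}_{ji}$ is similar to KDR except that $\hat{\pi}_j$ and $\hat{f}_j(\cdot; H_j)$ are replaced by their leave-one-out estimates obtained by removing $X_{ji}$, that is,

$$\hat{\pi}_{j,-i} = \frac{n_j - 1}{n} \quad \text{and} \quad \hat{f}_{j,-i}(x; H_{j,-i}) = (n_j - 1)^{-1} \sum_{i' \neq i} K_{H_{j,-i}}(x - X_{ji'}). \tag{8}$$

That is, we repeat step 3 to classify all $X_{ji}$ using $\mathrm{KDR}_{ji}$.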
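Steps 1 to 4 above can be sketched end to end in a few lines. The example below is illustrative only and rests on several assumptions: the data are one-dimensional, a fixed scalar bandwidth h stands in for a proper bandwidth selector, the leave-one-out prior of the reduced group is taken as (n_j - 1)/n, and the data values and function names are hypothetical.

```python
import math

def kde(x, sample, h):
    """Step 1: one-dimensional Gaussian kernel density estimate at x."""
    c = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

def allocate(x, samples, priors, h):
    """Step 3: allocate x to the group with the largest prior * density."""
    scores = [p * kde(x, s, h) for p, s in zip(priors, samples)]
    return max(range(len(scores)), key=scores.__getitem__)

def mr_test(test_points, test_labels, samples, h):
    """Step 4a: share of test points allocated to the wrong group."""
    n = sum(len(s) for s in samples)
    priors = [len(s) / n for s in samples]  # step 2: sample proportions
    wrong = sum(allocate(y, samples, priors, h) != g
                for y, g in zip(test_points, test_labels))
    return wrong / len(test_points)

def mr_cv(samples, h):
    """Step 4b: leave-one-out cross validation estimate of the rate."""
    n = sum(len(s) for s in samples)
    wrong = 0
    for j, sample in enumerate(samples):
        for i, x in enumerate(sample):
            # Remove the i-th point of group j before rebuilding the rule.
            reduced = [s if k != j else s[:i] + s[i + 1:]
                       for k, s in enumerate(samples)]
            # Assumed leave-one-out priors: (n_j - 1)/n for group j, n_l/n otherwise.
            priors = [len(s) / n for s in reduced]
            wrong += allocate(x, reduced, priors, h) != j
    return wrong / n

# Hypothetical data: the last point of the first group sits inside the second group.
samples = [[0.0, 0.1, 0.2, 5.15], [5.0, 5.1, 5.2, 5.3]]
cv_rate = mr_cv(samples, h=0.3)
```

Replacing `kde` with a parametric normal density fitted to each group would turn the same skeleton into a linear or quadratic discriminant rule, which is the sense in which the parametric algorithms are similar.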
4. Empirical Study
For the real data, we use data from the Kenya National Bureau of Statistics obtained from the 2009 Census. The data cover tribes living in the coastal region of Kenya, especially Kilifi County, and various characteristics associated with them, such as education level, religion, building material, waste disposal, source of water and employment status. The study aims to classify these communities using the characteristics observed among them and to obtain the misclassification error, that is, the rate at which members of a community are classified into the wrong group. In addition, the study aims to use this information to classify the Pemba community, which has been stateless for a long time, and to advise policy makers to consider integrating the Pemba people into the identified community or communities. This will also help to inform the classification decision for any emerging tribe in the coastal region whose group membership is not known but which possesses similar characteristics. Because the database contains insufficient data on the Pemba community, the only characteristics available for use are level of education and employment. We apply nonparametric discriminant analysers and compare their performance with that of the parametric methods.
As shown in Figure 1, there are about 10 communities in Kilifi County, with a population of about 1.02 million people, neighboring the Pemba community, which has an estimated population of over 2000 people and has been stateless since Kenya gained independence. Although some Pemba were issued with IDs in Kenya, most of the IDs were withdrawn or not renewed following changes in administration and legislation. After their identity documents were withdrawn in the 1980s and late 1990s, many Pemba were asked to leave the country, but they would spend days hiding in the bushes until the situation seemed calm enough for them to return. This community, whose members are mainly fishermen by trade, cannot obtain fishing licenses, has no access to relief food during emergencies and cannot even use banking services.
Figure 1. Distribution of the tribes neighboring the Pemba community in Kilifi County.
To analyse these data and perform the classification, a sample of 3000 observations was taken using stratified simple random sampling, with the tribes treated as the strata. Proportional allocation was used to obtain the sample from each tribe, so that each tribe was represented in proportion to its size. The sample was then divided into two parts, with 66% used to train the various classifiers and 34% used for testing and for classifying the various communities into specific tribes.
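The sampling design can be sketched as follows. This is a minimal illustration, assuming hypothetical stratum sizes (the per-tribe census counts are not given in the text), largest-remainder rounding so that the allocations sum to the target, and a simple random 66/34 split; none of these details are confirmed by the source.

```python
import random

def proportional_allocation(stratum_sizes, total_sample):
    """Allocate total_sample across strata in proportion to stratum size.

    Largest-remainder rounding keeps the allocations summing to total_sample.
    """
    population = sum(stratum_sizes.values())
    exact = {k: total_sample * v / population for k, v in stratum_sizes.items()}
    alloc = {k: int(e) for k, e in exact.items()}
    shortfall = total_sample - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:shortfall]:
        alloc[k] += 1
    return alloc

def train_test_split(records, train_fraction, rng):
    """Shuffle a copy of records and split it into training and test parts."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical strata; labels and sizes are placeholders, not census figures.
sizes = {"A": 600_000, "B": 300_000, "C": 100_000}
allocation = proportional_allocation(sizes, 3000)

# Seeded generator so the illustrative split is reproducible.
rng = random.Random(0)
train, test = train_test_split(list(range(3000)), 0.66, rng)
```

Under proportional allocation, small strata receive few observations, which is worth keeping in mind when per-group misclassification rates are compared.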
A comparison is conducted by examining the performance of the following discriminant analysers:
1) Linear discriminant (LD).
2) Quadratic discriminant (QD).
3) Kernel discriminant with 2-stage AMSE diagonal bandwidth matrices (KDD2).
4) Kernel discriminant with 2-stage SAMSE full bandwidth matrices (KDS2).
5) Kernel discriminant with 1-stage SCV full bandwidth matrices (KDSC).
The R code for the kernel discriminant analysers is based on the bandwidth matrix selection and density functions in the ks library. LDA and QDA are provided by the functions lda() and qda(), respectively, in the MASS library of the R software.
4.1. Misclassification Rates for Stateless Communities in Kenya
In the first analysis, we use the training data to train the model and use the same training data as test data to see how the model performs.
The misclassification rates within the groups are given in Table 1 and Table 2. From these results it can be observed that the kernel discriminant analysers are more efficient than the parametric ones.
In the second analysis, we use the training data to train the model and independent data as test data to see how the model performs. From the results in Table 3, the cross validation misclassification rates for the kernel discriminants are KDD2: 0.5375, KDS2: 0.4875 and KDSC: 0.5689. For the parametric discriminants, they are LD: 0.7625 and QD: 0.7000. It can be observed that the kernel methods, with appropriately chosen bandwidth matrices, outperform the parametric methods. Among the kernel methods, the 2-stage full bandwidth matrix selector (KDS2) outperforms the diagonal one (KDD2), whereas the 1-stage SCV full bandwidth selector (KDSC) does not.
Table 1. Misclassification rates for various discriminant analysers using training data as test data.
Table 2. Misclassification rates for each group for various discriminant analysers using training data as test data.
Table 3. Misclassification rates for various discriminant analysers using independent test data.
In some instances, accuracy or misclassification error can be misleading when used with imbalanced datasets, and other performance metrics based on the confusion matrix can be useful for evaluating performance. These performance measures include sensitivity, specificity, precision, recall and F1. Precision, or the positive predictive value, is the fraction of true positives out of the total predicted positive instances. In other words, precision is the proportion of predicted positive cases that are correctly identified. Sensitivity, recall, or the TP rate (TPR) is the fraction of true positives out of the total actual positive instances (i.e. the proportion of actual positive cases that are correctly identified), while specificity is the fraction of true negatives out of the total actual negative instances. In other words, it is the proportion of actual negative cases that are correctly identified. The FP rate is given by (1 − specificity). The F1 score, F score, or F measure is the harmonic mean of precision and sensitivity, so it gives importance to both factors. Table 4 gives these performance measures for each tribe based on different classifiers. It can be observed that the kernel discriminant classifiers outperform the parametric classifiers when the appropriate bandwidth matrix is chosen, as they show high values of precision, sensitivity, specificity and F1 across the tribes.
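The per-tribe measures can be computed from a multi-class confusion matrix by treating one class at a time as positive and all others as negative. The sketch below assumes rows hold actual classes and columns hold predicted classes; the matrix values are hypothetical and are not taken from the paper's tables.

```python
def per_class_metrics(confusion, k):
    """One-vs-rest metrics for class k from a square confusion matrix.

    confusion[i][j] counts observations of actual class i predicted as class j
    (this row/column convention is an assumption of the sketch).
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # actual k, predicted otherwise
    fp = sum(row[k] for row in confusion) - tp   # predicted k, actually otherwise
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}

# Hypothetical 3-class confusion matrix (illustrative counts only).
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]
metrics = per_class_metrics(confusion, 0)
```

The guards against zero denominators return 0.0 for classes that are never predicted or never observed; other conventions (such as reporting the value as undefined) are equally defensible.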
4.2. Classification of the Stateless Pemba Community
The main objective of this study was to find the neighboring local community in Kilifi County into which the Pemba community, which has long been stateless, can be integrated, so that its members can be recognized as Kenyans and be issued with the National Identification Number and can thus access services from the Government just like any other Kenyans, without discrimination. These include basic rights and services such as acquisition of birth certificates, education, formal employment, financial services (for example, opening a bank account), in some cases health care, health insurance services, and participation in sports at national and international levels. The neighboring local communities into which the study seeks to integrate the Pemba community are the Bajuni, Boni, Digo, Duruma, Giriama, Jibana, Kambe, Rabai, Ribe and Waata communities living in Kilifi County, where the majority of the Pemba community are found.
Table 4. Classification performance of four classification models based on the data on the stateless communities.
Table 5, Table 7, Table 9 and Table 11 present the confusion matrices for the classification of the communities using the kernel discriminant classifiers KDD2 and KDSC and the quadratic (QDA) and linear (LDA) discriminant classifiers, respectively. The KDD2 classifier, apart from correctly classifying 87 Pemba as Pemba, also assigned Pemba people to other tribes, with 29 classified as Giriama and 21 as Rabai. The KDSC classifier classified 29 people as Giriama, 88 as Pemba and 20 as Rabai. The QDA classifier classified 20 as Giriama, 79 as Pemba and 20 as Rabai, and the LDA classifier classified 22 as Giriama, 76 as Pemba and 19 as Rabai. From these findings, and based on the similarities that exist among these communities, the Pemba community can be classified as Giriama, with which it seems to have the strongest link under three of the four classifiers (QDA assigns equal numbers to Giriama and Rabai). The next community into which the Pemba can be classified is the Rabai community (Tables 5-12).
Table 5. The confusion matrix of the communities in Kilifi County classified based on KDD2 classifier.
Table 6. The Confusion matrix of the proportion of the communities being classified correctly into a particular community based on KDD2 classifier.
Table 7. The confusion matrix of the communities in Kilifi County classified based on KDSC classifier.
Table 8. The Confusion matrix of the proportion of the communities being classified correctly into a particular community based on KDSC classifier.
Table 9. The confusion matrix of the communities in Kilifi County classified based on QDA classifier.
Table 10. The confusion matrix of the proportion of the communities being classified correctly into a particular community based on QDA classifier.
Table 11. The confusion matrix of the communities in Kilifi County classified based on LDA classifier.
Table 12. The confusion matrix of the proportion of the communities being classified correctly into a particular community based on LDA classifier.
5. Conclusions and Recommendations
5.1. Conclusions
The main objective of this paper was to develop a nonparametric discriminant classifier and use it to find the neighboring local community in Kilifi County into which the Pemba community, which has long been stateless, can be integrated, so that its members can be recognized as Kenyans and hence live like any other Kenyan citizen.
From the results, the following observations and conclusions have been made:
1) Classification of stateless communities in Kenya can be done using kernel discriminant classification methods to find the local communities into which they can be integrated.
2) Among the nonparametric kernel discriminant classifiers, the KDD2 classifier, apart from correctly classifying 87 Pemba as Pemba, also assigned Pemba people to other tribes, with 29 classified as Giriama and 21 as Rabai. The KDSC classifier classified 29 people as Giriama, 88 as Pemba and 20 as Rabai. Among the parametric discriminant classifiers, the QDA classifier classified 20 of the Pemba people as Giriama, 79 as Pemba and 20 as Rabai, while the LDA classifier classified 22 as Giriama, 76 as Pemba and 19 as Rabai.
3) Based on similarities in characteristics between the Pemba community and the communities that surround it, the Pemba community can be classified as Giriama, with which it seems to have a strong link. The alternative local community into which the Pemba could be integrated is the Rabai community.
5.2. Recommendations
The study recommends the use of the kernel discriminant technique to classify stateless communities in Kenya, such as the Pemba. This approach can be extended to similar groups across the world. It would go a long way towards achieving the UNHCR recommendation of finding a way to recognize stateless communities and register them as citizens. In addition, the study recommends that more data on various dimensions be collected on stateless peoples, who seem to have been excluded from the 2009 census conducted by Kenya, so as to allow for more analyses and improve the efficiency of the results obtained. Lastly, the study recommends that other classification techniques which can handle high-dimensional spaces, such as neural networks, be considered in future studies to see whether the efficiency of classification can be improved.