Optimal Classifier for Fraud Detection in Telecommunication Industry

Fraud is a major challenge facing the telecommunication industry. A huge amount of revenue is lost to fraudsters who have developed different techniques and strategies to defraud service providers. For any service provider to remain in the industry, the expected loss from the activities of these fraudsters should be minimized, if not eliminated completely. However, because of the huge volume of data and the millions of subscribers involved, it is very difficult to detect this group of people. There is therefore a need for an optimal classifier and a predictive probability model that can capture both the present and past history of the subscribers and classify them accordingly. In this paper, we develop some predictive models and an optimal classifier. We simulated a sample of eighty (80) subscribers (their number of calls and the duration of the calls) and categorized it into four sub-samples of twenty (20) each. We obtained the prior and posterior probabilities of the groups and grouped these posterior probability distributions into two multivariate samples with two variates each. We then developed a linear classifier that discriminates between genuine and fraudulent subscribers. The optimal classifier ($\beta_{A+B}$) has a posterior probability of 0.7368, and we classify the subscribers based on this optimal point. This paper focuses on domestic subscribers, and the parameters of interest are the number of calls per hour and the duration of the calls.


Open Journal of Optimization

Introduction
The communication industry has made the world a global village, and among all components of the industry, telecommunication is the most popular and most widely used [1]. It has created employment opportunities, empowered people economically and removed distance, thereby saving lives and costs. Telecommunication has created opportunities for both service providers and subscribers to run their separate but related businesses and earn their living. However, these benefits come with a serious problem: fraud. Our interest in this paper is to detect fraud in the industry using the frequency and duration of subscribers' calls. Fraud detection is vital to the survival of the telecommunication industry. It is common knowledge that fraudsters have flooded the telecommunication industry in various ways, ranging from illegal access to bandwidth and attacks on cyber security to unauthorized access to data and illegal calls. All of these cause huge losses to telecommunication companies, and if not properly checked they may force some service providers out of the industry. The multiplier effects of these fraudulent activities are massive job losses, a decline in the standard of living, and the attendant consequences for those directly involved and for others who are not. What makes these fraudsters especially difficult to deal with is that they are sophisticated and can hack into service providers' databases; the providers cannot afford to sit back and watch them destroy their businesses. Since fraud is not localized and does not have a permanent "office", it can be committed anywhere and at any time.
Telecommunication operators store large amounts of data related to the activities of their subscribers. These records contain both normal and fraudulent activity. Fraudulent records are expected to be substantially fewer than normal ones; if it were the other way around, this type of business would be impractical because of the amount of revenue lost [1].
This sector broadly has two types of users: domestic and commercial. In some cases, connections are bought under the domestic category but used on a commercial scale, which causes substantial loss to the sector [2]. There is a need to adopt a data mining technique that will filter out these fraudsters. Data volume has been growing at a tremendous pace owing to advances in information technology, and at the same time there has been enormous development in data mining. Data mining can be defined as the process of extracting valuable information from data [3]. The telecommunication sector acquires huge amounts of data because of rapidly renewed technologies, the increase in the number of subscribers, and value-added services. The uncontrolled and very fast expansion of this field causes increasing losses due to fraud and technical difficulties [4].
Today, telecommunication markets all over the world are facing a severe loss of revenue due to fraudsters [5]. To overcome such business hazards and retain the market, operators are forced to look for alternative ways of using data mining techniques and statistical tools to identify the cause in advance and take immediate action in response. This is possible if the past history of the subscribers is analyzed systematically. Fortunately, telecom companies generate and maintain a large volume of data, such as call detail data and network data [6]. One reason this potential goes unused is insufficient knowledge of the algorithms to be applied to such data. Data mining tools and algorithms can exploit the potential in the data when the data are synthesized efficiently. The advent of data mining algorithms and the development of software and hardware have made it easier to analyze huge and complex data [7]. Globally, the telecommunications industry is developing rapidly, with one innovation replacing another in a matter of years, months, or even weeks. Without doubt, telecommunication is a key driver of any nation's economy. Telecommunication is the communication of information by electronic means, usually over some distance. It involves the transmission and receipt of information, messages, graphics, images, voice, video and data between or among telephones, the internet, satellites and radio [8].
In this area, researchers have used different methods to study both customer churn and fraud detection. Fraud detection and subscriber churn are related in the sense that both are concerned with subscribers' behavior. Among the models used in data mining for both churn and fraud detection are the naïve Bayes model, the Gaussian probability distribution, the decision tree algorithm, logistic regression and artificial neural networks (ANN). Data mining is the extraction of vital information from the bulk of data available to the telecommunication industry, together with the use of an appropriate predictive model to classify and determine the behavior of subscribers. By refining the data and building an appropriate statistical model, much hidden information about the subscribers and service providers can be unveiled; see [9] [10] [11] [12]. This information is vital to the survival of any service provider in the telecommunication business, such as MTN, GLO, ETISALAT or MTEL, especially in Nigeria. In this paper, we use the subscribers' frequency of calls and the duration of those calls as the parameters of interest. We determine the prior and posterior probabilities of the subscribers and their number of calls at a given time, and we develop a linear discriminant function that is used to classify the posterior probability distributions into fraudulent and genuine subscribers. We are concerned with statistical modeling, not with machine learning or artificial intelligence methods of classification.
Because of the privacy agreements between service providers and subscribers on the one hand, and to protect the service providers' respective businesses on the other, service providers hardly ever disclose their data. Nevertheless, simulation offers a close substitute for real-life data. Hence, in this paper, we simulate data that depict the real-life scenario and use them for the study. We simulate data on the number of calls per unit time and the call duration, and our interest is in domestic subscribers only. Eighty (80) sample data points were simulated for the study. The samples were categorized into four (4) groups, each having twenty (20) observations representing subscribers. The number of calls per subscriber over a period of time was also simulated; these data are intended to resemble real-life data drawn from a real system. We employed MINITAB 16.0 for the simulation. Four samples of 20 observations each were simulated with the following average number of calls and average call duration: 8 (t = 3), 5 (t = 4), 9 (t = 12) and 6 (t = 7). The values outside the brackets (8, 5, 9 and 6) represent the average number of calls per hour, and the values in brackets represent the average duration of the entire calls in minutes. Our interest is to develop a predictive data mining model for fraud detection in the telecommunication industry. The simulated data were categorized into two multivariate sample groups, A and B. Most importantly, service providers determine their customers' behaviour from the nature of their current calls and their past behaviour.
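For readers without MINITAB, the simulation design above can be approximated in Python. The sketch below is hypothetical and is not the authors' MINITAB 16.0 procedure: it assumes that each subscriber's hourly call count is Poisson-distributed with mean equal to the group's average number of calls (8, 5, 9 and 6), draws twenty subscribers per group with Knuth's multiplication method, and fixes a seed for reproducibility. Call durations are carried along as group parameters but are not simulated. The names GROUPS, poisson_draw and simulate_calls, and the seed value, are illustrative choices.

```python
import math
import random

# (average calls per hour, average call duration in minutes) for the four groups
GROUPS = {1: (8, 3), 2: (5, 4), 3: (9, 12), 4: (6, 7)}
N_PER_GROUP = 20  # twenty subscribers per group, eighty in total


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_calls(seed=16):
    """Return {group: list of simulated hourly call counts} (hypothetical design)."""
    rng = random.Random(seed)
    return {
        group: [poisson_draw(rate, rng) for _ in range(N_PER_GROUP)]
        for group, (rate, _duration) in GROUPS.items()
    }


if __name__ == "__main__":
    for group, counts in simulate_calls().items():
        print(group, counts, "mean =", sum(counts) / len(counts))
```

Knuth's method needs about λ + 1 uniform draws per variate, which is cheap for rates this small; for large rates a library sampler would be the better choice.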

Methodology
We need to know the history of these subscribers based on the information available to the network (service) providers. This information is obtained mainly from their call history. For this reason, the appropriate probability model, one that has a memory and can capture such past history and relate it to the subscribers' current behaviour, is the Bayesian statistical model. However, Bayesian statistics requires a prior probability. Some researchers make the mistake of estimating the prior probability in this type of study with a continuous distribution, as though the number of calls were a continuous random variable. In fact, the number of calls is a Poisson problem and therefore follows a discrete probability distribution. The values of a Poisson random variable are the non-negative integers, and any random phenomenon in which a count is of interest can be modeled by a Poisson distribution, provided that the random variables satisfy certain assumptions regarding the distribution [13]. An example of such a count is the number of telephone calls per unit time coming into the switchboard of a large business. Hence, we estimate the prior probabilities using the Poisson distribution. Since each subscriber's number of calls and call time have non-stationary increments, we assume a non-homogeneous Poisson process (NHPP) with parameter $\omega(t)$, where ω is the call rate and t is the duration of the calls. This has been tested, and the shape parameter b was found to be greater than zero. The intensity function of the power law process (PLP) model can be used to describe the intensity of an NHPP. The power law process model has the mean value and intensity functions

$$m(t) = a t^{b}, \qquad \omega(t) = \frac{\mathrm{d}m(t)}{\mathrm{d}t} = a b t^{b-1}. \qquad (1)$$

The parameters of the model are obtained by a log-linear transformation of the mean value function,

$$\ln m(t) = \ln a + b \ln t, \qquad (2)$$

and a plot of $\ln m(t)$ against $\ln t$ yields $\ln a$ as the intercept and b as the slope of the line. If the shape parameter b = 1, the increments are stationary and we have HPP(ω), but for b > 1 we have NHPP($\omega(t)$) [14].
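The log-linear fit described above can be sketched as an ordinary least-squares regression of ln m(t) on ln t. This illustrative Python sketch assumes that each data point is a (time, cumulative mean count) pair; the function names fit_power_law and intensity are hypothetical, and the usage data are made up rather than taken from the paper's tables.

```python
import math


def fit_power_law(times, cumulative_counts):
    """Least-squares fit of ln m(t) = ln a + b ln t.

    Returns (a, b): the scale a = exp(intercept) and the shape b = slope.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in cumulative_counts]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope of the log-log line
    ln_a = y_bar - b * x_bar       # intercept of the log-log line
    return math.exp(ln_a), b


def intensity(t, a, b):
    """PLP intensity omega(t) = a * b * t**(b - 1)."""
    return a * b * t ** (b - 1)


if __name__ == "__main__":
    # Made-up data following m(t) = 8t exactly, i.e. a homogeneous process.
    a, b = fit_power_law([1, 2, 4, 8], [8, 16, 32, 64])
    print("a =", a, "b =", b, "omega(5) =", intensity(5, a, b))
```

A fitted shape close to 1 is the situation reported in the Analysis section: the process then reduces to HPP(ω), and the intensity a b t^(b-1) collapses to the constant a.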
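Hence, the predictive probability model for the priors is [15]

$$P_n(t) = \frac{[m(t)]^{n} e^{-m(t)}}{n!}, \qquad n = 0, 1, 2, \ldots, \qquad (3)$$

where $P_n(t)$ is the probability of n calls at a given time t, m(t) is the mean value function of the process (the expected number of calls by time t; for an HPP with rate ω, m(t) = ωt), and the other notation retains its meaning as defined before.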
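Equation (3) is straightforward to evaluate once m(t) is known. The Python sketch below assumes the power-law mean value function m(t) = a t^b, so that the homogeneous case b = 1 gives m(t) = a t; the function names plp_mean and poisson_prior, and the parameter values in the usage lines, are hypothetical.

```python
import math


def plp_mean(t, a, b):
    """Mean value function m(t) = a * t**b of the power law process."""
    return a * t ** b


def poisson_prior(n, m_t):
    """Poisson probability of n calls: m(t)**n * exp(-m(t)) / n!."""
    if n < 0:
        raise ValueError("the number of calls n must be non-negative")
    return m_t ** n * math.exp(-m_t) / math.factorial(n)


if __name__ == "__main__":
    # Hypothetical homogeneous case: b = 1, a = 8 calls per hour, t = 1 hour.
    m = plp_mean(1.0, 8.0, 1.0)
    for n in range(0, 13, 4):
        print(n, round(poisson_prior(n, m), 4))
```

Composing the two, poisson_prior(n, plp_mean(t, a, b)) gives the prior probability of n calls by time t for any fitted pair (a, b).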
The following assumptions must be satisfied by the random variables before Equation (3) can be used.
Model assumptions: the stochastic process $\{N(t), t \geq 0\}$ is an NHPP with rate function $\omega(t)$ if
1) $N(0) = 0$ (the number of events at time zero is zero);
2) it has independent increments (the numbers of events in non-overlapping time intervals are independent);
3) $o(h)$ denotes some function of smaller order than h, satisfying $\lim_{h \to 0} o(h)/h = 0$;
4) the probability that exactly one event occurs in a small interval from t to t + h is approximately $\omega(t)h$, that is, $P\{N(t+h) - N(t) = 1\} = \omega(t)h + o(h)$;
5) the probability that more than one event occurs in a small interval from t to t + h is negligible, that is, $P\{N(t+h) - N(t) \geq 2\} = o(h)$;
6) the events occur at random [16].
The Bayesian statistical model is adopted for the posterior distribution because it captures the prior behaviour of these subscribers to determine their current behaviour. Hence, the predictive statistical model for this study is given by Equation (4), where $P(\psi = \zeta \mid \omega)$ is the conditional probability that the random variable ψ assumes a specific value ζ given that its prior probability was ω. Note that ω is now a random variable.
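The following Python sketch shows one standard way to obtain prior, joint and posterior columns like those reported in Tables 2, 4, 6 and 8. It is an assumption, not a transcription of Equation (4): it supposes, hypothetically, that the proportion n/N acts as the likelihood, that the joint probability is prior × (n/N), and that the posterior is the joint probability normalised by its sum (Bayes' theorem over a discrete set of subscribers). The function name posterior_from_prior is illustrative.

```python
def posterior_from_prior(counts, priors):
    """Hypothetical discrete Bayes update over subscribers.

    counts: number of calls n for each subscriber (one row per subscriber).
    priors: prior probability attached to each row, e.g. a Poisson P_n(t).
    """
    if len(counts) != len(priors):
        raise ValueError("counts and priors must have the same length")
    total_calls = sum(counts)                               # N
    if total_calls == 0:
        raise ValueError("the total number of calls N must be positive")
    proportions = [n / total_calls for n in counts]         # Prop. n (n/N)
    joint = [p * q for p, q in zip(priors, proportions)]    # joint prb
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("all joint probabilities are zero")
    return [j / evidence for j in joint]                    # Posterior


if __name__ == "__main__":
    # Made-up rows: three subscribers with 2, 4 and 6 calls.
    print(posterior_from_prior([2, 4, 6], [0.2, 0.3, 0.5]))
```

Normalising by the sum of the joint probabilities guarantees that the posterior column adds up to 1, whatever scale the priors are on.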
Our interest is to classify the subscribers as either genuine or fraudulent.
Hence, this is a classification problem, and linear discriminant analysis will be employed to assign each subscriber to the group where it belongs. This classification will enable service providers to determine the measures to take against these fraudsters. The discriminant analysis will discriminate between the legitimate and fraudulent subscribers within the network. Discriminant analysis searches for differences between two or more groups that consist of multivariate measurements. One (or more) linear functions that maximally differentiate between these groups are constructed. These functions are then used to classify new members of similar groups into the appropriate group and to differentiate them from the groups they do not belong to [18]. The linear discriminant function employed is given in Equation (5).
$$\beta = (\bar{X}_A - \bar{X}_B)^{\top} S^{-1}, \qquad (5)$$

where $S^{-1}$ is the inverse of the dispersion (variance-covariance) matrix, $(\bar{X}_A - \bar{X}_B)$ is the difference between the mean vectors of the two multivariate samples, and β is the linear discriminant function. We then establish the optimal classifier of the discriminant function and classify the sample data according to their posterior probability distributions. Two multivariate samples, each with two variates, are derived from Equation (4); the variates are the posterior probabilities of the groups. We then classify the samples as belonging to either genuine or fraudulent subscribers based on the optimal classifier ($\beta_{A+B}$). Our classification rule is: classify the subscribers in group A into "A1; A2", where A1 denotes the fraudulent subscribers and A2 the genuine subscribers. Similarly, group B is classified into "B1; B2". Fraudulent subscribers tend to use the services much more than genuine subscribers and should therefore have higher posterior probabilities.
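Equation (5) is Fisher's two-group linear discriminant, and the procedure described above can be sketched in Python for two bivariate samples. Assumptions: the dispersion matrix S is taken to be the pooled within-group covariance matrix, and the cutoff is the midpoint of the two groups' mean discriminant scores, which is the usual Fisher rule; whether this midpoint coincides with the paper's optimal classifier is not confirmed here. The function names and the usage data are hypothetical.

```python
def mean_vector(rows):
    """Component-wise mean of a list of (x1, x2) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mu):
    """Sum of outer products of deviations from the mean (2 x 2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - mu[0], r[1] - mu[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def pooled_covariance(group_a, group_b):
    """Pooled within-group covariance, used here as the dispersion matrix S."""
    sa = scatter_matrix(group_a, mean_vector(group_a))
    sb = scatter_matrix(group_b, mean_vector(group_b))
    dof = len(group_a) + len(group_b) - 2
    return [[(sa[i][j] + sb[i][j]) / dof for j in range(2)] for i in range(2)]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix; raises if the matrix is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("dispersion matrix S is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def score(x, beta):
    """Discriminant score beta . x for one observation."""
    return beta[0] * x[0] + beta[1] * x[1]


def fisher_lda(group_a, group_b):
    """Return (beta, cutoff) for Fisher's two-group linear discriminant."""
    mu_a, mu_b = mean_vector(group_a), mean_vector(group_b)
    s_inv = inverse_2x2(pooled_covariance(group_a, group_b))
    diff = (mu_a[0] - mu_b[0], mu_a[1] - mu_b[1])
    # beta = (mean_A - mean_B)^T S^{-1}, a row vector of two coefficients.
    beta = [diff[0] * s_inv[0][j] + diff[1] * s_inv[1][j] for j in range(2)]
    # Midpoint of the projected group means (assumed cutoff rule).
    cutoff = 0.5 * (score(mu_a, beta) + score(mu_b, beta))
    return beta, cutoff


def classify(x, beta, cutoff):
    """Assign 'A' when the score is at or above the cutoff, otherwise 'B'."""
    return "A" if score(x, beta) >= cutoff else "B"


if __name__ == "__main__":
    # Made-up bivariate posterior probabilities, not the values in Table 9.
    group_a = [(0.90, 0.80), (0.80, 0.90), (0.85, 0.70), (0.95, 0.85)]
    group_b = [(0.20, 0.30), (0.30, 0.10), (0.25, 0.20), (0.10, 0.25)]
    beta, cutoff = fisher_lda(group_a, group_b)
    print("beta =", beta, "cutoff =", cutoff)
    print([classify(x, beta, cutoff) for x in group_a + group_b])
```

When S is nonsingular, S^{-1} is positive definite, so group A's mean score always exceeds group B's and a score at or above the cutoff is assigned to A. With only two variates the 2 × 2 inverse is written out directly; for more variates a numerical linear algebra library would be the natural choice.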

Analysis
The average number of calls and time spent in each call are presented in Table 1.
Table 1. Average number of calls (ω) and average time spent (t).
A plot of ln(ωt) against ln(t) is presented in Figure 1.
From Figure 1, we found that the slope b = 1, and from the relationship in Equation (2) and Figure 1, we have ln(a) = 2.2. Hence, $a = e^{2.2} \approx 9.03$. A shape parameter of 1 implies, through the PLP transformation, that the intensity function has stationary increments; hence, this distribution follows HPP(ω), and the prior probability distribution of Equation (3) becomes Equation (6). Equation (6) is presented in Table 2 by the column labeled "Prob", and Equation (4) is presented in Table 2 by the column labeled "Poste.Pr".
Table 2 presents the number of calls per hour, the probability distribution, the joint probability distribution, and the prior and posterior probability distributions.
The average number of calls and time spent in each call are presented in Table 3.
A plot of ln(ωt) against ln(t) is presented in Figure 2.
The prior probability distribution in Equation (3) becomes Equation (7), which is presented in Table 4 by the column labeled "Prob"; Equation (4) is presented in Table 4 by the column labeled "Poste.Pr".
Table 4 presents the number of calls per hour, the probability distribution, the joint probability distribution, and the prior and posterior probability distributions.
The average number of calls and time spent in each call are presented in Table 5.
A plot of ln(ωt) against ln(t) is presented in Figure 3.
From Figure 3, we determine the slope b = 1. From the relationship in Equation (2) and Figure 3, we have ln(a) = 2.1. Hence, $a = e^{2.1} \approx 8.17$. The prior probability distribution in Equation (3) becomes Equation (8), which is presented in Table 6 by the column labeled "Prob"; Equation (4) is presented in Table 6 by the column labeled "Poste.Pr".
Table 6 presents the number of calls per hour, the probability distribution, the joint probability distribution, and the prior and posterior probability distributions.
The average number of calls and time spent in each call are presented in Table 7.
A plot of ln(ωt) against ln(t) is presented in Figure 4. From Figure 4, we determine the slope b = 1. From the relationship in Equation (2) and Figure 4, we have ln(a) = 1.58. Hence, $a = e^{1.58} \approx 4.85$. The prior probability distribution in Equation (3) becomes Equation (9), which is presented in Table 8 by the column labeled "Prob"; Equation (4) is presented in Table 8 by the column labeled "Poste.Pr".
Table 8 presents the number of calls per hour, the probability distribution, the joint probability distribution, and the prior and posterior probability distributions.
Table 9 presents the posterior probability distributions for the two multivariate groups A and B.
Table 9. Multivariate sample data. (A) Posterior probabilities from group A; (B) posterior probabilities from group B.
Notation used in the tables: n = the number of calls per subscriber per hour; t (min) = the time spent on the calls; Prop. n (n/N) = the fraction of the number of calls in relation to the total number of calls; Pr. of Prio = the prior probability; joint prb = the joint probability; Posterior = the posterior probability; Churn = the defection of subscribers from one network to another.