Data Categorization and Noise Analysis in Mobile Communication Using Machine Learning Algorithms

Machine learning and pattern recognition provide well-defined algorithms that, applied to complex data, can accurately estimate traffic levels and identify heavy traffic hours within a cluster. In this paper, the busy base stations and the noise levels during the busy hour are predicted. The J48 pruned tree contains 23 nodes, with the busy traffic hour located in East Godavari. Based on the CART results, the signal-to-noise ratio was predicted at 55. About 53% of the instances fell inside the cluster and 47% outside it. DBSCAN clustering found the maximum noise in Srikakulam. Based on Apriori and genetic search, MOR (number of successful originating calls) was predicted as the best-associated attribute, with a 12:1 ratio.


Introduction
The classification (or automated categorization) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of information in digital form in communication technology and the ensuing need to organize it [1]. Technological advances can produce a flood of large data sets, which have led to massive data-analytic problems and can easily lead to flawed inferences. Statisticians might benefit from learning more about wireless signal controls and from devising ways to use data on controls in their analyses [2][3][4].
Machine learning comprises well-defined algorithms, data structures and theories of learning, including the automated categorization or classification of text into predefined categories [1]. Machine learning has been a central research area since the mid-1950s, as part of the effort in artificial intelligence to understand the phenomenon of learning from data sets [5].
Over the past few years, partitioning a large set of objects into homogeneous clusters has become a fundamental operation in pattern recognition and data mining [6,7]. Scientific data provide a platform for learning from data by searching for hidden patterns in large databases. Data mining extends inductive learning techniques to evaluate the usefulness of the cases retrieved from large data sets [8].
In this paper we describe an application of machine learning to an important communication problem: detecting busy traffic hours in the base stations of an area. We cover the application from the formulation of the problem to the delivery of a system for field testing. The attributes considered include soft handoff traffic, busy traffic hour, soft handoff rate, number of calls, originating calls, paging responses and call termination rates. The primary purpose of the paper is to present to the machine learning research community applications of general importance in communication technology.
algorithm. Classification and Regression Trees (CART) is a method for constructing decision trees by recursively splitting the data into two subsets.
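To make the splitting step concrete, here is a minimal sketch of how a CART-style tree might choose one binary split on a single numeric attribute. It assumes the Gini impurity criterion (the usual CART choice for classification) and uses hypothetical hourly call counts labelled "busy" or "normal"; it is not the tool or data used in this paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Best binary split 'value <= threshold' on one numeric attribute.

    Returns (threshold, weighted Gini impurity of the two children);
    threshold is None if no split beats the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical hourly call counts labelled busy/normal.
calls = [120, 135, 150, 400, 420, 460]
labels = ["normal"] * 3 + ["busy"] * 3
print(best_split(calls, labels))  # → (275.0, 0.0)
```

A full CART tree repeats this search over every attribute at every node and recurses on the two children, followed by pruning.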

SimpleKMeans
In SimpleKMeans clustering, the similarity of two clusters is defined as the similarity of their centroids. The centroid of a cluster is the point whose parameter values are the means of the parameter values of all the points in the cluster.
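The centroid definition above drives the whole k-means loop. The sketch below is plain Lloyd's k-means with caller-supplied starting centroids and Euclidean distance, on hypothetical 2-D points; Weka's SimpleKMeans differs in details such as initialisation and attribute normalisation, so treat this as an illustration rather than its implementation.

```python
import math

def centroid(points):
    """Coordinate-wise mean of all points in a cluster."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def kmeans(points, initial_centroids, max_iter=100):
    """Plain Lloyd's k-means with caller-supplied starting centroids."""
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new = [centroid(c) if c else centroids[k] for k, c in enumerate(clusters)]
        if new == centroids:  # no centroid moved: converged
            break
        centroids = new
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
cents, clusters = kmeans(points, initial_centroids=[(1, 1), (8, 8)])
print([tuple(round(x, 2) for x in c) for c in cents])  # → [(1.33, 1.33), (8.33, 8.33)]
```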

DBScan
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a data clustering algorithm that finds a number of clusters starting from the estimated density distribution of the corresponding nodes. It starts with an arbitrary point that has not been visited and retrieves that point's neighbourhood (all points within a distance ε). If the neighbourhood contains sufficiently many points, a cluster is started. Otherwise, the point is labelled as noise; it may later be absorbed into a cluster as a border point.
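The visiting procedure described above can be sketched as follows. This is a minimal, unoptimised version assuming Euclidean distance, a neighbourhood radius `eps` and a threshold `min_pts` (the neighbourhood count includes the point itself); the points are hypothetical and the label -1 for noise is a convention chosen here.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point
        cluster_id += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

Counting the points labelled -1 per region is one way a "maximum noise" observation like the one reported for Srikakulam could be read off.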

Apriori
Apriori is a classic algorithm for learning association rules, designed to operate on transactional databases to find frequent patterns in data.
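A compact sketch of Apriori's level-wise frequent-itemset search is given below, assuming an absolute minimum support count. The records and item names (such as "high_MOR") are hypothetical discretised attribute values, not the paper's data, and rule generation is reduced to a single confidence calculation.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Count each candidate's support by scanning the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical per-hour records described by discretised attribute values.
records = [
    {"busy_hour", "high_MOR", "high_SHO"},
    {"busy_hour", "high_MOR"},
    {"busy_hour", "high_MOR", "low_SNR"},
    {"high_SHO", "low_SNR"},
]
freq = apriori(records, min_support=2)
pair = frozenset({"busy_hour", "high_MOR"})
# Confidence of the rule busy_hour -> high_MOR = support(both) / support(busy_hour).
print(freq[pair], freq[pair] / freq[frozenset({"busy_hour"})])  # → 3 1.0
```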

Genetic Search
Genetic search is a search method used by attribute-subset scoring algorithms; it searches for multiple solutions simultaneously. These candidate solutions are recombined with each other and maintained in a population according to their fitness.
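The population-based idea can be illustrated with a generic genetic algorithm over attribute subsets encoded as bit strings. This is a minimal sketch, not Weka's GeneticSearch: the tournament selection, single-point crossover, bit-flip mutation and elitism are choices made here, the parameter values are illustrative, and `toy_fitness` (which treats attributes 0 and 2 as relevant) is purely hypothetical. In practice the fitness would come from a subset evaluator.

```python
import random

def genetic_search(n_attrs, fitness, pop_size=20, generations=20,
                   crossover_rate=0.6, mutation_rate=0.033, seed=1):
    """Search attribute subsets (bit strings) for the highest fitness."""
    rng = random.Random(seed)  # fixed seed for reproducible runs

    def pick(population):
        # Binary tournament: the fitter of two randomly chosen individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    # Bit i == 1 means attribute i is included in the subset.
    population = [tuple(rng.randint(0, 1) for _ in range(n_attrs))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: never lose the best subset found so far
        while len(children) < pop_size:
            p1, p2 = pick(population), pick(population)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_attrs)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # Mutation: flip each bit independently with a small probability.
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit
                          for bit in child)
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best

# Hypothetical fitness: attributes 0 and 2 are "relevant"; each selected
# attribute costs 0.1, so smaller subsets are preferred.
def toy_fitness(subset):
    return subset[0] + subset[2] - 0.1 * sum(subset)

print(genetic_search(6, toy_fitness))
```

Because the best individual is always carried into the next generation, the best fitness never decreases from one generation to the next.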

Results
Mobile communication depends on the transmitted signal and on the number of users in the cluster. The signal traffic of these mobile users is carried by the base stations. In this paper, the clustering and association analysis was carried out on data surveyed from various base stations in a clustered area. The J48 pruned tree has 13 leaves (4 for East Godavari, 2 for Vizag, 4 for Vijayanagaram and 3 for Srikakulam). The busy traffic hour appears in a leaf for East Godavari. The tree has 23 nodes in total (Figure 1).
The CART decision tree has 3 leaf nodes and 5 branches. The signal-to-noise ratio was predicted at 55 (Figure 2).
SimpleKMeans produced the centroids of the clustered dataset. About 53% of the instances fell inside the cluster and 47% outside it (Figure 3).
DBSCAN clustering found the maximum noise in Srikakulam. Five clusters were identified from the clustering results (Figure 4). Apriori and genetic search identified MOR (number of successful originating calls) as the best-associated attribute, with a ratio of 12:1.

Discussion
Mobile communication traffic data analysis has often been used as a background application to motivate many data mining problems [9]. Data mining tools look for a minimal set of differences between items, on the premise that a list of essential differences is easier to read and understand than detailed descriptions. Summarizing large data sets identifies the data that really matters, rather than producing detailed summaries and extensive, lengthy descriptions [10].
A data mining algorithm has been proposed that incrementally mines user moving patterns in a mobile computing environment and exploits the mining results to develop data allocation schemes that improve the overall performance of a mobile system [11]. Data collected from mobile phones have the potential to provide insight into the relational dynamics of individuals. Distinctive temporal and spatial patterns in their physical proximity and calling patterns allow the prediction of individual-level outcomes such as job satisfaction [12].
Group pattern mining locates groups of mobile users associated by physical distance and the amount of time spent together. The performance of the method indicates that a suitable segment size and alpha value must be selected to obtain the best results [13]. Mining frequent subtrees from databases of labelled trees is a new research field with many practical applications in areas such as computer networks, Web mining, bioinformatics and XML document mining. These applications need the greater expressive power of labelled trees to capture the complex relations among data entities [14]. For the mobile traffic generated by users at a base station, data mining is about finding useful knowledge in the raw data they produce. Performance evaluation shows that as the number of characteristics increases, the number of rules increases dramatically; therefore, only the relevant characteristics should be chosen carefully to keep the number of rules acceptable [15].
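The growth in rules can be quantified with a standard combinatorial count (not a result from [15]): over d distinct items, the number of possible association rules X → Y with X and Y non-empty and disjoint is R = 3^d − 2^(d+1) + 1, before any support or confidence threshold is applied. The sketch below checks the closed form against a direct enumeration; the function names are chosen here for illustration.

```python
from itertools import combinations

def rule_count_formula(d):
    """Number of rules X -> Y over d items, with X, Y non-empty and disjoint."""
    return 3 ** d - 2 ** (d + 1) + 1

def rule_count_brute_force(d):
    """Same count by enumeration: each itemset of size k >= 2 yields 2**k - 2 rules."""
    count = 0
    for k in range(2, d + 1):
        for _ in combinations(range(d), k):
            count += 2 ** k - 2  # any non-empty proper subset can be the antecedent X
    return count

for d in (2, 6, 10):
    print(d, rule_count_formula(d), rule_count_brute_force(d))
# prints three lines: 2 2 2, then 6 602 602, then 10 57002 57002
```

Going from 6 to 10 characteristics raises the number of candidate rules from 602 to 57,002, which is why pruning by support and confidence, and selecting only relevant characteristics, matters.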

Conclusion
Mining the group patterns of mobile users can support data allocation schemes that improve the overall performance of a mobile system without interruption as traffic rates increase dramatically. Based on the CART results, the signal-to-noise ratio was predicted at 55. Developing intelligent data analysis for mobile communication from a machine learning perspective remains necessary future work.