DDoS Attack Detection Using Heuristics Clustering Algorithm and Naïve Bayes Classification

In recent times, among the multitude of attacks present in network systems, DDoS attacks have emerged as those with the most devastating effects. The main objective of this paper is to propose a system that effectively detects DDoS attacks in any networked system using the clustering technique of data mining followed by classification. The method uses a Heuristics Clustering Algorithm (HCA) to cluster the available data and Naïve Bayes (NB) classification to classify the data and detect attacks in the system based on selected network attributes of the data packets. Because the clustering algorithm is an unsupervised learning technique, it sometimes fails to detect some attack instances and a few normal instances; classification is therefore used alongside clustering to overcome this problem and to enhance accuracy. Naïve Bayes classifiers rest on very strong independence assumptions and have a fairly simple construction for deriving the conditional probability of each relationship. A series of experiments is performed using "The CAIDA UCSD DDoS Attack 2007 Dataset" and the "DARPA 2000 Dataset", and the efficiency of the proposed system is tested on the following performance parameters: Accuracy, Detection Rate and False Positive Rate. The results show that the proposed system has enhanced accuracy and detection rate with a low false positive rate.


Introduction
How to cite this paper: Bista, S. and Chitrakar, R. (2018) DDoS Attack Detection Using Heuristics Clustering Algorithm and Naïve Bayes Classification. Journal of Information Security.

In today's world of high-speed internet and network systems, securing systems against various threats has been a major concern worldwide. Among the many possible network threats and attacks, the Distributed Denial of Service attack has the most devastating effects. A Denial of Service attack typically uses a single computer and one internet connection to flood a targeted system or resource [1] so as to prevent legitimate users from accessing it. A Distributed Denial of Service attack is one in which a multitude of compromised systems attack a single target, thereby causing denial of service for users of the targeted system. Intrusion detection is "the process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusions, defined as attempts to compromise the confidentiality, integrity, availability, or to bypass the security mechanisms of a computer or network" [2]. There are generally two types of intrusion detection system: misuse detection and anomaly detection. In misuse detection, each instance in a data set is labeled as "normal" or "intrusion" and a learning algorithm is trained over the labeled data, whereas an anomaly detection technique builds models of normal behavior and automatically detects any deviation from them, flagging the deviation as suspect [2].
Data mining techniques have been very helpful in developing effective intrusion detection systems, and a great deal of research is ongoing because the data mining approach can extract a wide range of features from network flows that help distinguish attack packets from normal packets. In the proposed system, clustering followed by classification is used. Clustering is an unsupervised technique that groups similar items together to extract new knowledge from a large data set, while classification is a data mining technique that assigns categories to a collection of data in order to aid more accurate prediction and analysis.
Clustering separates dissimilar items according to some defined dissimilarity measure among the data items themselves [3]. The most widely used clustering technique for DDoS detection is the K-Means algorithm, which separates anomalous packets from normal packets; a variation of K-Means called K-Medoids has also been used. K-Means takes the mean of the data points as the cluster center and is therefore influenced by extreme values and outliers. It is simple and has low time complexity, but it is sensitive to the initial centers, since the number of clusters must be assumed at the beginning of the clustering and the initial centers are chosen at random. The other major shortcomings of K-Means are 1) degeneracy and 2) the inability to process the character attributes of network packets. K-Medoids solves the degeneracy problem of K-Means, since it chooses actual data objects present in the data set as cluster centers instead of taking mean values, and it is more robust to noise and outliers. Existing work based on K-Means and K-Medoids therefore has three shortcomings: degeneracy, cluster-number dependency, and a lack of ability to deal with character attributes in network transactions.
Classification categorizes the available data for accurate analysis; each category is termed a class label. In anomaly detection, the data are generally classified into two categories, normal and abnormal [4]. A Naïve Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naïve) independence assumptions: it assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. Depending on the precise nature of the probability model, Naïve Bayes classifiers can be trained very efficiently in a supervised learning setting.
Bayes' theorem can be expressed as follows. Let X be a data record and H be the hypothesis that record X belongs to a specific class C. For classification, we want to determine P(H|X), the probability that hypothesis H holds given the observed record X; this is the posterior probability of H conditioned on X. In contrast, P(H) is the prior probability. The posterior probability P(H|X) is based on more information (such as background knowledge) than the prior probability P(H), which is independent of X. Similarly, P(X|H) is the posterior probability of X conditioned on H. Bayes' theorem is useful because it provides a way to calculate the posterior probability P(H|X) from P(H), P(X) and P(X|H) [5]:

P(H|X) = P(X|H) P(H) / P(X)
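As a concrete illustration of how these probabilities drive the classification, the following is a minimal Python sketch of a categorical Naïve Bayes classifier, not the paper's implementation; Laplace smoothing is added here so unseen attribute values do not zero out the product:

```python
from collections import Counter, defaultdict

def train_nb(records, labels):
    """Estimate class counts (for P(H)) and per-attribute value counts
    (for P(x_i|H)) from labelled records of categorical attributes."""
    priors = Counter(labels)              # class -> count
    cond = defaultdict(Counter)           # (class, attribute index) -> value counts
    for rec, lab in zip(records, labels):
        for i, v in enumerate(rec):
            cond[(lab, i)][v] += 1
    return priors, cond, len(labels)

def classify_nb(x, priors, cond, n):
    """Return the class maximizing P(H) * prod_i P(x_i|H), using add-one
    (Laplace) smoothing for attribute values never seen with a class."""
    best, best_p = None, -1.0
    for lab, c in priors.items():
        p = c / n                                         # prior P(H)
        for i, v in enumerate(x):
            counts = cond[(lab, i)]
            p *= (counts[v] + 1) / (c + len(counts) + 1)  # smoothed P(x_i|H)
        if p > best_p:
            best, best_p = lab, p
    return best
```

For example, training on four toy records labelled attack/normal and classifying a new ("tcp", "high") record picks the class whose smoothed posterior is largest.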
Therefore, the use of the Heuristic Clustering Algorithm followed by Naïve Bayes classification in this paper overcomes the problem of degeneracy and yields a DDoS attack detection system that takes into account both the character and the numerical attributes of the network data packets. The proposed hybrid learning approach leads to better performance in terms of Accuracy, Detection Rate and False Positive Rate, showing that the hybrid approach is better than clustering or classification alone. The proposed method uses the Heuristic Clustering Algorithm to cluster the data, followed by Naïve Bayes classification to classify the clusters into either normal or attack instances. For comparison of the results obtained from the proposed method with those of the existing system of the reference paper, the labelling scheme defined in that paper is also performed after clustering. Finally, the results are compared using the performance parameters Accuracy, Detection Rate and False Positive Rate. The algorithms used are discussed below.

Heuristic Clustering Algorithm
The algorithm uses the following notation: H = (H_N, H_S), where H_N is the subset of numerical attributes and H_S is the subset of character attributes; e_i = (h_i1, h_i2, ..., h_im), where e_i is a record, m is the number of attribute values and h_ij is the value of attribute H_j; and E = {e_1, e_2, ..., e_n}, where E is the set of records and n is the number of packets [2].

1) The Center of Cluster
A cluster is represented by its cluster center. In the HCA algorithm, the algorithm Count() is used to compute the cluster center. The center of a cluster is composed of the center of the numerical attributes and the center of the character attributes. Let P = (P_N, P_S) and P = (P_1, P_2, ..., P_m), where P_N is the center of the numerical attributes and P_S is the center of the character attributes. Each numerical component of the center is the mean of that attribute over the records in the cluster,

P_i = (1/n) * sum_{j=1..n} h_ji

where h_ji is the value of numerical attribute i in record j, and P_S is the frequent character attribute set, which consists of the q most frequent character attribute values [2].
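A small Python sketch of this center computation follows, assuming Count() takes the per-attribute mean over the numerical attributes and the q most frequent values per character attribute; the function name and parameters here are illustrative, not the paper's code:

```python
from collections import Counter

def count_center(cluster, num_idx, char_idx, q=1):
    """Cluster center: mean over each numerical attribute, plus the q most
    frequent values of each character attribute (the frequent attribute set)."""
    n = len(cluster)
    p_num = {i: sum(rec[i] for rec in cluster) / n for i in num_idx}
    p_chr = {i: [v for v, _ in Counter(rec[i] for rec in cluster).most_common(q)]
             for i in char_idx}
    return p_num, p_chr
```

For a cluster of three records with one numerical and one character attribute, the center is the numerical mean together with the most common character value.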

2) The Initial Center of Cluster
At the beginning of clustering, two initial cluster centers are determined by the algorithm Search().
Input: E (the data set) and l (the number of samples).
Output: initial centers m_1, m_2.
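One plausible sketch of such an initialization, assuming Search() draws l sample records and returns the two that are farthest apart as m_1 and m_2; this reading is an assumption and the paper's exact pseudocode may differ:

```python
import random

def search_initial_centers(records, l, distance, seed=0):
    """Assumed reading of Search(): sample l records and return the two most
    mutually distant ones as the initial cluster centers m1, m2."""
    rng = random.Random(seed)  # fixed seed for reproducibility in this sketch
    sample = rng.sample(records, min(l, len(records)))
    pairs = ((a, b) for i, a in enumerate(sample) for b in sample[i + 1:])
    return max(pairs, key=lambda p: distance(p[0], p[1]))
```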

3) Computing Similarity
The data set consists of numerical attributes and character attributes. The similarity of the character attributes is calculated through attribute matching.
Let e_i and e_j be two records in E, each containing m attributes (including p character attributes), and let n_hik and n_hjk be the numbers of occurrences of h_ik and h_jk respectively. For a character attribute k, attribute matching sets A = 0 if h_ik = h_jk and A = 1 otherwise, while the numerical attributes still use the classical Euclidean distance. The similarity of two records (comprising the similarity of the numerical attributes and the similarity of the character attributes) is calculated by combining these two measures.
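A Python sketch of the combined measure follows, assuming the 0/1 matching terms for character attributes are added to the squared numerical differences under one Euclidean-style distance; the exact weighting in the paper may differ:

```python
import math

def similar(e_i, e_j, num_idx, char_idx):
    """Dissimilarity between two records: squared Euclidean differences over
    the numerical attributes plus 0/1 attribute matching over the character
    attributes (lower value = more similar records)."""
    d = sum((e_i[k] - e_j[k]) ** 2 for k in num_idx)        # numerical part
    d += sum(0 if e_i[k] == e_j[k] else 1 for k in char_idx)  # matching part
    return math.sqrt(d)
```

Identical records yield 0, and each mismatched character attribute contributes one unit alongside the numerical differences.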
4) The Process of Clustering
Step 1. Determine the two initial cluster centers using the algorithm Search().
Step 2. Import a new record.
Step 3. Compute the similarity between the new record and the centers of clusters by algorithm Similar ().
Step 4. Compute the similarity between the centers of clusters.
Step 5. If the minimum similarity between the record and the centers of the clusters is greater than the minimum similarity between the centers of the clusters, create a new cluster with the record as the new center; repeat from Step 2 until there is no change [2].
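The steps above can be sketched in Python as follows, treating the similarity measure as a distance (so "greater minimum similarity" becomes "greater minimum distance"); Similar() and the center update are simplified, and the names here are illustrative:

```python
def hca_cluster(records, distance, centers):
    """Sketch of the clustering steps: each imported record either joins its
    nearest cluster, or becomes a new cluster center when it is farther from
    every existing center than the centers are from one another (Step 5)."""
    clusters = {id(c): [] for c in centers}
    for rec in records:                                   # Step 2: import a record
        d_rec = min(distance(rec, c) for c in centers)    # Step 3
        d_ctr = min(distance(a, b) for a in centers       # Step 4
                    for b in centers if a is not b)
        if d_rec > d_ctr:                                 # Step 5: new cluster
            centers.append(rec)
            clusters[id(rec)] = [rec]
        else:                                             # join nearest cluster
            nearest = min(centers, key=lambda c: distance(rec, c))
            clusters[id(nearest)].append(rec)
    return centers, clusters
```

With two initial centers at 0 and 10, records near them are absorbed while a far outlier (e.g. at 100) seeds a third cluster.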

5) Labelling
In the labelling method, we assume that the center of a normal cluster is highly close to the initial cluster centers vh created during clustering. In other words, if a cluster is normal, the distance between the center of the cluster and vh will be small; otherwise it will be large. Thus, for each cluster center Cj, we first calculate the maximum distance to vh, and then calculate the average of these maximum distances. If the maximum distance from a cluster to vh is less than this average, we label the cluster as normal; otherwise we label it as attack. Here the similarity measure is used as the distance measure, i.e. attribute matching for the character attributes and the Euclidean distance for the numerical attributes [2].
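Under that description, the labelling step can be sketched as follows; this is a simplified reading in which each cluster center's distance to vh is compared against the average of those distances:

```python
def label_clusters(centers, vh, distance):
    """Label each cluster by its center's distance to the initial center vh:
    clusters closer than the average distance are 'normal', the rest 'attack'."""
    dists = [distance(c, vh) for c in centers]
    avg = sum(dists) / len(dists)
    return ["normal" if d < avg else "attack" for d in dists]
```

Clusters whose centers sit near vh are marked normal, while a far-away cluster center is flagged as attack.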

Experiments and Results
Two sets of experiments are performed, as enumerated below.

c) Data Pre-processing
Data pre-processing is done to eliminate all those data packets that would ultimately lead to wrong results, using the Wireshark data analysis tool.
d) The Experimental Procedure
Using the selected sets of data samples, both programs are executed simultaneously, and the numbers of true positives, true negatives, false positives and false negatives of both programs are recorded and used in their performance evaluation.

e) Performance Parameters
The performance of the proposed algorithm is evaluated using the performance parameters Accuracy (A), Detection Rate (DR) and False Positive Rate (FPR), computed as:

A = (TP + TN) / (TP + TN + FP + FN)
DR = TP / (TP + FN)
FPR = FP / (FP + TN)
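Assuming the standard confusion-count definitions of these three parameters, they can be computed with a small Python helper (not the paper's code):

```python
def performance(tp, tn, fp, fn):
    """Accuracy, Detection Rate and False Positive Rate from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    detection_rate = tp / (tp + fn)        # also known as recall / true positive rate
    false_positive_rate = fp / (fp + tn)
    return accuracy, detection_rate, false_positive_rate
```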

Performance Analysis
From the experiments and results above, it can be seen that Accuracy and Detection Rate have improved, with a corresponding reduction in False Positive Rate. The proposed algorithm has therefore fulfilled its intent of improving on the performance parameters of the Heuristics clustering algorithm alone.
Figure 2 shows the improvement in Accuracy with HCA clustering followed by NB classification, where the highest improvement is 25.82% and the lowest is 3.74% for the CAIDA dataset, while the highest improvement is 29.67% and the lowest is 0.80% for the DARPA dataset.
Figure 3 shows the improvement in Detection Rate with HCA clustering followed by NB classification, where the highest improvement is 66.35% and the lowest is 4.99% for the CAIDA dataset, while the highest improvement is 80.88% and the lowest is 0.10% for the DARPA dataset. Figure 4 shows the improvement in False Positive Rate with HCA clustering followed by NB classification, where the highest improvement is 2.60% and the lowest is 0.04% for the CAIDA dataset, while the highest improvement is 27.03% and the lowest is 0.80% for the DARPA dataset.

Related Works
M. Jianliang et al. introduced the application of the K-means clustering algorithm to intrusion detection. K-means is used to detect unknown attacks and partition large data spaces effectively, but it has disadvantages such as degeneracy and cluster dependence. Yu Guan et al. introduced the Y-means algorithm, a clustering method for intrusion detection based on K-means and related clustering algorithms; it overcomes two shortcomings of K-means, namely cluster-number dependency and degeneracy. Zhou Mingqiang et al. introduced a graph-based clustering algorithm for anomaly intrusion detection, using an outlier detection method based on the local deviation coefficient (LDCGB); compared with other clustering-based intrusion detection algorithms, it does not require an initial cluster number. T. Velmurugan and T. Santhanam analyzed the efficiency of the K-Means and K-Medoids clustering algorithms on large datasets under normal and uniform distributions, and found that the average time taken by K-Means is greater than that of K-Medoids in both cases [2]. M. Jianliang et al. also implemented the K-means algorithm to cluster and analyze the data of the KDD-99 dataset; the algorithm can detect unknown intrusions in real network connections, and the simulation results on KDD-99 showed that K-means is an effective algorithm for partitioning large data sets. Jose F. Nieves presented a comparative study with emphasis on unsupervised learning methods for anomaly detection, using the K-means algorithm with the KDD Cup 1999 network data set to evaluate performance; the evaluation showed that a high detection rate can be achieved while maintaining a low false alarm rate [6]. K. Sarmila and G. Kavin introduced the Heuristic clustering algorithm to cluster the data and detect DDoS attacks in the DARPA 2000 datasets, obtaining better results in terms of detection rate and false positive rate than the K-Means and K-Medoids algorithms. Chitrakar R. and Huang Chuanhe proposed a hybrid learning approach combining K-medoids clustering and Naïve Bayes classification that groups the whole data into clusters more accurately than K-means, resulting in better classification; the hybrid approach was tested on the Kyoto 2006+ datasets.

Figure 1 shows the system block diagram of the proposed algorithm. The workflow starts with the extraction of nine network attributes from the datasets, followed by the pre-processing of the data to eliminate those data values that would ultimately result in wrong output. Once the dataset is prepared after pre-processing, it is fed into the Heuristics Clustering Algorithm, which produces the clusters that are subsequently labelled or classified.

1) Heuristics Clustering Algorithm with Labelling
2) Heuristics Clustering Algorithm with Naïve Bayes Classification

a) Selection of Experimental Data
To perform the series of experiments, 12 samples from two different datasets, the "CAIDA UCSD DDoS Attack 2007 Dataset" and the "DARPA 2000 Dataset", are selected, with each sample consisting of 10,000 records.

b) Extraction of Network Attributes
A set of 9 data packet attributes is extracted from the dataset: Source IP Address, Destination IP Address, Protocol, Source Port, Destination Port, Sequence Number, Acknowledgment Number, Length, and Window Size.
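For illustration, projecting a parsed packet onto these nine attributes might look like the following sketch; the dictionary field names are hypothetical, and the actual extraction in the paper was done with packet analysis tools such as Wireshark:

```python
# Hypothetical field names for a packet already parsed into a dict.
ATTRIBUTES = ("src_ip", "dst_ip", "protocol", "src_port", "dst_port",
              "seq", "ack", "length", "window")

def packet_record(pkt):
    """Project a parsed packet (dict) onto the nine attributes used in the paper,
    in a fixed order, so each packet becomes one record for clustering."""
    return tuple(pkt.get(a) for a in ATTRIBUTES)
```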

Figure 2. Improvement in Accuracy with HCA clustering followed by NB classification.

Figure 3. Improvement in Detection Rate with HCA clustering followed by NB classification.

Figure 4. Improvement in False Positive Rate with HCA clustering followed by NB classification.
Input:
E: data set having n data objects
C: set of classes, e.g. {Normal, Attack}
X: data record to be classified
H: hypothesis (that X is classified into C)
Output:
C_NB: the predicted class into which X should be classified

Table 1. Comparison of accuracy in CAIDA UCSD DDoS attack 2007 dataset.

True Positive (TP) = attacks that are correctly detected as attacks
True Negative (TN) = normal data that are correctly detected as normal
False Positive (FP) = normal data that are incorrectly detected as attacks
False Negative (FN) = attacks that are incorrectly detected as normal

The tables below illustrate the improvement in accuracy, detection rate and false positive rate of the proposed algorithm, i.e. the Heuristics Clustering Algorithm with Naïve Bayes classification, over the Heuristics Clustering Algorithm with labelling. Table 1 shows the improvement in Accuracy with HCA clustering with NB classification on the UCSD DDoS Attack 2007 dataset, with an average improvement of 8.16%, a highest improvement of 25.82% and a lowest of 2.11%. Table 2 shows the improvement in Accuracy on the DARPA 2000 dataset, with an average improvement of 14.31%, a highest of 29.67% and a lowest of 0.8%. Table 3 shows the improvement in Detection Rate on the UCSD DDoS Attack 2007 dataset, with an average improvement of 32.21%, a highest of 66.35% and a lowest of 1.71%. Table 4 shows the improvement in Detection Rate on the DARPA 2000 dataset, with an average improvement of 42.49%, a highest of 90% and a lowest of 0.1%. Table 5 shows the reduction in False Positive Rate on the UCSD DDoS Attack 2007 dataset, with an average reduction of 1.22%, a highest of 2.6% and a lowest of 0.04%. Table 6 shows the reduction in False Positive Rate on the DARPA 2000 dataset, with an average reduction of 11.84%, a highest of 27.03% and a lowest of 0.8%.

Table 2. Comparison of accuracy in DARPA 2000 dataset.

Table 3. Comparison of Detection Rate in CAIDA UCSD DDoS Attack 2007 dataset.

Table 4. Comparison of detection rate in DARPA 2000 dataset.

Table 5. Comparison of false positive rate in CAIDA UCSD DDoS attack 2007 dataset.

Table 6. Comparison of false positive rate in DARPA 2000 dataset.