Anomalous Network Packet Detection Using Data Stream Mining

In recent years, significant research has been devoted to the development of Intrusion Detection Systems (IDS) able to detect anomalous computer network traffic indicative of malicious activity. While signature-based IDS have proven effective in discovering known attacks, anomaly-based IDS hold the even greater promise of being able to automatically detect previously undocumented threats. Traditional IDS are generally trained in batch mode, and therefore cannot adapt to evolving network data streams in real time. To resolve this limitation, data stream mining techniques can be utilized to create a new type of IDS able to dynamically model a stream of network traffic. In this paper, we present two methods for anomalous network packet detection based on the data stream mining paradigm. The first of these is an adapted version of the DenStream algorithm for stream clustering specifically tailored to evaluate network traffic. In this algorithm, individual packets are treated as points and are flagged as normal or abnormal based on their belonging to either normal or outlier clusters. The second algorithm utilizes a histogram to create a model of the evolving network traffic to which incoming traffic can be compared using Pearson correlation. Both of these algorithms were tested using the first week of data from the DARPA’99 dataset with Generic HTTP, Shell-code and Polymorphic attacks inserted. We were able to achieve reasonably high detection rates with moderately low false positive percentages for different types of attacks, though detection rates varied between the two algorithms. Overall, the histogram-based detection algorithm achieved slightly superior results, but required more parameters than the clustering-based algorithm. As a result of its fewer parameter requirements, the clustering approach can be more easily generalized to different types of network traffic streams.


Introduction
Since the 1990s, internet usage has become an integral part of our daily lives. As a result, computer networks have experienced an increased number of sophisticated malware attacks. Whereas attackers previously attempted to gain access to restricted resources to demonstrate their skill, a new wave of internet-based attacks has shifted the focus primarily towards criminal motives. Due to the availability of software tools designed to exploit vulnerabilities, attackers can create viruses with greater structural complexity and damaging capability using less sophisticated skills. The security challenges resulting from an increasing number of devices connected to the internet have prompted a significant amount of research devoted to network security.

Intrusion Detection Systems
One notable topic of network security research is the development of Intrusion Detection Systems (IDS), which attempt to detect threats to a network or host through signature-based or anomaly-based methods. To detect intrusions, signature-based IDS generate "signatures" based on characteristics of previous known attacks. This allows the systems to focus on detecting attacks regardless of ordinary network traffic. Signature-based detection is the most common form of intrusion detection because it is simple to implement once a set of signatures has been created. Although this approach is effective in finding known threats to a network, it is unable to identify new threats until a new signature is made. To generate an accurate signature, a human expert is generally needed because this cannot easily be done automatically. Since the detection of new threats in a signature-based system is impossible without the aid of a new signature, an alternative method has been proposed.
In contrast to the signature-based approach, anomaly-based IDS adaptively detect new attacks by first generating a "normal" pattern of network traffic. These systems then find anomalies by comparing incoming packets with the "normal" model. Anything that is considered statistically deviant is classified as anomalous. This allows the systems to automatically detect new attacks, though at the risk of misclassifying normal behavior (a false positive). In addition to the potential for false positives, anomaly-based systems also fall prey to "mimicry attacks", which attempt to evade the IDS by imitating normal network traffic. One such attack is known as a Polymorphic Blending Attack (PBA), in which the attacker uses byte padding and substitution to avoid detection [1]. Recent research has focused on increasing the efficiency, robustness, and detection rates of these systems while lowering their often high false-positive rates.
One of the first well-developed anomaly-based systems is NIDES [2], which builds a model of normal behavior by monitoring the four-tuple header of packets. The four-tuple contains the source and destination IP addresses and port numbers of packet headers [3]. Another system, proposed by Mahoney et al. [4], comprised two different programs, PHAD and ALAD. Whereas PHAD monitors the data contained in the header fields of individual packets, ALAD looks at distinct TCP connections consisting of multiple packets [3,4]. To detect anomalies, PHAD and ALAD use port numbers, TCP flags, and keywords found in the payload. Yet another approach, known as NETAD [5], monitors the first 48 bytes of each IP packet header. Then, using the information recovered from the packet's header, NETAD creates different models, each corresponding to a particular network protocol [6].
Two recently developed network anomaly-based intrusion detection systems are PAYL and McPAD [6,7]. Both use n-grams, sequences of n consecutive bytes in a packet's payload, as features to represent packets. To perform anomaly detection, PAYL utilizes 1-grams. This system first generates a histogram for normal traffic, and then a new histogram for each packet's payload. The two histograms are compared using the simplified Mahalanobis distance. If the distance is above a certain threshold, the new packet is flagged as anomalous [7]. Despite this approach's effectiveness, it suffers from a high false positive rate. To combat this, an extension of PAYL was proposed to use n-grams, creating a more precise detection model [8].
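As a rough illustration of the 1-gram comparison described above, the following sketch builds a byte-frequency histogram for a payload and scores it with the simplified Mahalanobis distance. The function names, the smoothing constant `alpha`, and the use of relative frequencies are assumptions made for illustration and are not taken from PAYL's implementation.

```python
from collections import Counter


def byte_histogram(payload):
    """Relative frequency of each of the 256 possible byte values (1-grams)."""
    counts = Counter(payload)
    total = len(payload) or 1  # avoid division by zero on an empty payload
    return [counts.get(b, 0) / total for b in range(256)]


def simplified_mahalanobis(x, mean, std, alpha=0.001):
    """Sum of per-feature absolute deviations, each scaled by (std + alpha).

    alpha is an assumed smoothing constant that keeps the division defined
    when a byte value never varied in the training data.
    """
    return sum(abs(xi - mi) / (si + alpha) for xi, mi, si in zip(x, mean, std))


# Hypothetical usage: score a payload against a toy "normal" profile.
normal_mean = byte_histogram(b"GET /index.html HTTP/1.0")
normal_std = [0.01] * 256  # placeholder standard deviations
score = simplified_mahalanobis(
    byte_histogram(b"GET /index.html HTTP/1.0"), normal_mean, normal_std
)
```

A packet would then be flagged as anomalous when `score` exceeds a chosen threshold.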
McPAD further develops the effectiveness of the n-gram version of PAYL by using 2ν-grams, sequences of two bytes separated by a gap of size ν. The 2-gram contains the correlation between two bytes, a feature that the 1-gram lacks. By combining the 2-gram with ν, McPAD is able to analyze structural information from higher n-grams while keeping the number of features the same as a 2-gram. By varying the value of ν, McPAD builds multiple one-class support vector machine (SVM) classifiers to detect anomalies as an ensemble. These classifiers are first trained on anomaly-free data, then tested with mixed normal and abnormal packets [6]. Using this approach, McPAD has successfully detected multiple virus types while maintaining a low false positive rate.
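The gapped 2-gram idea can be sketched as follows. This is a minimal illustration, assuming that a gap of size ν means ν bytes are skipped between the two bytes of each pair; the function name and the dictionary representation are hypothetical.

```python
def two_nu_gram_counts(payload, nu):
    """Count pairs of bytes that are separated by a gap of nu bytes.

    With nu = 0 this reduces to ordinary 2-grams (adjacent byte pairs).
    The feature space stays at 256 * 256 pairs regardless of nu.
    """
    counts = {}
    step = nu + 1  # offset from the first byte of a pair to the second
    for i in range(len(payload) - step):
        pair = (payload[i], payload[i + step])
        counts[pair] = counts.get(pair, 0) + 1
    return counts


# Hypothetical usage: one feature map per gap size, as input to an ensemble.
features_by_gap = {nu: two_nu_gram_counts(b"GET /a HTTP/1.0", nu) for nu in range(3)}
```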

Stream Data Mining
Although PAYL and McPAD have been able to achieve desirable results, they are not designed to deal with gradual or abrupt changes in data flows. Also, because of the Internet's high speed, systems such as PAYL and McPAD cannot efficiently store and evaluate the large amount of traffic generated in real time. To counter these issues, we propose the application of data stream mining techniques to anomaly detection.
Stream mining differs from the traditional batch setting in a number of ways. First and foremost, because data streams are of extremely large or even infinite size, individual objects within the stream may only be analyzed once, not repeatedly as is possible in batch mode [9]. The continuous nature of data streams also places significant time constraints on stream mining solutions. For a stream mining approach to be practical and effective, it must be able to process incoming information as quickly as it arrives [10].
One of the salient features of any data stream mining algorithm is the ability to detect fluctuations within a continuous stream of data over an unknown length of time. This dynamic tendency of streaming data is called "concept drift" when a change occurs gradually, or "concept shift" when it occurs more quickly [10]. To deal with this characteristic of streaming data, many stream mining algorithms employ a window of time intervals to temporarily hold the most recent data points in a stream [10,11]. The three types of windows typically implemented are the landmark window, the sliding window and the damped window [11].

Data
Two publicly available datasets were used to evaluate the anomaly detection algorithms proposed in this study. These were the DARPA'99 intrusion detection evaluation dataset (http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/1999data.html), and the attack dataset provided by [6].
The DARPA'99 dataset was used to provide a sampling of normal network traffic. This dataset simulates network communication from a fictitious United States Air Force base [12], and provides both attack-free and attack-containing network traces. Data samples for this study were obtained from HTTP requests found in the outside tcpdump data for each day from week one of the DARPA'99 dataset. This first week of data is provided for training purposes and contains no anomalous network traffic. Using Jpcap, a free Java-based library for network packet manipulation (http://netresearch.ics.uci.edu/kfujii/Jpcap/doc/), the numeric character values for all HTTP packet payloads with lengths of at least 1400 characters were extracted for each day. The resulting dataset provided a total of 5594 packets representing normal network traffic, divided by day as shown in Table 1.
The anomalous data used in this study were compiled by [6], and are freely available online (http://roberto.perdisci.com/projects/mcpad). We chose to analyze the algorithms' performance in the detection of three out of the four attack types provided: Generic HTTP attacks, Shell-code attacks, and CLET Shell-code attacks. The authors of [6] obtained 63 of the attacks included in their Generic HTTP dataset from [13]. These attacks include a variety of HTTP attacks collected in a live environment from test web servers, as well as various archives and databases. The attacks fall into several categories, including buffer overflow, URL decoding error, and input validation error, and were directed against numerous web servers such as Microsoft IIS, Apache, Active Perl ISAPI, CERN 3.0A, etc. The authors of [6] further supplement these attacks, bolstering the dataset to include a total of 66 HTTP threats. The Shell-code attack dataset includes 11 shell-code attacks (attacks with packets containing executable code in their payload), which are also included in the Generic HTTP attack dataset. Finally, the CLET attacks were generated by [6] using the CLET polymorphic shell-code engine [14]. This created 96 polymorphic shell-code attacks containing ciphered data meant to evade pattern-matching-based detection techniques.
Following the same procedure used to process the DARPA'99 week one data, the numeric character values contained in all HTTP packet payloads from each of the three attack datasets with lengths of at least 1400 characters were extracted using Jpcap. This provided a total of 843 attack packets, with varying numbers of packets from each attack type as detailed in Table 2.
The payload information extracted from the DARPA and attack datasets was used to create training and testing datasets for our anomaly detection systems. For each of the five days in the DARPA dataset, 20% of the day's packets were extracted to be used for training, and the remaining 80% of the day's data were set aside to be used for testing. To simulate the network traffic in real time, anomalous packets were then sporadically inserted into both the training and testing data after an initial interval consisting of only normal traffic (50 packets for training data and 200 packets for testing data). In this way, different datasets were created with each attack type for all five days of DARPA'99 week one (see Figure 1). The total number of abnormal packets inserted into both the training and testing data was no more than 10% of all normal data for the given day with payload length of 1400 characters or more. In some cases, as shown in Tables 3 and 4, the number of abnormal packets inserted was less than 10% because there was not enough attack data available for that day. For the training data, 20% of the abnormal data was mixed with 20% of the normal data for each day. Likewise, 80% of the abnormal data was mixed with 80% of the normal data selected from each day for testing. For each packet, 256 1-gram and 65,536 2-gram features were extracted to produce separate representations of the training and testing datasets detailed in Tables 3 and 4. The datasets were stored in the ARFF file format used by the open source machine learning software WEKA [15].
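The feature dimensions quoted above follow directly from the number of possible byte sequences: 256 for single bytes and 256 × 256 = 65,536 for byte pairs. The sketch below shows one way such vectors could be produced; it assumes raw occurrence counts as feature values (the text does not say whether counts or relative frequencies were stored), and the function name is hypothetical.

```python
def ngram_features(payload, n):
    """Return a count vector with one slot per possible n-byte sequence."""
    vec = [0] * (256 ** n)  # 256 slots for n=1, 65,536 slots for n=2
    for i in range(len(payload) - n + 1):
        # Interpret the n bytes as a big-endian integer to index the vector.
        idx = int.from_bytes(payload[i:i + n], "big")
        vec[idx] += 1
    return vec


# Hypothetical usage on a short payload.
one_gram = ngram_features(b"abab", 1)
two_gram = ngram_features(b"abab", 2)
```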

Clustering-Based Anomaly Detection
Clustering algorithms are commonly used for anomaly detection, and are generally created for the batch environment [16]. However, some batch clustering algorithms, such as DBSCAN, can be modified to process stream data.

DBSCAN
DBSCAN is a density-based clustering algorithm developed for the batch setting. The algorithm takes two user-defined parameters, epsilon (ε) and minimum points, and relies on the concepts of ε-neighborhood and core-objects. An ε-neighborhood is defined by DBSCAN as the set of points whose distance to a given point is at most the user-defined parameter ε. More specifically, given point p and dataset D, the ε-neighborhood of p is equal to:

$$N_\varepsilon(p) = \{\, q \in D \mid dist(p, q) \le \varepsilon \,\}$$

where $dist(p, q)$ is the Euclidean distance between points p and q [17].
A core-object is defined as a set of points within an ε-neighborhood that contains more points than the minimum points parameter. If p is part of a core-object, DBSCAN will expand the cluster around p.
The basic structure of the algorithm is as follows:
1) DBSCAN takes the ε and minimum points parameters and then chooses a point p that has not been visited.
2) DBSCAN calculates $N_\varepsilon(p)$. If the size of $N_\varepsilon(p)$ is greater than minimum points, DBSCAN expands a cluster around p. Otherwise, the point is considered noise.
3) DBSCAN iterates to a new un-visited point and repeats the process [18].
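The three steps above can be sketched in a few lines. This is a simplified illustration rather than a reference implementation: it follows the text in requiring the neighborhood size to be strictly greater than the minimum points parameter, and it uses a brute-force neighborhood search. The function names and the `NOISE` label are assumptions.

```python
import math

NOISE = -1


def neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None marks a point not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighborhood(points, i, eps)
        if len(seeds) <= min_pts:  # not dense enough: tentatively noise
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:  # expand the cluster around p
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighborhood(points, j, eps)
            if len(nbrs) > min_pts:  # j is itself dense, keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


# Hypothetical usage: four nearby points form a cluster, one point is noise.
labels = dbscan([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], eps=1.5, min_pts=2)
```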
Although DBSCAN was originally developed for a batch environment, it has provided an inspiration for stream clustering algorithms.

DenStream
DenStream is a stream clustering algorithm based on DBSCAN with a damped window model. It expands the concept of an ε-neighborhood in DBSCAN with a fading function to maintain up-to-date information about the data stream. The fading function is defined as:

$$f(t) = 2^{-\lambda t}$$

where $\lambda > 0$ represents the decay factor and $t$ represents the time. DenStream also modifies the core-object concept of DBSCAN, creating a core-micro-cluster with three additional attributes: radius, center and weight. The radius must be less than or equal to ε, and the weight of a cluster must be greater than the user-defined parameter µ [11]. The weight w, center c and radius r of a core-micro-cluster are more formally defined at time t, for a set of close points $p_{i_1}, \ldots, p_{i_n}$ with time-stamps $T_{i_1}, \ldots, T_{i_n}$, as:

$$w = \sum_{j=1}^{n} f(t - T_{i_j}), \qquad c = \frac{\sum_{j=1}^{n} f(t - T_{i_j})\, p_{i_j}}{w}, \qquad r = \frac{\sum_{j=1}^{n} f(t - T_{i_j})\, dist(p_{i_j}, c)}{w}$$

For a potential core-micro-cluster (p-micro-cluster), which summarizes its points by their weighted linear sum $\overline{CF^1}$ and weighted squared sum $\overline{CF^2}$, this changes the center and radius values to be [11,19]:

$$c = \frac{\overline{CF^1}}{w}, \qquad r = \sqrt{\frac{|\overline{CF^2}|}{w} - \left(\frac{|\overline{CF^1}|}{w}\right)^2}$$

Although the p-micro-cluster permits the model to be updated dynamically, it generally will not provide a representative view of a data stream as new points appear. To handle this concept drift, DenStream also introduces the outlier-micro-cluster (or o-micro-cluster) and an outlier-buffer that temporarily stores o-micro-clusters and allows them to become p-micro-clusters. The operation of DenStream is as follows:
Initial Step: run DBSCAN on a set of initial points to generate starting p-micro-clusters.
Online Steps, when a new point p arrives in the stream:
1) The algorithm attempts to merge p with the closest p-micro-cluster. If the radius of the potential micro-cluster is less than or equal to the value of ε, the point is merged.
2) If the point is not merged to a p-micro-cluster, it tries to merge p with an existing o-micro-cluster. If the radius is less than ε, it is merged with the o-micro-cluster. Then, if the o-micro-cluster now has a weight large enough to become its own p-micro-cluster, it is removed from the outlier-buffer and added to the model as a p-micro-cluster.
3) If the point cannot be merged to an existing o-micro-cluster, it creates a new o-micro-cluster and gets placed in the outlier-buffer.
After the merging phase of the DenStream algorithm, the lower weight limit is calculated for all o-micro-clusters in the outlier buffer. This is done using the formula:

$$\xi(t_c, t_0) = \frac{2^{-\lambda (t_c - t_0 + T_p)} - 1}{2^{-\lambda T_p} - 1}$$

where $\lambda > 0$ represents the decay factor, $t_c$ and $t_0$ represent the current and starting time for the o-micro-cluster, and $T_p$ is the predetermined time-period.
4) If the weight of a particular cluster is less than the lower weight limit, the o-micro-cluster can be removed from the outlier buffer.
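To make the damped-window bookkeeping more concrete, the sketch below maintains the weight, weighted linear sum and weighted squared sum of a micro-cluster under the fading function $2^{-\lambda t}$, and derives the center, radius and lower weight limit from them. The class and method names are hypothetical, and the radius is computed by summing the per-dimension variance, which is one common reading of the $|\overline{CF^2}|$ and $|\overline{CF^1}|$ notation and should be treated as an assumption.

```python
import math


def fading(dt, lam):
    """Fading function f(t) = 2^(-lambda * t)."""
    return 2.0 ** (-lam * dt)


class MicroCluster:
    def __init__(self, point, t, lam):
        self.lam = lam
        self.t = t                             # time of the last update
        self.w = 1.0                           # faded weight
        self.cf1 = list(point)                 # faded linear sum
        self.cf2 = [x * x for x in point]      # faded squared sum

    def decay_to(self, t):
        """Fade all summaries from the last update time to time t."""
        f = fading(t - self.t, self.lam)
        self.w *= f
        self.cf1 = [f * v for v in self.cf1]
        self.cf2 = [f * v for v in self.cf2]
        self.t = t

    def insert(self, point, t):
        self.decay_to(t)
        self.w += 1.0
        self.cf1 = [a + b for a, b in zip(self.cf1, point)]
        self.cf2 = [a + b * b for a, b in zip(self.cf2, point)]

    def center(self):
        return [v / self.w for v in self.cf1]

    def radius(self):
        var = sum(s / self.w - c * c for s, c in zip(self.cf2, self.center()))
        return math.sqrt(max(var, 0.0))        # clamp tiny negative round-off


def lower_weight_limit(t_c, t_0, lam, t_p):
    """Lower weight limit for an o-micro-cluster created at time t_0."""
    return (2.0 ** (-lam * (t_c - t_0 + t_p)) - 1) / (2.0 ** (-lam * t_p) - 1)


# Hypothetical usage: two points arriving one time unit apart.
mc = MicroCluster((0.0, 0.0), t=0, lam=1.0)
mc.insert((2.0, 0.0), t=1)
```

In this example the first point has faded to half its weight by the time the second arrives, so the center is pulled toward the newer point.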

Our DenStream-Based Detection System
To detect anomalous packets, DenStream was modified to create the DenStreamDetection algorithm, which treats incoming packets as points to be clustered. When a packet is merged with a p-micro-cluster, it is classified as normal. Otherwise, it is sent to the outlier-buffer and classified as anomalous. The ability for o-micro-clusters to be promoted to p-micro-clusters was removed because the majority of the packets clustered to the outlier-buffer are abnormal packets. If one of these o-micro-clusters became a p-micro-cluster, the model would be tainted and therefore unable to differentiate between abnormal and normal packets.
The basic structure of DenStreamDetection is shown in Algorithm 1.
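The classification rule described above can be sketched as follows. This is a deliberately simplified illustration and not a reproduction of Algorithm 1: it omits the fading function and the pruning of the outlier-buffer, it uses unweighted cluster summaries, and all names are hypothetical. It keeps the key property from the text, namely that o-micro-clusters are never promoted.

```python
import math


class Cluster:
    """Unfaded micro-cluster summary: count, linear sum and squared sum."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)
        self.ss = [x * x for x in point]

    def center(self):
        return [v / self.n for v in self.ls]

    def radius_if_added(self, point):
        """Radius the cluster would have if point were merged into it."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, point)]
        ss = [a + b * b for a, b in zip(self.ss, point)]
        var = sum(s / n - (l / n) ** 2 for s, l in zip(ss, ls))
        return math.sqrt(max(var, 0.0))

    def add(self, point):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss = [a + b * b for a, b in zip(self.ss, point)]


def nearest(clusters, point):
    return min(clusters, key=lambda c: math.dist(c.center(), point), default=None)


def classify(point, p_clusters, outlier_buffer, eps):
    c = nearest(p_clusters, point)
    if c is not None and c.radius_if_added(point) <= eps:
        c.add(point)
        return "normal"
    o = nearest(outlier_buffer, point)
    if o is not None and o.radius_if_added(point) <= eps:
        o.add(point)  # stays in the buffer: promotion is intentionally disabled
    else:
        outlier_buffer.append(Cluster(point))
    return "anomalous"


# Hypothetical usage with one starting p-micro-cluster near the origin.
p_clusters, buffer = [Cluster((0.0, 0.0))], []
first = classify((0.1, 0.0), p_clusters, buffer, eps=0.5)
second = classify((10.0, 10.0), p_clusters, buffer, eps=0.5)
third = classify((10.1, 10.0), p_clusters, buffer, eps=0.5)
```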

Creation of the Detection Model
The anomaly detection model was created in two steps. The first step used the training data to find a range for the parameters in DenStreamDetection, such as ε and minimum points. Using 50 initial points, multiple DenStreamDetection models for each day were created to find a range of optimal parameters that could be used in the testing step. We found that ε had a larger impact on the predictions than the minimum points. During the first step, different parameter ranges were identified based on day and abnormal packet type.
The second step used the testing data to make a prediction model, which was evaluated with the sensitivity and false positive rates defined in Section 2.4. A false positive is a normal packet classified as abnormal. The parameters used in this step were 200 initial packets, 10 minimum points and a range of ε values specific to each day and attack type determined from the first step. Using these parameters, models were generated with a range of false positive and sensitivity rates to demonstrate overall performance.

Histogram-Based Anomaly Detection
Another approach to the detection of anomalous network packets has involved the use of histograms to maintain statistical information about network packet payloads. PAYL is an example of such a system [7], in which a model is created for known normal packet payloads and then compared with incoming packet payloads to determine whether or not the newly arriving packets are anomalous. Due to the evolutionary nature of streaming data, it is important that any abnormal packet detection method is able to update its normal model as concept drift occurs in the incoming data stream. With this in mind, we present a histogram-based classification method capable of modeling dynamic network traffic in real time.

Algorithm Description
The histogram-based detection algorithm provides a simple method for classification of network traffic. The algorithm, HistogramDetection (x, w, q, t, r, h, λ), shown in Algorithm 2, creates a histogram from the first x packets of the stream to serve as the model of normal traffic. This histogram contains frequency counts from all initial packets for each possible n-gram attribute. Since we are attempting to model normal traffic, it is imperative that no abnormal packets are included when this model is created, or else the model will be contaminated and detection rates will decrease. To effectively reflect the evolutionary nature of network traffic, the same fading function with decay factor λ used in DenStream is applied to the histogram after each new packet is processed. This helps to reduce the impact of outdated stream data. After the initial histogram has been built, the algorithm can begin to classify the subsequent packets.
In order to classify an incoming network packet, the algorithm builds a histogram from the newly arrived packet's payload. The histogram generated from the new packet is then compared with the normal model histogram (to which the fading function has been applied as each new packet comes in) by computing the Pearson correlation value between the two histograms. If the computed Pearson correlation value is above a user-defined threshold t, the packet is classified as normal; otherwise, the packet is classified as abnormal.
To account for the possibility of concept drift and shift occurring in data flows, the normal histogram model may need to be rebuilt using packets that have arrived since the initialization of the normal histogram model. This allows the normal model to stay current, modeling packets most recently classified as normal. In order to facilitate this rebuilding process, the algorithm maintains two queues of user-defined size containing information from previously processed normal packets. One of these queues, of size q, stores the histogram data for the previous packets, while the other, with size w, stores Pearson correlation values computed between the packets and the normal histogram model. Note that only data for packets classified as normal are included in these two queues; any packets classified as abnormal are not taken into account. If the normal histogram is to be rebuilt, a set of user-specified conditions must be met, giving the user control of the rebuilding process. When the model is rebuilt too often, the algorithm's efficiency will decrease significantly; however, if it is not rebuilt often enough, accuracy will diminish. To determine when rebuilding the normal model is necessary, the algorithm calculates the number of Pearson correlation values in the stored queue that are below the user-defined threshold h. If this count reaches a certain value r, the normal model is rebuilt using packets stored in the histogram data queue, and the queue containing previous Pearson correlation values is emptied.
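The following sketch puts the classification and rebuilding logic together for 1-gram features. It is an interpretation under stated assumptions rather than the authors' Algorithm 2: it assumes that packets classified as normal are added into the faded model, that h is interpreted as an interval above t (so correlations below t + h count toward rebuilding), and that "reaches r" means the count is at least r. The class and helper names are hypothetical.

```python
import math
from collections import deque


def histogram(payload):
    h = [0.0] * 256
    for b in payload:
        h[b] += 1.0
    return h


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(va * vb)


class HistogramDetector:
    def __init__(self, initial_payloads, w, q, t, r, h, lam):
        self.t, self.r, self.h, self.lam = t, r, h, lam
        self.recent = deque(maxlen=q)  # histograms of recent normal packets
        self.corrs = deque(maxlen=w)   # their correlations with the model
        for p in initial_payloads:     # the x initial (normal) packets
            self.recent.append(histogram(p))
        self.model = self._build()

    def _build(self):
        model = [0.0] * 256
        for hist in self.recent:
            model = [m + v for m, v in zip(model, hist)]
        return model

    def classify(self, payload):
        hist = histogram(payload)
        corr = pearson(self.model, hist)
        decay = 2.0 ** (-self.lam)     # fade the model after each packet
        self.model = [m * decay for m in self.model]
        if corr <= self.t:
            return "abnormal"          # abnormal packets never enter the queues
        # Assumption: normal packets are folded into the faded model.
        self.model = [m + v for m, v in zip(self.model, hist)]
        self.recent.append(hist)
        self.corrs.append(corr)
        # Assumption: h is an interval above t.
        low = sum(1 for c in self.corrs if c < self.t + self.h)
        if low >= self.r:
            self.model = self._build()
            self.corrs.clear()
        return "normal"


# Hypothetical usage with toy payloads.
det = HistogramDetector([b"aaaabbbb"] * 3, w=30, q=200, t=0.5, r=10, h=0.2, lam=0.01)
result_normal = det.classify(b"aabb")
result_abnormal = det.classify(b"zzzzzzzz")
```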

Critical Parameters
Though the histogram-based detection algorithm requires several parameters, they are not all equally important. Rather, two parameters, q and t, have the greatest effect on the algorithm's ability to detect anomalous packets.
The first of these most critical parameters is q, the size of the queue of previously processed instances used to rebuild the normal histogram. If q is too small, the normal histogram model generated when the model is rebuilt will not take into account a sufficient number of previously processed packets. This results in an insufficiently robust model, causing both undesirable sensitivity values and false positive rates.
While q has a noticeable influence on the effectiveness of the histogram detection algorithm, t, which defines the Pearson correlation threshold between instances classified as normal and abnormal, is undoubtedly the most important parameter. This is understandable, as t directly controls the classification of each individual instance as it is processed by the algorithm. Furthermore, t also plays a role in controlling the rebuilding of the normal histogram model. Because parameter h specifies an interval above t, t is directly related to the frequency at which the normal model histogram is rebuilt. Thus, the value of t is closely connected to the core functionality of the histogram-based detection algorithm.

Performance Metrics
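To evaluate the performance of the anomaly detection models, the sensitivity and false positive rates were calculated using the following formulas:

$$\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{false positive rate} = \frac{FP}{FP + TN}$$

where TP/FN stand for the number of correctly/incorrectly classified abnormal packets, and TN/FP are the number of correctly/incorrectly classified normal packets.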
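As a small worked example of these two metrics, the sketch below tallies the four outcome counts from parallel lists of actual and predicted labels. The function name and the label strings are assumptions chosen for illustration.

```python
def detection_rates(actual, predicted):
    """Return (sensitivity, false positive rate) for 'normal'/'abnormal' labels."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == "abnormal":
            if p == "abnormal":
                tp += 1  # abnormal packet correctly flagged
            else:
                fn += 1  # abnormal packet missed
        else:
            if p == "abnormal":
                fp += 1  # normal packet falsely flagged
            else:
                tn += 1  # normal packet correctly passed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return sensitivity, false_positive_rate


# Hypothetical usage: 2 abnormal and 4 normal packets.
actual = ["abnormal", "abnormal", "normal", "normal", "normal", "normal"]
predicted = ["abnormal", "normal", "normal", "abnormal", "normal", "normal"]
rates = detection_rates(actual, predicted)
```

Here one of the two abnormal packets is detected and one of the four normal packets is falsely flagged.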

Density-Based Detection Results
After tuning the DenStreamDetection-based system on the training packets using 2-gram features, we discovered a range of ε values for each day that could be used to evaluate the model. For every day except for Tuesday, the false positive rate was kept between 0% and 10% so that an appropriate detection rate could be found. Tuesday, however, needed the false positive limit to be heavily relaxed in order to achieve a moderate sensitivity. When testing the detection system, the false positive and sensitivity rates for the highest, middle and lowest ε values were generated for both 1-gram and 2-gram feature representations. These are displayed in Table 5. The results with highest sensitivity for each attack type were then averaged to find the best overall performance.
In general, the DenStreamDetection-based system was able to correctly detect most Shell-code attacks, achieving on average 91% sensitivity with a 14% false positive rate. Similarly, Generic HTTP attacks produced 78% average sensitivity and a 13% false positive rate. CLET attacks, however, had a similar false positive rate of 14%, but a substantially lower average sensitivity of 65%. This disparity was likely due to the polymorphic nature of CLET attacks, which are designed to mimic normal network traffic. [...]sday exhibited the highest detection rates (up to 9[...]%) while still keeping the false positive rates below 6%. [...] the largest ε range to achieve its results. The models utilizing both 1-gram and 2-gram features produced similar results. Using the 1-gram representation for the same ε values, the system achieved slightly better detection rates at the expense of higher false positive rates. Also, because the 1-gram representation has a much smaller feature space than 2-gram, the total run-time of 1-gram was significantly less.

The histogram-based algorithm was likewise applied to the detection of anomalous packets. In the training step, favorable parameters for the algorithm were approximated by performing several experiments on the training data. Since the training data included 50 initial normal packets, this value was used for x during the training step. Most critically, appropriate values of t were ascertained for the different attack types on each day, as this parameter has the greatest effect on the performance of the algorithm. Suitable values were also obtained for all other parameters during the training phase. The optimum value of q was found to be 200, as this allowed the algorithm to maintain a fairly accurate model of normal traffic while minimizing the time needed for the model to be rebuilt.
Once appropriate parameters were identified, the testing phase began. In this step, the value 200 was assigned to x since the testing data contained 200 initial normal packets. With the rest of the algorithm's parameters remaining static, the algorithm was tested using varying values of t in order to gauge sensitivity values at different false positive rates. This produced the results summarized in Table 6, which displays a sampling of t values used for different days and attack types with the resulting sensitivities and false positive rates.

Results Achieved
As can be seen from Table 6, we were able to attain fairly consistent results across all days' testing data for each attack type using optimal parameters. The algorithm performed best at detecting shell-code attacks, achieving an average of 97% sensitivity and 1% false positive rate across all shell-code attack testing datasets. Performance was slightly less desirable in detection of Generic HTTP attacks, but was nevertheless acceptable, with an average detection rate of 84% and 3% false positive rate. CLET attacks proved most difficult for the histogram algorithm to detect. In order for reasonable detection rates to be obtained, false positive rates generally had to be pushed much higher than necessary for the other two attack types, to an average of 7% with optimal parameters. Despite this fact, sensitivity remained comparatively low, at an average of 61%. This difficulty detecting CLET attacks was likely due to their polymorphic nature, which also proved troublesome to the DenStreamDetection system.
While results were relatively consistent across each day's worth of training data, those achieved using Thursday's testing data were markedly superior. Using optimal parameters, the histogram-based detection algorithm was able to achieve perfect detection on all three types of attacks, each with a false positive rate of 3% or less. There are several possible reasons for this exceptional performance related to the nature of Thursday's testing data, as discussed in Section 3.3. Overall, the relatively high t values used indicate greater consistency in Thursday's network traffic. Due to these elevated t values, the algorithm was able to more effectively identify abnormal network packets.
The number of parameters required by the histogram-based detection algorithm necessitated fairly specific tuning for our different datasets in order to perform optimally. This was demonstrated by the algorithm's performance on 1-gram features when the same parameters used for 2-gram were applied. While the clustering-based algorithm was able to achieve comparable results from both 1-gram and 2-gram with the same parameters, the histogram-based algorithm performed poorly on 1-gram when applied with the same parameters as 2-gram. As a result, 1-gram results have not been reported for the histogram-based algorithm.

Concept Shift
The performance of our anomaly detection systems was heavily influenced by the network traffic. To demonstrate this, we calculated the Pearson correlation between segments of ten packets for each day. By measuring the correlation between two consecutive segments, changes in the stream can be visualized (Figure 2), which offers a possible explanation for the results observed in Tables 5 and 6.
Monday, Wednesday, and Friday exhibit continuous concept shift, particularly during the training phase, as the calculated Pearson correlation values oscillate regularly. Therefore, a robust model for normal traffic was built in each case that responded accurately to the evolving data stream. As a result, greater sensitivity was achieved on each of these three days. The high sensitivity values for Thursday can also be explained by concept shift. During the training phase, consistent concept shift occurred, allowing Thursday's model to effectively capture the changing pattern of normal network traffic. Following the training step, the remainder of Thursday's data was relatively stable.
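The segment-wise correlation analysis can be sketched as follows. The text does not state how each segment of ten packets is represented, so this example assumes that a segment is summarized by the summed byte histogram of its packets; the function names are hypothetical.

```python
import math


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(va * vb)


def segment_histogram(payloads):
    """Summed 1-gram histogram over all packets in one segment (assumed)."""
    h = [0.0] * 256
    for payload in payloads:
        for b in payload:
            h[b] += 1.0
    return h


def correlation_curve(payloads, segment_size=10):
    """Pearson correlation between each pair of consecutive segments."""
    starts = range(0, len(payloads) - segment_size + 1, segment_size)
    hists = [segment_histogram(payloads[s:s + segment_size]) for s in starts]
    return [pearson(h1, h2) for h1, h2 in zip(hists, hists[1:])]


# Hypothetical usage: stable traffic followed by an abrupt change.
stream = [b"aab"] * 20 + [b"zzy"] * 10
curve = correlation_curve(stream)
```

A value near 1 indicates a stable stream between two segments, while a sharp drop suggests concept shift.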

Conclusions
In this paper, two data stream mining techniques were presented for the detection of anomalous network packets. First, a stream clustering algorithm based on DenStream was adapted to classify network packets. This approach achieved moderate success with Generic HTTP and Shell-code attacks but had a higher average false positive rate. Second, a stream adaptation of the relative frequency histogram approach found in [7] was created using Pearson correlation to detect anomalies.
Though the histogram-based approach achieved moderately better results, it required more fine-tuning because of the number of parameters used. In contrast, generalization of the clustering algorithm was easier to achieve since it uses fewer parameters. This was evidenced by the ability of the clustering algorithm to perform effectively on both 1-gram and 2-gram features with the same parameters, while the histogram algorithm required specific parameter tuning for each feature type.
Lastly, to better explain the performance differences between certain days, we analyzed the Pearson correlation between consecutive segments of 10 packets. By plotting these values on a graph, concept drift and shift were visualized, and clear variations were observed between days. The location and frequency of concept shift and drift in the data streams, especially within the training phase, provided an account for the observed changes in performance.

Figure 1. Testing and training data stream diagram.

In the core-micro-cluster definition, $dist(p_{i_j}, c)$ denotes the Euclidean distance between the point $p_{i_j}$ and the center $c$. Because DenStream operates in a stream environment, the core-micro-clusters need to change dynamically as time passes. To facilitate this, a potential core-micro-cluster or p-micro-cluster is introduced. P-micro-clusters are similar to core-micro-clusters, except that the center and radius values are based on the weighted sum and squared sum of the points ($\overline{CF^1}$ and $\overline{CF^2}$). Also, the weight must be greater than or equal to $\beta\mu$, where $\beta$ defines the threshold between p-micro-clusters and outliers.

Table 5. DenStreamDetection System Results, 2-gram (1-gram). Sensitivity measured how well the model detected abnormal packets, and false positive rate indicated the percentage of false alarms generated.

Figure 2. Pearson correlation curves for DARPA'99 week 1.

[...] Research Institute at Houghton College for providing funding for our research.

R. Perdisci, G. Gu and W. Lee, "Using an Ensemble of One-Class SVM Classifiers to Harden Payload-Based Anomaly Detection Systems," ICDM'06: Proceedings of the International Conference on Data Mining, Hong Kong [...]

Table 6. Histogram Detection System Results, 2-gram.

For the remaining parameters of the histogram-based algorithm, 30 was generally used for w, with r valued at 10 and h at 0.2. These parameters effectively limited the frequency of rebuilding the normal histogram model while still allowing the algorithm to handle concept drift in the data. A λ value of 0.01 was found to work sufficiently well, as this decay factor helped to better maintain an up-to-date model of normal traffic.