Information Theory and Data-Mining Techniques for Network Traffic Profiling for Intrusion Detection

In this paper, we propose information theory and data mining techniques that extract knowledge of network traffic behavior at the packet level and the flow level, and that can be applied to traffic profiling in intrusion detection systems. The empirical analysis of our profiles, through the rate of remaining features at the packet level and the three-dimensional spaces of entropy at the flow level, provides fast detection of intrusions caused by port scanning and worm attacks.


Introduction
Network Intrusion Detection Systems [1], or NIDSs, have become an important component in detecting attacks against information systems. However, they offer only a limited defense. For instance, a signature-based NIDS (S-NIDS) monitors packets on the network and compares them against a database of signatures or attributes from known malicious threats. A weakness of this type of NIDS is that there will always be a lag between the discovery of a new threat and the deployment of its detection signature in the NIDS. During that lag, the NIDS is unable to detect the new threat.
A second type of NIDS is the anomaly-based NIDS (A-NIDS), which monitors network traffic and compares it against an established baseline [2]. The baseline identifies what is normal behavior for that network. If a deviation from the established baseline reaches a specified threshold, an alarm is generated. Anomaly detection techniques therefore have the potential to detect new and unforeseen types of attacks. Traditional anomaly-based IDSs employ algorithms that focus primarily on changes in traffic volume at specific points on the network and promptly alert the operator to a sudden increase. However, such systems can be evaded by sophisticated attacks that compromise significant hosts, exhausting their memory or CPU while keeping the traffic level within the normal threshold.
Recently, a new generation of anomaly-based IDSs has emerged, which focuses on gaining knowledge of the structure and composition of the traffic and not just its volume. Such systems are based on the fact that malicious activities affect the natural randomness of the network, e.g., they significantly change the entropy of the network [4] [5]. The composition of traffic is related to its probability distribution and can be characterized by its entropy; a malicious activity changes that composition and the shape of the distribution, and therefore its entropy. By applying entropy measures to a set of traffic features, we can establish profiles of the normal activity of the network and detect intrusions into the system. This paper presents an analysis at the packet and flow levels of traces obtained through measurements conducted in a campus network under real attacks of the Blaster [6] and Sasser [7] worms, as well as a port scan attack on the proxy server of that network. The traces captured during a week of normal operation helped to develop a profile of normal behavior against which attack conditions can be compared.
The paper is organized as follows. In Section 2, we present our profiling approach and the context of this paper. Section 3 describes the test environment. Sections 4 and 5 explain the methodology (the rate of remnant items and the spaces of entropy) together with the results. Section 6 gives concluding remarks.

Profiling Approach
We propose two methods for the creation of profiles based on entropy. The analysis applies primarily to the packet level for the method of the rate of remnant elements and to the flow level for the spaces of entropy. Initially, there is a set of captured traffic traces corresponding to five days of typical working hours in an academic LAN. The traces have been inspected and are considered free of anomalies, so they may serve as a baseline.
We use traffic features to build the profiles. A traffic feature is a field in the header of a packet (at the packet level) or a field in a five-tuple (at the flow level). Four fields are used: source address (srcIP), destination address (dstIP), source port (srcPrt), and destination port (dstPrt).
After the feature extraction, an essential part of the profile builder block is the measurement of entropy. For a discrete set of symbols $\{a_1, a_2, a_3, \dots, a_n\}$ with probabilities $p_i$, $i = 1, 2, \dots, n$, the entropy of the discrete distribution of an associated random variable $X$ is a measure of the randomness in the set of symbols and is represented as

$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i.$$

The relative uncertainty (RU) provides a measure of variety or uniformity that is independent of the sample size. For a random variable $X$, RU is defined as [3]

$$RU(X) = \frac{H(X)}{H_{\max}(X)},$$

where $H_{\max}(X)$ is the maximum entropy attainable by $X$. $RU(X) \approx 1$ means that the observed values of $X$ are closer to being uniformly distributed, and thus less distinguishable from each other, whereas $RU(X) \approx 0$ indicates that the distribution is highly concentrated.
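As an illustration of these two measures, the following minimal sketch estimates the entropy and the relative uncertainty from a list of observed feature values. The function names are hypothetical, and taking the maximum entropy as $\log_2$ of the number of distinct observed symbols is an assumption made for this example, not a detail confirmed by the text.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def relative_uncertainty(values):
    """Entropy divided by log2 of the number of distinct symbols.

    ASSUMPTION: the maximum entropy is taken over the observed alphabet.
    """
    n = len(set(values))
    if n <= 1:
        return 0.0  # a single symbol carries no uncertainty
    return entropy(values) / math.log2(n)
```

For example, a slot in which every source address appears equally often yields a relative uncertainty of 1, while a slot dominated by a single address yields a value close to 0.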

Experimental Platform
The worm propagation and port scanning were carried out on an academic LAN, which is subdivided into four subnets (192.168.

Data-Set and Tools
The benign traffic traces, captured during typical work hours over a period of five days, were labeled with a number from one to five. The anomalous traffic for the port scanning attack was labeled 6-P1. The Blaster and Sasser worm attacks were labeled 6-P2 and 6-P4, respectively. The data-set was collected with a network sniffer based on the libpcap library used by tcpdump [8]. All traces were cleaned to remove spurious data using plab, a platform for packet capture and analysis [9]. Traces were split into segments using tracesplit, a tool that belongs to Libtrace [10]. The traffic files, in an ASCII format suitable for MATLAB processing, were created with ipsumdump [11].
The flow generation was done with flow analyzer, a Perl-based tool that we developed.

Rate of Remnant Elements
We define a traffic trace $\chi$ of duration $t_D$ seconds with a total of $N$ packets; $\chi$ is divided into $M$ non-overlapping slots of $t_d = t_D / M$ seconds each. The $i$-th slot has $W_i$ packets, for $i = 1, 2, \dots, M$. In each slot $i$, four features are extracted, which we associate with a value of $r$: $r = 1$ for source IP address, $r = 2$ for destination IP address, $r = 3$ for source TCP port, and $r = 4$ for destination TCP port.
Let $S$ be the finite sequence of $r = 1$ values (source IP addresses) in slot $i$. This sequence, with elements in an alphabet set $A$, is a function from $\{1, 2, \dots, W_i\}$ to $A$, and the length of $S$ is $W_i$. The alphabet $A$ has cardinality $|A| = n$. From $A$, an ordered set $A^{o}$ is formed that contains the $n$ source IP addresses sorted by frequency in decreasing order. With the associated frequencies of $A^{o}$, we define a probability mass function (pmf). $A^{o}$ is then passed to an iterative process $P$ that creates $l$ subsets of $A^{o}$, denoted $A^{(o,k)}$ for $k = 1, 2, \dots, l$; this family of $l$ subsets is given in Equations (4)-(6). When, in iteration $k$, the relative uncertainty of a partial pmf exceeds a threshold $\beta$, i.e., $RU > \beta$, we say that the iterative process $P$ has reached its last iteration, and hence $k = l$. An estimator of the relative uncertainty for a discrete random variable $X_i^r$ in iteration $k$ is defined in terms of its partial pmf. Selecting $\beta \approx 1$, the resultant subset $A^{(o,l)}$ is closer to being uniformly distributed.
Then, for a given $\beta$ and a number $l$ of iterations carried out, it is possible to calculate the remnant $R_i^r$ for a subset $A^{(o,l)}$. Generalizing this for slot $i$ and traffic feature $r$, we have the rate of remnant elements

$$R_i^r = \left| A^{(o,l)} \right|, \quad 1 \le i \le M, \; 1 \le r \le 4.$$

In other words, $R_i^r$ is the cardinality of the subset $A^{(o,l)}$. We found that, under normal conditions, this feature presents regularities that allow behavioral traffic profiles to be created. Table 1 summarizes the behavior of $R_i^r$ with $\beta = 0.95$ for our data-set.
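The iterative process can be sketched as follows for the values of one feature in one slot. This is only an illustration under a stated assumption: each iteration is taken to discard the most frequent remaining element and to re-normalize the remaining frequencies into a partial pmf, stopping once the relative uncertainty exceeds $\beta$. The function name `remnant_count` is hypothetical, and the exact subset construction used in the paper may differ.

```python
import math
from collections import Counter

def remnant_count(values, beta=0.95):
    """Number of symbols left once the partial pmf is nearly uniform (RU > beta).

    ASSUMPTION: each iteration drops the currently most frequent symbol.
    """
    # Frequencies of the ordered alphabet, sorted in decreasing order.
    freqs = sorted(Counter(values).values(), reverse=True)
    while len(freqs) > 1:
        total = sum(freqs)
        # Entropy of the partial pmf over the remaining symbols.
        h = -sum((f / total) * math.log2(f / total) for f in freqs)
        if h / math.log2(len(freqs)) > beta:
            break  # the remaining symbols are close to uniformly distributed
        freqs = freqs[1:]  # remove the most frequent symbol and iterate
    return len(freqs)
```

Under this reading, a slot dominated by a few heavy hitters leaves a small remnant, whereas a slot with many equally rare values, such as a scan with many distinct source addresses, leaves a large one.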
Through the mean, variance, intensity factor ($\sigma^2/\mu$), and maximum value, we can define a threshold for the normal behavior of $R_i^r$. For instance, by averaging the means of $R_i^r$ and its maximum values during benign traffic, we can define an average threshold of 28.5 and a maximum threshold of 114.8 units. Such thresholds are defined for each $r$.
An anomaly related to a port scan attack directed at the proxy server could be detected from the first slots in which it appeared (i.e., $i = 2, 3$) in trace 6-P1. The attack was carried out with a large number of TCP packets with spoofed source addresses. The growth of $R_i^1$ can be observed in Figure 2(a) and lies far above the thresholds of normal behavior. There is also a significant increase in the number of remnants for $r = 3$ during the Blaster and Sasser worm propagation. It is important to note that the anomaly is detected from the earliest slots in which the intrusion appears.
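A minimal sketch of how such a profile and threshold test could be computed from a series of remnant values is given below. The function names are hypothetical, the use of the population variance is an assumption, and the sample series in the usage note is invented for illustration only.

```python
import statistics

def remnant_profile(series):
    """Summary statistics of the remnant values over the slots of one trace."""
    mu = statistics.mean(series)
    var = statistics.pvariance(series)  # ASSUMPTION: population variance
    return {
        "mean": mu,
        "variance": var,
        "intensity": var / mu if mu else float("nan"),  # sigma^2 / mu
        "max": max(series),
    }

def anomalous_slots(series, threshold):
    """1-based indices of the slots whose remnant value exceeds the threshold."""
    return [i for i, r in enumerate(series, start=1) if r > threshold]
```

For instance, `anomalous_slots([20, 300, 450, 25], 114.8)` flags slots 2 and 3 of this made-up series against the maximum threshold quoted above.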

Three-Dimensional Spaces of Entropy
The construction of a space of entropy is carried out at the flow level, and through these spaces it is possible to create behavior profiles for the traffic of a network. Four three-dimensional spaces are generated, one for each of the features extracted from the flows. We define a traffic trace $\chi$ of duration $t_D$ seconds that is divided into $M$ non-overlapping slots of $t_d = t_D / M$ seconds each. In slot $i$, $K_i$ flows are generated with a given inter-flow gap (IFG). All the flows for each slot are stored in indexed text files. The traffic features used in this technique are the fields of the flow, identified as $r = 1$ for source IP address, $r = 2$ for destination IP address, $r = 3$ for source TCP port, and $r = 4$ for destination TCP port.
Once the flows in slot $i$ are generated, they are clustered according to a flow feature $r$. For instance, with cluster key (or pivot) $r = 1$, the flows are aggregated into clusters of flows that share the same source IP address. The number of clusters depends on the cardinality of the alphabet set of all source IP addresses seen in slot $i$. Thus, each cluster has flows with the same source IP address, while the remaining fields or features ($r = 2, 3, 4$) are free to vary. In this context, we can estimate the entropy of each of $r = 2, 3, 4$ for each cluster. If we join these three values and treat them as a coordinate, we obtain a cloud of data points in a 3-D Euclidean space whose axes are the entropies of the three free features; these points are plotted in the 3-D space.
When we apply this procedure to the rest of the cluster keys and to all slots, we obtain four spaces of entropy.
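The clustering step can be illustrated with the short sketch below, which groups the flows of one slot by a cluster key and returns one 3-D point per cluster. Representing a flow as a dictionary keyed by the field names srcIP, dstIP, srcPrt, and dstPrt, and the function name `entropy_space`, are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

FIELDS = ("srcIP", "dstIP", "srcPrt", "dstPrt")

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_space(flows, key):
    """Map each cluster (one value of `key`) to the entropies of the other three fields."""
    clusters = defaultdict(list)
    for flow in flows:  # ASSUMPTION: each flow is a dict holding the four FIELDS
        clusters[flow[key]].append(flow)
    free = [f for f in FIELDS if f != key]  # the three features left free to vary
    return {
        k: tuple(entropy([fl[f] for fl in members]) for f in free)
        for k, members in clusters.items()
    }
```

Calling `entropy_space(flows, "srcIP")` for every slot, and then repeating with the other three keys, would produce the four point clouds.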
Figure 3 shows the spaces of entropy for three traces. First, in Figure 3(a), we see the point pattern for Trace-1, which corresponds to normal traffic conditions and is typical of Traces 1-5. The spaces of entropy, represented by the vector $\mathbf{X}^r \in \mathbb{R}^3$ for a cluster key $r$, were characterized by first applying a multivariate analysis technique, Principal Component Analysis (PCA). PCA provides a roadmap for how to reduce a complex data-set to a lower dimension to reveal the sometimes hidden, simplified structure that often underlies it. PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component, PCA 1), the second greatest variance on the second coordinate, and so on [12].
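The projection onto the first principal component can be sketched without external libraries as shown below. Power iteration on the sample covariance matrix is used here purely for illustration; it is not stated to be the method used in this work, and a linear algebra library would normally be preferred. The function name and the iteration count are assumptions.

```python
def first_principal_component(points, iters=200):
    """Return the PCA 1 direction and the projection of each point onto it.

    Needs at least two points. Power iteration can fail in the rare case where
    the start vector is orthogonal to the dominant direction.
    """
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    v = [dim ** -0.5] * dim  # unit-length start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # no variance along the current direction
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(dim)) for c in centered]
    return v, scores
```

The returned scores are the one-dimensional coordinates on PCA 1; the sign of the direction is arbitrary, as in any PCA.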
To obtain an accurate estimate, tools such as Kernel Density Estimation (KDE) are useful. This analysis was applied to the data points at the slot level on PCA 1, using a Gaussian kernel of 200 points with a bandwidth of $h = 1.06\,\sigma J^{-1/5}$, given by Silverman's criterion, where $J$ is the number of observations and $\sigma$ is the standard deviation of the set of observations. KDE shows that the traffic slots have a bimodal Gaussian behavior in their pdf. The two modes were labeled principal mode and far mode, respectively, as shown in Figure 4. This procedure was applied at the slot level $i$ for a given feature $r$. The transformed data points are denoted by $Z_i^r$. The densities of $Z_i^r$ under normal conditions are bimodal: the second mode (far mode) is situated to the left of the main mode, at an average of minus five units. In addition, the empirically calculated average variance of $Z_i^r$ for benign traffic is three units. During the analysis of Trace 6-P1, it was found that slots two and three had variances of 0.32 and 0.74 units, respectively. These values are anomalous with regard to the threshold of three units. Thus, an anomaly (Figure 5(a)) was detected early in Trace 6-P1. Analysis of these traffic slots with Wireshark identified a port scan attack.
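A Gaussian kernel density estimate of this kind can be sketched as follows. Several details are assumptions made for illustration: the 200 points are interpreted as the number of evaluation points on the grid, the bandwidth uses the common rule-of-thumb form $1.06\,\sigma J^{-1/5}$ of Silverman's criterion with the sample standard deviation, and the grid is extended by three bandwidths beyond the data range. The function name is hypothetical.

```python
import math
import statistics

def gaussian_kde(samples, grid_points=200):
    """Evaluate a Gaussian KDE of `samples` on an evenly spaced grid."""
    J = len(samples)
    sigma = statistics.stdev(samples)  # ASSUMPTION: sample standard deviation
    h = 1.06 * sigma * J ** (-1 / 5)   # rule-of-thumb bandwidth
    lo, hi = min(samples) - 3 * h, max(samples) + 3 * h
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + k * step for k in range(grid_points)]
    norm = 1.0 / (J * h * math.sqrt(2 * math.pi))
    density = [
        norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        for x in grid
    ]
    return grid, density
```

The variance of the projected points in a slot and the location of the secondary peak of `density` are then the quantities that would be compared with the benign-traffic values quoted above.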
In all three cases studied, the attack leads to changes in traffic patterns that are detected from the first slots containing the malicious traffic.

Conclusions and Future Work
The generation of behavioral profiles based on entropy offers effective support for Intrusion Detection Systems. The results of this study in a campus network show that, under the Blaster and Sasser worm attacks as well as port scanning, an A-NIDS employing profiles generated by the Rate of Remnant Elements or Three-Dimensional Spaces of Entropy methodologies can respond rapidly, detecting deviations from an established baseline in the earliest slots in which the attack appears.
As future work, we will investigate the effect of varying the slot duration $t_d$. Smaller values of slot duration give faster response times but also a smaller data set from which to obtain representative traffic features, so finding the optimum value is an important design objective.

Figure 1 shows the patterns $R_i^1$ and $R_i^2$ for benign traffic in Trace 5; their variation lies within the standard behavior mentioned above.

Figure 1. Rate of remnants for (a) srcIP and (b) dstIP for standard traffic in Trace 5 in typical work hours ($t_d$ = 60 s and $\beta$ = 0.95).

The $R_i^r$ patterns during the worm attacks are presented in Figure 2(b) and Figure 2(c).

Figure 3(b) and Figure 3(c) show a marked difference with regard to benign traffic, since the data points move away from the positions typically observed.

Figure 4(a) shows that the densities of PCA 1 in benign traffic traces present a clear regularity in their shapes. This pattern of behavior changes drastically for anomalous traffic traces; see Figure 4(b) and Figure 4(c).

Figure 5(b) shows the first three anomalous slots, where the far mode moves away from its typical value. The far mode is displaced to −9, −11, and −13 units, an anomaly with regard to the threshold of −5. This anomalous behavior in the far mode was caused by the spread of the Blaster worm. Similarly, the Sasser worm attack caused anomalous values in the variance of the transformed data points; these values were close to 0.5 units.

Figure 5. Anomaly detection on PCA 1: (a) anomaly caused by a port scan, detected in the first slots of Trace 6-P1; (b) anomaly caused by the Blaster worm, which moves the far mode away from its typical position.

Table 1. Values of mean, variance, and intensity factor for the rate of remnants.