A Multi-Stage Network Anomaly Detection Method for Improving Efficiency and Accuracy
1. Introduction
In recent years, intrusions such as worms and denial-of-service attacks have become a major threat to the Internet. In particular, novel intrusions such as new worms and zero-day attacks are increasing and cause significant damage to the Internet. Network Intrusion Detection Systems (NIDSs) have gained attention as a means of detecting such intrusions. NIDSs are classified into misuse detection systems and anomaly detection systems.
In misuse detection systems such as Snort [1], intrusions are detected by matching traffic against signatures that are prepared manually in advance. These systems are highly popular in network security because, for known intrusions, they exhibit higher detection accuracy and generate fewer false positives than anomaly detection systems. However, developing signatures is a cumbersome and time-consuming task because security experts have to create them manually. Therefore, novel intrusions can cause significant damage to the Internet before signatures are developed.
On the other hand, anomaly detection systems such as NIDES [2] and ADAM [3] can detect unknown intrusions. This is because these methods detect intrusions based on deviation from normal behavior, and thus do not require prior knowledge of intrusions.
However, these methods tend to generate more false positives than signature-based IDSs. Although many researchers have worked to increase detection accuracy, still higher accuracy is demanded. Therefore, we focus our research on anomaly detection systems.
In anomaly detection systems, network traffic is analyzed using observation units such as timeslots and flows. Timeslot-based detection has the advantage of detecting anomalous network states effectively. Flow-based analysis, on the other hand, can examine each communication in more detail. Our group has proposed a combination of timeslot-based and flow-based detection and shown its effectiveness [4]. However, flow-based analysis requires a large number of buffers. Analyzing all flows of network traffic is not realistic, and the buffer requirement can be a vulnerability to Denial of Service (DoS) attacks because analyzing every flow can result in a buffer overflow.
In this paper, we propose a high-accuracy multi-stage anomaly detection system that reduces the number of flows that need to be analyzed. The proposed system consists of two detection stages. The first stage is a timeslot-based detector that picks out the flows that need to be analyzed in detail by the flow-based detector. The second stage then inspects only these suspicious flows; thus, the computational load and buffer size required to analyze flows can be reduced.
The remainder of this paper is organized as follows. Section 2 explains timeslot-based and flow-based analyses and discusses issues in combining these analyses. In Section 3, we propose a multi-stage anomaly detection system. An evaluation of the proposed system is presented in Section 4. Finally, Section 5 concludes this paper.
2. Combination of Timeslot-Based and Flow-Based Analyses
Anomaly detection systems generally analyze traffic in observation units such as timeslots and flows. In this section, we explain how these units are used in intrusion detection and introduce a conventional method that combines the two detectors. We also present the issues in the conventional method.
2.1. Timeslot-Based Analysis
Anomaly detection often uses timeslot-based analysis [4-6]. In this method, the overall traffic is separated into timeslots of fixed length, and features, which are numerical values representing the network state, are extracted from the traffic in each timeslot. This analysis requires little buffer storage because buffers are released after each timeslot. However, it is difficult for this method to identify the individual anomalous communication flows.
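The timeslot-based procedure described above can be sketched as follows. This is a minimal illustration, not the implementation used in [4-6]: the packet representation (timestamp, size, SYN indicator), the 10-second slot length, and the three features are all assumptions chosen for the example, and packets are assumed to arrive in time order.

```python
def timeslot_features(packets, slot_length=10.0):
    """Yield (slot_index, features) for each non-empty fixed-length timeslot.

    `packets` is an iterable of (timestamp_seconds, size_bytes, is_syn)
    tuples in time order. The tuple layout and the feature set are
    hypothetical and serve only to illustrate the idea.
    """
    current_index = None
    features = None
    for timestamp, size, is_syn in packets:
        index = int(timestamp // slot_length)
        if index != current_index:
            # The previous slot is complete: emit its features and
            # release its buffer by starting a fresh dictionary.
            if features is not None:
                yield current_index, features
            current_index = index
            features = {"packets": 0, "bytes": 0, "syn": 0}
        features["packets"] += 1
        features["bytes"] += size
        features["syn"] += int(is_syn)
    if features is not None:
        yield current_index, features
```

Because only one feature dictionary is held at a time, the memory used does not grow with the number of flows, which reflects the low buffer requirement noted above. The per-slot features carry no flow identity, which is why individual anomalous flows cannot be identified from them alone.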
2.2. Flow-Based Analysis
A flow is defined as a set of packets which have the same values for the following three header fields.
• Protocol (TCP/UDP)
• Source/Destination address pair
• Source/Destination port pair
A TCP flow ends with a FIN or RST flag, and a UDP flow is terminated by a timeout.
Flows are often used in anomaly detection [4,7,8]. A flow-based analysis method can analyze each bidirectional communication in detail and can identify each anomalous communication. However, in this analysis, buffers must be prepared for every flow. The number of buffers to be prepared increases linearly with the number of flows. Thus, this method carries a risk of buffer overflow. Therefore, buffer storage is a bottleneck in flow-based analysis and a vulnerability to DoS attacks.
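The flow definition and termination rules above can be sketched as a small flow table. This is an illustrative sketch under stated assumptions: the class and function names, the 60-second UDP timeout, and the per-flow record are hypothetical, and a TCP flow is closed at the first FIN or RST seen, which is a simplification of real TCP teardown.

```python
def flow_key(protocol, src_addr, src_port, dst_addr, dst_port):
    """Return a direction-independent key so that both directions of a
    bidirectional communication map to the same flow."""
    endpoints = sorted([(src_addr, src_port), (dst_addr, dst_port)])
    return (protocol, endpoints[0], endpoints[1])


class FlowTable:
    """Hypothetical flow table: one buffer entry per active flow."""

    def __init__(self, udp_timeout=60.0):
        self.udp_timeout = udp_timeout  # assumed value, in seconds
        self.active = {}                # flow key -> per-flow record
        self.finished = []              # (flow key, record) of ended flows

    def expire_udp(self, now):
        """Close UDP flows that have been idle longer than the timeout."""
        expired = [
            key for key, flow in self.active.items()
            if key[0] == "UDP" and now - flow["last_seen"] > self.udp_timeout
        ]
        for key in expired:
            self.finished.append((key, self.active.pop(key)))

    def add_packet(self, timestamp, protocol, src_addr, src_port,
                   dst_addr, dst_port, tcp_flags=()):
        self.expire_udp(timestamp)
        key = flow_key(protocol, src_addr, src_port, dst_addr, dst_port)
        flow = self.active.setdefault(
            key, {"packets": 0, "last_seen": timestamp})
        flow["packets"] += 1
        flow["last_seen"] = timestamp
        # Simplification: the first FIN or RST ends the TCP flow.
        if protocol == "TCP" and ("FIN" in tcp_flags or "RST" in tcp_flags):
            self.finished.append((key, self.active.pop(key)))
```

The dictionary `active` holds one entry per concurrent flow, so its size grows linearly with the number of flows. A sender that opens many flows without closing them makes this table grow without bound, which is the buffer risk discussed above.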
2.3. A Conventional Combination Method
Our research group has proposed a combined system that uses the timeslot-based and the flow-based analyses in parallel [4]. Figure 1(a) shows an overview of the conventional system, which we term a parallel system.
Network traffic is input to both the timeslot-based and the flow-based detectors, and is analyzed by each detector. A combination of timeslot-based and flow-based detectors can detect intrusions effectively by taking advantage of the merits of both methods. Therefore, the combined system is highly accurate in anomaly detection, and [4] shows the effectiveness of the parallel system through experiments using the DARPA data set [9].
However, it is still necessary to address the problem of large buffer storage in flow-based analysis. To reduce the amount of data to be analyzed by flow-based analysis, packet sampling [10-12] and short timeouts [13] have been proposed. With the former, however, it is difficult to observe flows that consist of only a few packets, and thus there is a high chance of missing important packets during detection. Since novel worms tend to use few packets in order to spread as fast as possible [14], such worms are unlikely to be sampled. In the latter case, a long traffic flow is split up whenever the interval between packet arrivals exceeds the flow timeout, so short timeouts increase the number of flows and reduce efficiency and accuracy [11]. As a result, we consider that these approaches suffer from a lack of information for detecting anomalous events and exhibit low detection accuracy.
Since packet sampling and short timeouts reduce the data of each flow without regard to how anomalous the flow is, they may result in a lack of the information needed to detect anomalous flows. To avoid this lack of information, the number of flows, rather than the data of each flow, should be reduced using appropriate criteria.
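The two drawbacks discussed in this subsection can be made concrete with a small sketch. It is illustrative only and is not taken from [10-14]: the function names are hypothetical, and the sampling model assumes that each packet is sampled independently with the same probability, which is only one of several sampling schemes.

```python
def miss_probability(packets_in_flow, sampling_rate):
    """Probability that no packet of a flow is sampled, assuming each
    packet is sampled independently with probability `sampling_rate`."""
    return (1.0 - sampling_rate) ** packets_in_flow


def split_by_timeout(timestamps, timeout):
    """Split one flow's packet timestamps into separate flow records
    wherever the gap between consecutive packets exceeds `timeout`."""
    records = []
    for t in sorted(timestamps):
        if records and t - records[-1][-1] <= timeout:
            records[-1].append(t)   # gap within timeout: same record
        else:
            records.append([t])     # gap too long: start a new record
    return records
```

Under this sampling model, a three-packet flow sampled at a rate of 1 in 100 is missed entirely with probability 0.99 cubed, or about 0.97. Likewise, a flow with packets at 0, 5, 40, 45, and 100 seconds remains a single record with a 60-second timeout but is split into three records with a 30-second timeout, so the information about that communication is scattered across several shorter records.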
3. A Multi-Stage Anomaly Detection System
In this section, we propose a multi-stage network anomaly detection system. It uses less buffer storage, yet detects intrusions with high accuracy.
3.1. Outline
Figure 1(b) shows the overview of the proposed multistage