Real-Time Timing Channel Detection in a Software-Defined Networking Virtual Environment

Despite extensive research, timing channels (TCs) remain a principal category of threats that leak and transmit information by perturbing the timing or ordering of events. Existing TC detection approaches use either signature-based methods to detect known TCs or anomaly-based methods that model legitimate network traffic in order to detect unknown TCs. Unfortunately, in a software-defined networking (SDN) environment, most existing TC detection approaches would fail due to factors such as volatile network traffic, imprecise timekeeping mechanisms, and dynamic network topology. Furthermore, stealthy TCs can be designed to mimic legitimate traffic patterns and thus evade anomaly-based TC detection. In this paper, we overcome these challenges by presenting a novel framework that harnesses the elastic resources of the cloud. In particular, our framework dynamically configures SDN to enable or disable differential analysis of the outbound network flows of different virtual machines (VMs). The framework is tightly coupled with a new metric that first decomposes the timing data of network flows into a number of scales using the discrete wavelet-based multi-resolution transform (DWMT). It then applies the Kullback-Leibler divergence (KLD) to measure the variance among flow pairs. The appealing feature of our approach is that, compared with existing anomaly detection approaches, it can detect most existing and some new stealthy TCs without legitimate traffic for modeling, even in the presence of noise and the imprecise timekeeping mechanisms of an SDN virtual environment. We implement our framework as a prototype system, OBSERVER, which can be dynamically deployed in an SDN environment. Empirical evaluation shows that our approach detects TCs with a higher detection rate, lower latency, and negligible performance overhead compared to existing approaches.


Introduction
The widespread deployment of firewalls and other perimeter defenses to protect information in enterprise information systems over the past decade has raised the bar for malicious outside adversaries seeking to breach a well-protected network. However, the same enterprise can still be easily subverted by malicious insiders, who can exfiltrate internal secrets through a special communication channel, namely a covert channel (CC) [1]-[4]. A successful CC can exfiltrate internal secrets by modifying a "storage location" (a storage channel, or SC) [5] or by manipulating the timing or ordering of events (a timing channel, or TC) [5], without triggering any alert in a well-protected network [6]. With the increasing number of TC design schemes [7] [8], defending against TCs is both important and challenging.
Most approaches detect TCs using either signature-based methods to detect known TCs [9] or anomaly-based methods that model legitimate network traffic in order to detect unknown TCs [5] [6] [10] [11]. These approaches have proven effective in network environments where physical computers are connected directly and the network topology is relatively stable. However, most TC detection schemes face at least two challenges when the problem moves to an SDN virtual environment [12]. First, unlike a traditional network environment, where legitimate traffic for modeling is readily available, traffic is hard to collect here because of volatile VMs, virtual services, and network configurations. Moreover, since VMs use an emulated CPU clock that is much less precise than the actual clock [13] [14], network flows in such environments contain more noise. Second, most intrusion detection systems (IDSs) are designed to detect TCs that aim to transmit data quickly at high bandwidth [6] [9] [11] [15]. Aware of this, an adversary can design stealthy TCs that are statistically indistinguishable from legitimate network flows in order to evade detection deliberately.
In this paper, we propose a novel framework that takes advantage of volatile VMs and the dynamic configuration of SDN in a cloud infrastructure. Our framework can be dynamically switched into or out of TC detection mode. Since the framework is designed specifically to detect TCs that perturb the inter-packet delays (IPDs) of network flows, we also present a new metric that performs differential analysis of the outbound flows from different VMs. In particular, the metric first decomposes the timing information of flows into different scales through the discrete wavelet-based multi-resolution transform (DWMT). Then, a Kullback-Leibler divergence (KLD)-based metric is developed to measure the variance between flow pairs. The appealing feature of our approach is that it can detect most existing and stealthy TCs without legitimate traffic, even in the presence of noise and imprecise timekeeping mechanisms in an SDN virtual environment. We implement our framework as a prototype system, OBSERVER, which can be dynamically deployed in an SDN cloud infrastructure. Our empirical experiments show that, compared with existing TC detection approaches, OBSERVER detects TCs with a higher detection rate, lower latency, and negligible performance overhead.
In summary, this paper makes the following contributions:
1) We have studied the challenges of detecting TCs in a networked virtual environment, particularly one in which legitimate traffic is unavailable and an imprecise timekeeping mechanism is used. To counter these challenges, we advance a framework that leverages spare VMs and the dynamic configuration of SDN to detect TCs that perturb IPDs. The framework can easily be enabled or disabled via SDN-based instructions; it replicates network packets and receives timing statistics of the outbound flows from the VMs under comparison.
2) We have designed a metric that is both resilient to noise and sensitive to stealthy TCs. The metric is closely coupled with the proposed framework. It is a wavelet-based metric that quantitatively calculates the distance among different outbound flows from VMs and can thus detect a broad spectrum of timing channels.
3) We have implemented the detection framework and the metric, and empirically evaluated them against a number of TC attacks. To demonstrate their effectiveness, we compare our metric with other well-known metrics designed for physical network environments. Our evaluation shows that our approach can efficiently detect TCs with a higher detection rate, lower latency, and negligible performance overhead.
The remainder of this paper is structured as follows. Section 2 briefly introduces the background and the challenges of detecting TCs in an SDN virtual network. Section 3 reviews related work. Section 4 presents the design of the TC detection framework, including key technical issues such as the threat model, definitions and notation, and the detection metric. Section 5 introduces the design and implementation of our proposed framework. Section 6 presents the empirical evaluation results of detecting various TCs. Section 7 concludes the paper, discusses possible improvements for OBSERVER, and suggests future research directions.

Background
In this section, we first introduce the threat of covert timing channels and the challenges involved in detecting them in an SDN virtual environment. Then, we present the effect of time drifting in a virtualized environment, which exacerbates the difficulty of anomaly detection.

Covert Timing Channel and Its Detection Challenges
A covert timing channel is a secret communication channel that exfiltrates information from a compromised internal host to an external colluder and therefore violates the security policy [16]. There are two types of TCs: active channels (ACs) and passive channels (PCs). An AC generates additional traffic alongside the existing traffic to transmit information, while a PC manipulates the existing traffic and does not generate additional traffic [5]. In this paper, we focus only on PCs that manipulate the inter-packet delays (IPDs) of a network flow [17]-[21]. The primary challenge of detecting TCs is that the statistics of TC flows are so close to those of legitimate flows that they are hard to detect with traditional statistical tests. Figure 1 illustrates the histograms and empirical cumulative distributions (ECDs) of the inter-packet delays of a legitimate flow and a TC flow, namely JitterBug [6] (sample size = 300). The distributions and ECDs of these two samples are very close. Their statistics, such as means and standard deviations (stdev), are also very similar, as shown in Table 1. A low detection rate and a high false positive rate are therefore expected if one were simply to apply standard statistics-based detection metrics.

Time Drifting in Virtual Machines
Most existing TC detection approaches assume that the timekeeping mechanism in the hosts is accurate. This assumption might be true in a physical network environment, but it might not hold in an SDN virtual environment, for at least two reasons. First, unlike a physical machine that can directly access the physical CPU clock, a virtual machine accesses the CPU clock through either emulated timer devices or a virtual clock [13], which makes accurate timekeeping impossible [14]. This affects all applications that access the virtual clock, including the malware that generates the TC. The noise introduced by the virtual clock, namely time drifting, has been noted in the literature [14]. [...] constantly reverted to a previous snapshot, which is a saved state of the data and hardware configuration of a running virtual machine [23]. In addition, since a VM might be configured to run multi-boot systems, be accessed by different users, and/or serve different purposes, its network traffic might exhibit different statistical patterns due to different "pay-as-you-go" services. Moreover, the topology of the SDN might change due to load balancing or service migration [24]. Thus, it might be challenging to obtain legitimate traffic from the cloud provider. These factors cause the legitimate traffic of a benign VM to exhibit different statistics over its life cycle, which renders legitimate traffic collected for modeling useless. In either case, previous anomaly detection approaches fail.

Related Work
The design of TCs has been the subject of recent research. For example, Berk et al. [10] implemented a simple binary covert timing channel based on the Arimoto-Blahut algorithm, which computes the input distribution that maximizes channel capacity [25]. Cabuk et al. developed the first IP TC, which we refer to as IPTC [11], and TRTC [9], a more advanced traffic-replay TC. Shah et al. [6] developed a keyboard device, called JitterBug, that slowly leaks typed user information over the Internet. Giffin et al. [26] showed that the low-order bits of the TCP timestamp, although not a TC in themselves, can be exploited to create a TC because timestamps share statistical properties with packet timing. Another application of TCs is tracing suspicious traffic. For example, Wang et al. [17] took advantage of well-designed inter-packet delays, namely watermarks, to trace VoIP traffic [18] [27]. They also used watermarked network traffic to trace back botmasters through TC traffic [15]. Gianvecchio et al. [7] and Liu et al. [8] designed model-based covert channel encoding schemes that seek to achieve undetectability and robustness at the same time. Recent research in this field focuses on covert channels in multi-tenant virtualization environments, in which hostile tenants could leverage various covert channels to exfiltrate sensitive information from a victim who shares the same physical machine [4] [28] [29]. Although this line of research is relevant to ours, its primary focus is designing high-bandwidth cross-VM covert channels rather than detecting them.
A number of TC detection methods have also been developed. Peng et al. [30] showed that the Kolmogorov-Smirnov test is an effective way to detect TCs that manipulate IPDs. Cabuk et al. [9] investigated a regularity-based approach to detecting TCs. They also developed a metric, namely ε-similarity, to measure the proportion of similar inter-packet delays. The limitation of the ε-similarity metric is that it targets only a particular TC, namely IPTC; it is therefore not general enough to detect other TCs. Berk et al. [10] employed a simple mean-max ratio test to detect binary or multi-symbol TCs. However, the mean-max ratio test assumes that legitimate IPDs follow a normal distribution, which is often not true of real-world traffic. Gianvecchio et al. [5] investigated an entropy-based approach to detecting TCs and achieved good detection results. All these detection schemes require legitimate network traffic to be available for modeling, and thus cannot be applied in a networked virtual environment, whose network traffic is volatile. Moreover, the model of legitimate network traffic has to be constructed offline and incurs significant performance overhead, which makes real-time TC detection almost impossible.
Designed for a different purpose, Jing et al. proposed a wavelet-based approach to measure the timing distortion of low-latency anonymous networks [31] (or anonymity networks [32]). Their timing distortion metric is designed specifically for the IPD distortion that anonymous networks (e.g., Tor) introduce through flow mixing, merging, chaff addition, and packet dropping. It therefore cannot be used directly to detect stealthy TCs, which exhibit more subtle timing characteristics than the IPD distortion introduced by anonymous networks. Askarov et al. [33] [34] proposed online timing channel prevention mechanisms that mitigate information leakage. Although their approaches were proven effective at establishing upper and lower bounds on information leakage as a function of elapsed time, how to detect timing channels remains an open problem that they do not address. Online prevention mechanisms that do not consider which network flow contains a covert timing channel will significantly affect the throughput and response time of services, such as video streaming, that require minimal transmission delay.
In our prior work [35] [36], we presented a preliminary design for a TC detection system that considers TC detection only in a simple networked virtual environment. Unlike existing approaches that take extra effort to collect legitimate network traffic to construct a model, our approach did not require any legacy network traffic; instead, TC detection relied on differential analysis among multiple parallel outbound flows. Moreover, this approach has proven to be more efficient and robust in a networked virtual environment. This paper extends the preliminary results reported in our prior work as follows. First, we show that the system design given in [35] [36] is limited because it does not consider when to start and stop the TC detection system, which might waste resources (e.g., VMs) in a cloud infrastructure and could limit its applicability in real-world deployments. Therefore, we not only adopt more realistic assumptions in the threat model, but also redesign our earlier system so that it is applicable in an SDN virtual environment. In our current design, TC detection is initiated by the heuristic engine and terminated by the SDN controller; this incurs less performance overhead than our previous implementation and is easier to automate. Furthermore, while our earlier system was designed and implemented in C using divert sockets, our current design leverages existing features of OpenFlow [37], which makes the TC detection system much easier to set up, deploy, and upgrade. The SDN controller can enable and disable connections between virtual machines faster than our previous implementation and with minimal overhead. Second, to show the effectiveness of TC detection, we performed a set of new experiments. For instance, we compare the true positive and false positive rates of our approach to those of existing TC detection approaches. Our results show that our approach outperforms existing approaches. Moreover, we also justify the choice of parameters for our WBD metric. Compared to our prior work, the current design and implementation of our TC detection system is not only readily deployable in a realistic SDN virtual network but also much more efficient for TC detection. Finally, we discuss the limitations of our current approach, address possible improvements, and suggest several future research directions.

Timing Channel Detection Framework
In this section, we first present the threat model of timing channel detection in an SDN virtual environment. Then, we model our framework by providing notation and definitions. After that, we model the TC detection problem and formulate the metric that quantifies the divergence between network flows.

The Threat Model
In our threat model, we assume that the adversary launches insider attacks [38] [39] from within a well-protected enterprise, harvests secret or confidential data from the compromised VMs, and transmits them to outside colluders. To maintain stealthy communication without triggering any alert from firewalls or intrusion detection systems, the adversary can encode the data as passive timing channels (PTCs) that manipulate the inter-packet delays (IPDs) between network packets. Since most anti-virus software, intrusion detection systems, and firewalls were not designed to detect TCs, they can easily be evaded in this way, allowing internal information to be exfiltrated. Although anti-malware software and intrusion detection systems can be installed inside a VM, an adversary who gains root privileges can easily turn them off. Furthermore, since the VMs might be under the full control of the adversary, their malicious behavior can be observed only from the outside, and they should therefore be treated as black boxes.
We assume the cloud infrastructure maintains a large VM repository to handle a soaring number of user requests. This condition can easily be satisfied in an SDN or Network Function Virtualization (NFV) [40] environment. An SDN controller can be configured to select vacant VMs (VMs not in use) for detection and then revert them to their original state afterward. Although using vacant VMs during detection might be expensive, we argue that this requirement is feasible and practical to satisfy, for at least two reasons. First, the TC detection system is enabled only when needed: it is enabled when an internal VM tries to establish a connection with an unknown external IP address or service and the existing IDS detects nothing malicious in the outbound traffic. It samples only a small number of packets for analysis and raises an alert if any abnormality is detected. Therefore, the use of vacant VMs is limited to a small percentage of the outbound traffic.

Notations and Definitions
Given the same inbound network flow $I$, the problem of measuring the timing distance between two outbound flows, $O_1$ and $O_2$, can be formulated as follows. The inbound flow $I$ contains $K$ packets [...].
Definition 1. The bit rate $R(C_i)$ of a covert channel $C_i$ is the number of bits conveyed per unit of time. Given two covert channels $C_i$ and $C_j$, if $R(C_i) < R(C_j)$, we say that $C_i$ is stealthier than $C_j$, or that $C_j$ is more aggressive than $C_i$.

Measuring Timing Distance between Flows
By characterizing the timing patterns of network flows, it is possible to quantitatively measure the timing distance between them. To perform this measurement effectively, we leverage the discrete wavelet-based multi-resolution transform (DWMT) [41]. The DWMT, which has been widely used in signal processing [42] and anomaly detection [31], has at least three prominent features. First, DWMT provides multi-resolution analysis, which allows one to examine a data sequence at different scales. Second, DWMT supports feature localization, i.e., it reveals both the characteristics of a signal and approximately where in time they occur. Finally, DWMT supports online analysis; that is, the difference between two flows can be computed online.
The DWMT takes a sequence of data as input and transforms it into a number of wavelet coefficient sequences. Specifically, the $l$-level DWMT takes a sequence of IPDs and transforms it into 1) $l$ wavelet detail coefficient vectors at different scales ($CD_i$, where $1 \le i \le l$) and 2) a low-resolution approximation vector ($CA_l$)¹. For the $j$-th segment of $O_i$, the wavelet detail coefficient vector at scale $l$ can be represented as
$$V(i,j,l) = \left( v_{i,j,l,1}, v_{i,j,l,2}, \ldots, v_{i,j,l,n_l} \right), \qquad (1)$$
where $n_l$ is the number of wavelet coefficients at scale $l$. Figure 4 illustrates a particular DWMT, namely the Haar wavelet [43], applied to an IPD sequence (sample size = 300). The Haar wavelet shown uses five resolution levels to yield one approximation coefficient vector $a_5$ and five detail coefficient vectors $d_i$ at different scales ($1 \le i \le 5$).
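The following minimal Python sketch shows an $l$-level Haar DWMT over a sequence of IPDs, using only the standard library. It is not OBSERVER's implementation, which the paper states is written in C. The function names are hypothetical, and the transform uses the orthonormal Haar filters $(x_{2k} \pm x_{2k+1})/\sqrt{2}$. As an assumption not stated in the paper, an odd-length sequence is padded by repeating its last sample.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(x) % 2:
        # Assumption: pad odd-length input by repeating the last sample.
        x = x + [x[-1]]
    approx = [(x[k] + x[k + 1]) / SQRT2 for k in range(0, len(x), 2)]
    detail = [(x[k] - x[k + 1]) / SQRT2 for k in range(0, len(x), 2)]
    return approx, detail


def haar_dwmt(ipds: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Decompose an IPD sequence into CA_l and [CD_1, ..., CD_l].

    CD_1 is the finest scale; CA_l is the low-resolution approximation.
    """
    if levels < 1 or not ipds:
        raise ValueError("need levels >= 1 and a non-empty IPD sequence")
    approx = [float(v) for v in ipds]
    details: List[List[float]] = []
    for _ in range(levels):
        # Each level halves the approximation and emits one detail vector.
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    import random

    rng = random.Random(1)
    ipds = [rng.expovariate(1 / 0.147) for _ in range(300)]  # synthetic IPDs
    ca5, cds = haar_dwmt(ipds, 5)
    print(len(ca5), [len(d) for d in cds])  # 10 [150, 75, 38, 19, 10]
```

For 300 IPDs and five levels, as in the Haar example above, this sketch yields $CA_5$ with 10 coefficients and detail vectors of lengths 150, 75, 38, 19, and 10. A library with a different boundary-extension mode would produce slightly different lengths at the odd-length levels.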
The design goals of our wavelet-based distance (WBD) are threefold. First, the WBD between two legitimate flows should be small. Second, the WBD between a legitimate flow and a TC flow should differ detectably from that between two legitimate flows. Third, the WBD should be able to differentiate between more aggressive and stealthier TCs.
To achieve these goals, we define three derived vectors based on the coefficient vector $V(i,j,l)$ at scale $l$ ($l \ge 0$): the intra-flow vector (intraFV), the inter-flow vector (interFV), and the Kullback-Leibler divergence (KLD) vector. We first define
$$\mathrm{intra}(i,j,l) = \left( v_{i,j,l,2} - v_{i,j,l,1}, \ldots, v_{i,j,l,n_l} - v_{i,j,l,n_l-1} \right),$$
which reflects the fluctuation between adjacent coefficients within the coefficient vector at scale $l$ of flow $O_i$. Then, we define $\mathrm{intraFV}(j,l)$ as the Euclidean distance [44] $\mathrm{dist}$ between $\mathrm{intra}(1,j,l)$ and $\mathrm{intra}(2,j,l)$ for the $j$-th segment of $O_1$ and $O_2$, as in Equation (2):
$$\mathrm{intraFV}(j,l) = \mathrm{dist}\left( \mathrm{intra}(1,j,l), \mathrm{intra}(2,j,l) \right). \qquad (2)$$
Similarly, we define $\mathrm{interFV}(j,l)$ as the Euclidean distance between the coefficient vectors $V(1,j,l)$ and $V(2,j,l)$, which characterizes the deviation between the wavelet coefficients of the two flows for the same segment $j$ at the same scale $l$:
$$\mathrm{interFV}(j,l) = \mathrm{dist}\left( V(1,j,l), V(2,j,l) \right).$$
The Kullback-Leibler divergence (KLD) has been used to measure the distance between two probability distributions $P$ and $Q$, $D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$. To calculate the KLD between two wavelet coefficient vectors at scale $l$, it is necessary to obtain the probability distribution of $V(i,j,l)$. We therefore convert $V(i,j,l)$, which contains numeric coefficients, into a vector of symbols drawn from an alphabet $\mathcal{A}$ of size $\kappa$². To facilitate effective conversion, we use the data discretization scheme of SAX [47] to assign wavelet coefficients to $\kappa$ equiprobable regions; each continuous coefficient value that falls into a region maps to that region's unique symbol. In addition, KLD is not symmetric: for two probability distributions $P$ and $Q$, $D_{KL}(P \,\|\, Q) \neq D_{KL}(Q \,\|\, P)$ in general.
Given the number of scales $L$ and the number of aggregated segments $W$ in a pair $\langle O_1, O_2 \rangle$, we define the wavelet-based distance (WBD) between $O_1$ and $O_2$ as shown in Equation (7). WBD summarizes the intra-flow, inter-flow, and KLD divergences between two network flows. A large WBD indicates a significant difference between two flows, while a small WBD implies the opposite.
¹ In this paper, we use $CA_l$ and $CD_0$ interchangeably.
² There is a tradeoff involved in choosing the optimal size $\kappa$ of alphabet $\mathcal{A}$. A large value of $\kappa$ keeps more information about the distribution of the continuous data, but it might generate too many false alarms. In contrast, a small value of $\kappa$ keeps less information about the distribution, but it might be insensitive to slow or stealthy attacks. In Section 6.2, we examine this tradeoff more closely and illustrate how the optimal value of $\kappa$ can be determined.
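The per-segment, per-scale components defined above can be sketched in Python. This illustrative sketch makes several assumptions and is not the paper's implementation:

- The SAX step z-normalizes both vectors with their pooled mean and standard deviation, so that the two vectors share the same κ equiprobable regions of N(0, 1).
- Symbol frequencies use add-one smoothing so that the KLD stays finite.
- The KLD is symmetrized by summing both directions.
- `wbd_sketch` is a hypothetical unweighted sum. Equation (7) may weight or combine the terms differently.

All names are hypothetical.

```python
import bisect
import math
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def intra(v: Vector) -> List[float]:
    """Differences between adjacent coefficients of one coefficient vector."""
    return [v[k + 1] - v[k] for k in range(len(v) - 1)]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def intra_fv(v1: Vector, v2: Vector) -> float:
    """intraFV(j, l): distance between the adjacent-difference vectors."""
    return euclidean(intra(v1), intra(v2))


def inter_fv(v1: Vector, v2: Vector) -> float:
    """interFV(j, l): distance between the coefficient vectors themselves."""
    return euclidean(v1, v2)


def discretize_pair(v1: Vector, v2: Vector, kappa: int) -> Tuple[List[int], List[int]]:
    """SAX-style mapping of both vectors to symbols 0..kappa-1.

    Assumption: values are z-normalized with the pooled mean and standard
    deviation of both vectors, then binned by the kappa equiprobable
    regions of N(0, 1).
    """
    pooled = list(v1) + list(v2)
    mu, sd = mean(pooled), pstdev(pooled)
    breakpoints = [NormalDist().inv_cdf(m / kappa) for m in range(1, kappa)]

    def to_symbols(v: Vector) -> List[int]:
        z = [(x - mu) / sd if sd > 0 else 0.0 for x in v]
        return [bisect.bisect_right(breakpoints, x) for x in z]

    return to_symbols(v1), to_symbols(v2)


def symbol_distribution(symbols: Sequence[int], kappa: int, alpha: float = 1.0) -> List[float]:
    """Symbol frequencies with additive smoothing (assumption) so KLD stays finite."""
    counts = [alpha] * kappa
    for s in symbols:
        counts[s] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kld(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kld(p: Sequence[float], q: Sequence[float]) -> float:
    """Assumption: symmetrize by adding both directions."""
    return kld(p, q) + kld(q, p)


def kld_component(v1: Vector, v2: Vector, kappa: int = 10) -> float:
    """Symmetric KLD between the symbol distributions of two coefficient vectors."""
    s1, s2 = discretize_pair(v1, v2, kappa)
    return symmetric_kld(symbol_distribution(s1, kappa), symbol_distribution(s2, kappa))


def wbd_sketch(segments1, segments2, kappa: int = 10) -> float:
    """HYPOTHETICAL unweighted WBD: sum of all three terms over segments and scales.

    segmentsX[j][l] is the coefficient vector V(X, j, l); Equation (7) in the
    paper may weight or combine the terms differently.
    """
    total = 0.0
    for seg1, seg2 in zip(segments1, segments2):
        for v1, v2 in zip(seg1, seg2):
            total += intra_fv(v1, v2) + inter_fv(v1, v2) + kld_component(v1, v2, kappa)
    return total
```

The κ tradeoff in footnote 2 shows up directly in `discretize_pair`. With a larger κ, small differences between coefficients fall into different symbols and raise the KLD term, which increases sensitivity and also the chance of false alarms.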
We further extend the above definition to the normalized WBD as follows: given a number of outbound flows where the flow

System Design
In this section, we first present the architecture of our timing channel detection system, which leverages vacant VMs and the SDN virtual network. Then, we detail the implementation of our TC detection system, as well as the evaluation tests using our proposed metric.

System Architecture
The architecture of OBSERVER, illustrated in Figure 5, comprises four major components. Initially, the heuristic engine (HE) is triggered by a network packet sniffer (e.g., Wireshark [48]) or a network intrusion detection system (NIDS) when an internal VM (say, VM1) tries to establish a connection with an unknown external server and the existing IDS cannot detect anything malicious in the outbound traffic. The HE then activates OBSERVER in TC detection mode. Specifically, the HE sends instructions to the SDN controller (SC) and the VM controller (VC) (Step 1). The SC then issues a command that enables the virtual bridge (Step 2a). Meanwhile, the VC issues a command that starts VMs from the VM repository (Step 2b). In this example, VM2 and VM3 are the VMs started from the repository. The SC also enables vSwitch2 and vSwitch3, which allow the traffic to be replicated by vSwitch1 and forwarded to VM2 and VM3, respectively (Step 3). Once the outbound flows are captured by vSwitch i (i = 1, 2, and 3) (Step 4), the timing information of each flow is forwarded to the TC Detector (TD), which applies our TC metric to identify anomalous outbound flows (Step 5). If no TC is detected from the VM under inspection, the SC disables the virtual bridge and the VC returns the VMs in use to the repository. As a key component, the TD uses the timing distance metric to measure the distance between the outbound flows of VM1 and VM2 (denoted dist(VM1, VM2)) and between those of VM2 and VM3 (denoted dist(VM2, VM3)). The TD uses dist(VM2, VM3), the distance between the two VMs started from the repository, as the baseline to determine whether the outbound traffic of VM1 is anomalous, as follows. First, the target false positive rate is set at σ. To achieve this false positive rate, the cutoff scores (the scores that decide whether a sample is legitimate or covert) are set at the $100(1-\sigma)$-th or $100\sigma$-th percentile of legitimate traffic scores, depending on whether high or low scores indicate a TC for a given test. Then, samples with scores worse than the cutoff are identified as TC, while samples with scores better than the cutoff are identified as legitimate. The false positive rate is the proportion of legitimate traffic samples that are wrongly identified as TC, while the true positive rate is the proportion of TC samples that are correctly identified as covert.
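The cutoff procedure can be sketched in Python as follows. The sketch assumes, as for WBD, that high scores indicate a TC, and it uses the nearest-rank definition of a percentile. The function names are hypothetical. The legitimate scores would come from flow pairs known to be clean.

```python
import math
from typing import Sequence, Tuple


def cutoff_from_legitimate(legit_scores: Sequence[float], sigma: float) -> float:
    """Score at the 100(1 - sigma)-th percentile of legitimate scores.

    Assumes high scores indicate a TC and uses the nearest-rank percentile.
    """
    if not legit_scores or not 0.0 < sigma < 1.0:
        raise ValueError("need legitimate scores and 0 < sigma < 1")
    ranked = sorted(legit_scores)
    # round() guards against floating-point error in the product before ceil().
    rank = math.ceil(round((1.0 - sigma) * len(ranked), 9))
    rank = min(max(rank, 1), len(ranked))
    return ranked[rank - 1]


def fp_tp_rates(legit_scores: Sequence[float], tc_scores: Sequence[float],
                cutoff: float) -> Tuple[float, float]:
    """Scores strictly above the cutoff are flagged as TC.

    Returns (false positive rate, true positive rate).
    """
    fp = sum(1 for s in legit_scores if s > cutoff)
    tp = sum(1 for s in tc_scores if s > cutoff)
    return fp / len(legit_scores), tp / len(tc_scores)


if __name__ == "__main__":
    legit = [float(i) for i in range(1, 101)]  # illustrative scores only
    c = cutoff_from_legitimate(legit, 0.05)
    print(c, fp_tp_rates(legit, [96.0, 200.0, 50.0], c))
```

With 100 legitimate scores and σ = 0.05, the cutoff is the 95th-ranked score, so exactly five legitimate samples lie above it and the false positive rate equals the target.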

System Implementation and Evaluation Tests
We have implemented our TC detection system in C on top of KVM [49]. The traffic filter and traffic distributor were implemented as a transparent bridge using Open vSwitch [50]. To emulate a malicious program that exfiltrates internal information, we modified the source code of vsftpd [51] running on the Real-Time Application Interface for Linux (RTAI) [52]. The modified vsftpd includes the TC encoder, which introduces a timing delay before sending outbound packets to the client. The TC encoder was written in C and inline assembly that invokes the CPU's Read Timestamp Counter (RDTSC) instruction [53]. We chose the RDTSC instruction because it has excellent resolution and requires low overhead to generate timing delays [7].

Evaluation
In this section, we first justify the baseline selection by analyzing the similarity between legitimate outbound flows (Section 6.1). Then, we present the selection of optimal values for parameters that might affect the effectiveness of detection (Section 6.2). After that, we compare our approach with existing approaches with respect to the effectiveness of detecting various TCs (Section 6.3).

Similarity between Virtual Machines for Legitimate Traffic
The objective of the first set of experiments is to justify our assumption that legitimate outbound flows exhibit similar timing patterns and can therefore serve as the baseline for TC detection. Given the same inbound traffic, the IPDs of outbound flows are collected from VM i (i = 1, 2, and 3). Figure 6 shows Q-Q plots of the IPDs of two legitimate outbound flows with sample sizes 300, 1,000, and 33,000. The highly linear nature of all plots strongly indicates that the outbound flows come from the same distribution. Table 3 shows the test results for the datasets. For the WT-Test and the KS-Test at the 5% significance level, h = 0 indicates acceptance of the null hypothesis, i.e., that the two flows come from the same distribution, while h = 1 indicates the opposite. The p-values of the WT-Test and the KS-Test indicate whether two samples differ significantly; the null hypothesis is rejected if the p-values are "small". Both the WT-Test and the KS-Test accept the null hypothesis when the sample size is small (size = 100, 1000), but reject it when the sample size is large (size = 33,000). However, our metric yields consistently small values for all legitimate flows (less than 32.5), which indicates similar statistics between legitimate flows. In addition, this evaluation indicates that more noise can be observed in the outbound IPDs if no long-term time synchronization mechanism is configured in the VMs.
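The Python sketch below helps explain why the KS-Test rejects the null hypothesis only for the largest samples. It computes the two-sample KS statistic $D = \sup_x |F_1(x) - F_2(x)|$ and compares it with the standard large-sample critical value $c(\alpha)\sqrt{(n+m)/(nm)}$, where $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$. This is a textbook approximation, not necessarily the implementation used to produce Table 3. The function names are hypothetical.

```python
import math
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every sample equal to x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical(n: int, m: int, alpha: float = 0.05) -> float:
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


def ks_reject(a: Sequence[float], b: Sequence[float], alpha: float = 0.05) -> int:
    """Return h = 1 if D exceeds the critical value, else h = 0."""
    return 1 if ks_statistic(a, b) > ks_critical(len(a), len(b), alpha) else 0


if __name__ == "__main__":
    for size in (100, 1000, 33000):
        print(size, round(ks_critical(size, size), 4))
```

At α = 0.05, c(α) ≈ 1.36. The critical value is therefore about 0.19 for two samples of 100 IPDs but only about 0.011 for two samples of 33,000. At that size, even the small timing noise between legitimate VMs can be enough to push D over the threshold.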

The Choice of κ and ω
To compute the Kullback-Leibler divergence between two discretized wavelet vectors, one must choose κ, the size of the alphabet $\mathcal{A}$, which is critical to the effectiveness of detection. The tradeoff in choosing κ (see Section 4.3) is as follows: a smaller κ keeps less information about the individual distributions, leading to a higher false negative rate. In contrast, a larger κ increases detection sensitivity, thereby leading to a higher false positive rate. To determine the optimal value of κ in our system, we empirically test κ = 2 through 10 for both legitimate flows and different versions of Jitterbug flows. Figure 7(a) illustrates the WBD values of various flows, which indicate that legitimate and malicious flows can be differentiated when the value of κ is greater than 10. We get similar results for other TCs by running similar experiments. Thus, we choose κ = 10 to retain the ability to measure the deviation of malicious traffic.
The second parameter to be selected is the detection window size ω, which is the number of IPDs compared between the outbound flows. Its selection also involves a tradeoff: a larger ω causes longer wavelet decomposition time and slower response, whereas a smaller ω has the opposite effect and could make the detector over-sensitive, increasing the false positive rate. Figure 7(b) shows the WBD values for window sizes ω = 100 through 1000 for both legitimate (i.e., normal) and stealthier Jitterbug flows (e.g., jitterbug_5, jitterbug_10, jitterbug_20, jitterbug_30). Our empirical results show that the WBD values of legitimate and malicious flows can be differentiated when ω is greater than 800. We obtain similar results for other TCs. Therefore, we choose ω = 1000 for all subsequent evaluations.

Timing Channels under Evaluation
We evaluate our WBD metric against the wide range of TCs listed in Table 4, which covers both active and passive TCs. The empirical cumulative distributions (ECDs) of IPDs for the different TCs are illustrated in Figure 8. Below, we present the detection results for each of them in detail.

IP Timing Channel (IPTC) Detection
Our first set of experiments detects IPTC [11], whose working mechanism is summarized in Table 4. Specifically, IPTC encodes a 1-bit by transmitting a packet during a timing interval w and encodes a 0-bit by not transmitting a packet during w. In our experiment, the TC encoder reads the "/etc/passwd" file and encodes its binary content into IPTC. We also developed four versions of IPTC, namely IPTC i (i = 1, 2, 3, and 4), designed as follows. Given the observation that the average IPD of the traffic is 0.147 s, we design the first three schemes by choosing timing intervals w of 0.1 s (IPTC1), 0.12 s (IPTC2), and 0.14 s (IPTC3). The fourth scheme (IPTC4) rotates among these intervals after every 100 packets to avoid creating a regular pattern of IPDs. In this experiment, we run each test 100 times, each for a duration of 50 seconds, and collect around 400 packets in each flow.
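The following minimal Python sketch implements the IPTC encoding rule described above, together with a matching decoder. The names are hypothetical. Sending each packet at the start of its interval is an assumption. The experiment's encoder runs in C on RTAI and is not reproduced here.

```python
from typing import Iterable, List


def bytes_to_bits(data: bytes) -> List[int]:
    """Expand bytes into bits, most-significant bit first."""
    return [(byte >> (7 - k)) & 1 for byte in data for k in range(8)]


def iptc_send_times(bits: Iterable[int], w: float, t0: float = 0.0) -> List[float]:
    """Send one packet in interval k iff bit k is 1 (no packet encodes a 0).

    Assumption: each packet is sent at the start of its interval.
    """
    return [t0 + k * w for k, bit in enumerate(bits) if bit]


def iptc_decode(times: Iterable[float], w: float, n_bits: int, t0: float = 0.0) -> List[int]:
    """Receiver: every interval that contains a packet carries a 1-bit."""
    bits = [0] * n_bits
    for t in times:
        k = round((t - t0) / w)  # index of the interval the packet fell into
        if 0 <= k < n_bits:
            bits[k] = 1
    return bits


def ipds(times: List[float]) -> List[float]:
    """Inter-packet delays between consecutive send times."""
    return [b - a for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    secret = bytes_to_bits(b"root:x:0:0")  # prefix of a typical /etc/passwd line
    for w in (0.1, 0.12, 0.14):  # the intervals used for IPTC1 to IPTC3
        sent = iptc_send_times(secret, w)
        assert iptc_decode(sent, w, len(secret)) == secret
        print(w, len(sent), "packets")
```

In this sketch every IPD is an integer multiple of w. IPTC4 tries to blur exactly this regular pattern by rotating w, and the pattern helps explain why even simple statistical tests separate IPTC from legitimate traffic.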
To run the regularity test, we divide the IPD sequence into 20 segments. Figure 8(a) illustrates that even simple statistical tests can detect this attack: the ECDs of legitimate flows and IPTC flows are different, whereas the ECDs of two legitimate flows are similar. This matches the finding in the literature [4] that IPTC is the easiest TC to detect. Table 5 shows more detailed results for all tested flows. Although all the tests can detect all IPTCs, our WBD measurement shows the best results: the WBD of legitimate flows is very small (0.6295), in stark contrast to the WBDs of IPTC flows, all of which are 160,000 or higher. Although MLAN and CCE can differentiate all IPTCs from legitimate flows, they can hardly differentiate the different versions of IPTC. For instance, the CCE values of the different IPTCs are all close to the same value (0.609) and are therefore indistinguishable. Interestingly, since IPTC_i (i = 1, 2, 3, and 4) send packets at the same frequency, their regularities are identical. Thus, it is impossible to differentiate the different IPTCs by using the regularity test. The WBD values of IPTC_i (i = 1, 2, 3, and 4), however, can be used to easily detect and differentiate them.
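For reference, the regularity test over 20 segments can be sketched as follows. The text does not give the exact formula, so this assumes the commonly used formulation: compute the standard deviation σ_i of the IPDs in each segment, then take the standard deviation of |σ_i − σ_j| / σ_i over all segment pairs i < j. Low scores indicate that IPD variability is unusually stable across segments. The function name is hypothetical.

```python
import statistics
from itertools import combinations
from typing import Sequence


def regularity(ipds: Sequence[float], n_segments: int = 20) -> float:
    """Regularity score: STDEV(|sigma_i - sigma_j| / sigma_i) over segment pairs i < j."""
    if n_segments < 3:
        raise ValueError("need at least 3 segments to get 2+ pairwise ratios")
    seg_len = len(ipds) // n_segments
    if seg_len < 2:
        raise ValueError("need at least 2 IPDs per segment")
    # Per-segment standard deviation of the IPDs (trailing remainder ignored).
    sigmas = [statistics.stdev(ipds[k * seg_len:(k + 1) * seg_len])
              for k in range(n_segments)]
    if any(s == 0 for s in sigmas):
        raise ValueError("a constant segment makes the ratio undefined")
    ratios = [abs(si - sj) / si for si, sj in combinations(sigmas, 2)]
    return statistics.stdev(ratios)
```

With around 400 packets per flow and 20 segments, each σ_i is estimated from roughly 20 IPDs.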

Time-Replay Timing Channels (TRTC) Detection
Our second set of experiments detects TRTC [9]. TRTC replays a set of legitimate inter-packet delays to mimic legitimate traffic. The adversary first collects a sample of legitimate traffic, calculates its IPDs, and puts them into a bin, Bin. Then, Bin is partitioned into two bins of equal size: Bin_0 and Bin_1. TRTC transmits bit 0 by randomly replaying an IPD from Bin_0 and transmits bit 1 by randomly replaying an IPD from Bin_1. Since the IPDs in Bin_i (i = 0 and 1) are drawn from legitimate traffic, the distribution of TRTC traffic is the same as that of legitimate traffic. In this experiment, along with the original TRTC, four versions of TRTC (TRTC1, TRTC2, TRTC3, and TRTC4) are designed. In particular, one bogus packet is injected into the original flow after every n packets (n = 5, 10, 15, and 20), where a larger value of n indicates a slower attack. Figure 8(b) shows that the ECDs of legitimate IPDs and the four versions of TRTC IPDs are almost identical due to the encoding mechanism of TRTC. Table 6 shows more detailed results for all the flows tested, which are similar to those derived for IPTC. Since TRTC replays legitimate IPDs, its IPDs follow the same distribution as those of legitimate flows. Both the WT-Test and the KS-Test fail to detect TRTCs and thus accept the null hypothesis that the IPDs of TRTC and legitimate traffic follow the same distribution. Although carefully tuned cutoff points can be chosen to detect some TRTCs through the MLAN and CCE tests, the WBD and N_WBD measurements of TRTC_i are distinct enough to easily differentiate the various TRTC_i.
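The binning and replay steps can be sketched as below to show why distribution tests accept the null hypothesis: every transmitted IPD is a value taken from the legitimate sample. This is a minimal sketch that assumes Bin_0 holds the smaller half of the sorted sample and Bin_1 the larger half, and that the receiver decodes with a cutoff at the smallest Bin_1 value; the helper names are hypothetical, and the bogus-packet variants (TRTC1 to TRTC4) are not modeled.

```python
import random
from typing import List, Sequence, Tuple


def build_bins(legit_ipds: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split the sorted legitimate IPD sample into two equal-size bins.

    Assumption: Bin_0 is the smaller half and Bin_1 the larger half; with an
    odd sample size the largest IPD is dropped to keep the bins equal.
    """
    s = sorted(legit_ipds)
    half = len(s) // 2
    return s[:half], s[half:2 * half]


def trtc_encode(bits: Sequence[int], bin0: Sequence[float], bin1: Sequence[float],
                rng: random.Random) -> List[float]:
    """Replay a randomly chosen legitimate IPD from Bin_0 (bit 0) or Bin_1 (bit 1)."""
    return [rng.choice(bin1 if b else bin0) for b in bits]


def trtc_decode(ipds: Sequence[float], bin1: Sequence[float]) -> List[int]:
    """Receiver side: an IPD at or above the smallest Bin_1 value decodes as 1.

    Decoding is ambiguous if max(Bin_0) == min(Bin_1) (duplicate IPD values).
    """
    cutoff = min(bin1)
    return [1 if x >= cutoff else 0 for x in ipds]
```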

Back-Track Watermark (BTW) Detection
Back-Track Watermark (BTW) is a passive TC specifically designed to trace back the communication between a bot and its bot-master [15]. Specifically, to encode an i-bit sequence S = ⟨s_0, s_1, …, s_{i−1}⟩, packet pairs ⟨P_{r_i}, P_{e_i}⟩ are chosen randomly such that r_i ≤ e_i, where P_{r_i} is called a reference packet and P_{e_i} an encoding packet. Let l_e and l_r be the IPDs of the covert-bit encoding and reference packets, respectively. A covert bit s_i is encoded into the packet pair ⟨P_{r_i}, P_{e_i}⟩; the covert-bit encoding function is defined in Equation (8). To adjust the intervals of the encoding packets, a pseudo-random number generator (PRNG) and a seed s_t are used to generate a random time interval t_{e_i} for each encoding packet P_{e_i}. We extend the initial TC design scheme to generate slow attacks. In particular, we use 2a·i (a ≥ 1) packets to encode the bit sequence S, where the parameter a can be either a constant or a variable generated by a PRNG. P_{r_i} and P_{e_i} are chosen from 2a packets. We call a the amplifier, which indicates how slowly the information is transmitted: the larger the value of a, the slower the attack proceeds. Four versions of BTW have been developed by using different amplifiers. Figure 8(c) illustrates the ECDs of two legitimate flows and four TC flows, which shows that both the legitimate and TC traffic follow the same distribution. More detailed results for all tested flows can be found in Table 7, in which the means and standard deviations of legitimate and TC IPDs are quite close, particularly for large amplifiers. The WT-Test and KS-Test can detect the aggressive BTW (BTW1) but fail to detect the stealthy BTW (BTW4). Table 8 compares the true and false positive rates of different detection approaches, in which MLAN, CCE, and WBD all reach a 100% true positive rate. However, MLAN also suffers a 65% false positive rate. Our evaluation also shows a tie between CCE and WBD, as both reach a 100% true positive rate with a zero false positive rate. However, a closer look at the CCE and WBD values reveals that WBD gives better results than CCE, because WBD is capable of differentiating the BTW_i, from the more aggressive to the stealthier ones.
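The amplifier-based slow-down can be illustrated by how the packet pairs are selected: bit k is carried by a pair drawn from its own block of 2a packets, so an i-bit sequence spans 2a·i packets. This is a minimal sketch assuming a constant amplifier, consecutive non-overlapping blocks, and a strict r < e ordering; it does not model the encoding function of Equation (8), and `btw_select_pairs` is a hypothetical name. Seeding the PRNG lets a receiver that knows the seed regenerate the same pairs.

```python
import random
from typing import List, Tuple


def btw_select_pairs(n_bits: int, amplifier: int, seed: int) -> List[Tuple[int, int]]:
    """Choose one (reference, encoding) packet-index pair per covert bit.

    Bit k uses packets [2a*k, 2a*(k+1)); within that block the reference
    index is strictly smaller than the encoding index.
    """
    if amplifier < 1:
        raise ValueError("amplifier a must be >= 1")
    rng = random.Random(seed)
    block = 2 * amplifier
    pairs: List[Tuple[int, int]] = []
    for k in range(n_bits):
        r, e = sorted(rng.sample(range(block), 2))  # two distinct offsets
        pairs.append((k * block + r, k * block + e))
    return pairs
```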

JitterBug Detection
JitterBug [6] is a passive TC that manipulates existing network traffic. In particular, it transmits a 1 bit by increasing an IPD to a value that is 0 modulo w milliseconds and transmits a 0 bit by increasing an IPD to a value that is ⌊w/2⌋ modulo w. For small values of w, the distribution of JitterBug traffic is very close to that of the original legitimate traffic. However, too small a w also makes the TC flow indistinguishable from legitimate flows affected by noise. In this experiment, we choose 100 milliseconds as the value of w. In addition, four versions of JitterBug that use different amplifiers are also developed: JitterBug1 (a = 5), JitterBug2 (a = 10), JitterBug3 (a = 20), and JitterBug4 (a = 30). This design ensures that the legitimate and TC flows have almost the same duration and comprise almost the same number of packets. The ECD of JitterBug is illustrated in Figure 8(d).
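The modulo-w encoding can be sketched as follows, using the bit mapping stated above (bit 1 maps to residue 0, bit 0 to residue w/2) and w = 100 ms. It is a minimal sketch under two assumptions: it operates on IPDs directly (lengthening IPD k is equivalent to delaying packet k and every later packet by the same amount), and, as a hypothetical reading of the amplifier, only every a-th IPD carries a bit. The function names are illustrative only.

```python
from typing import List, Sequence


def jitterbug_encode(ipds_ms: Sequence[float], bits: Sequence[int],
                     w_ms: float = 100.0, amplifier: int = 1) -> List[float]:
    """Lengthen every `amplifier`-th IPD so that its residue mod w carries one bit."""
    out = list(ipds_ms)
    bit_iter = iter(bits)
    for k in range(0, len(out), amplifier):
        bit = next(bit_iter, None)
        if bit is None:
            break  # all bits sent; remaining IPDs stay untouched
        target = 0.0 if bit == 1 else w_ms / 2
        # Smallest non-negative delay that moves IPD mod w onto the target;
        # JitterBug can only delay packets, never send them early.
        out[k] += (target - out[k] % w_ms) % w_ms
    return out


def jitterbug_decode(ipds_ms: Sequence[float], n_bits: int,
                     w_ms: float = 100.0, amplifier: int = 1) -> List[int]:
    """Recover bits by checking whether each carrier IPD sits near 0 or w/2 mod w."""
    bits: List[int] = []
    for k in range(0, len(ipds_ms), amplifier):
        if len(bits) == n_bits:
            break
        r = ipds_ms[k] % w_ms
        # Residue near 0 (or near w) -> bit 1; residue near w/2 -> bit 0.
        bits.append(1 if min(r, w_ms - r) < w_ms / 4 else 0)
    return bits
```

The w/4 decision margin is what gives the channel some tolerance to network jitter; a smaller w shrinks that margin, which matches the noise concern above.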
Table 9 shows the scores of all tests, in which the means and standard deviations of the legitimate flow (mean = 0.1472, stdev = 0.0406) and a stealthy JitterBug flow such as JitterBug4 (mean = 0.1497, stdev = 0.0432) are very close. It is impractical to use the regularity test, the WT-Test, and the KS-Test to detect JitterBug flows, as they can only detect the regular JitterBug flow (JitterBug1) and fail to detect the stealthy ones. Similarly, it is difficult to differentiate JitterBug_i (i = 1, 2, 3, and 4), as their testing scores all lie between 1.18 and 1.20. In comparison, the WBD scores clearly distinguish the JitterBug flows, even the stealthiest one. The true/false positive rates also demonstrate the advantage of our metric. Table 10 tabulates the true/false positive rates of the different tests. Compared with all other metrics, our metric is the only one that achieves a zero false positive rate and a 100% true positive rate at the same time.
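As a reminder of what the true/false positive rates in Tables 8 and 10 measure, the sketch below computes both rates for a given detection threshold. It assumes a higher score means more anomalous (as with WBD) and that a flow is flagged when its score exceeds the threshold; the function names and the example scores in the usage note are hypothetical, not values from the tables. A 100% true positive rate together with a zero false positive rate is possible only when every TC score lies above every legitimate score.

```python
from typing import Sequence, Tuple


def tp_fp_rates(tc_scores: Sequence[float], legit_scores: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for a score threshold.

    TPR = flagged TC flows / all TC flows;
    FPR = flagged legitimate flows / all legitimate flows.
    """
    tp = sum(score > threshold for score in tc_scores)
    fp = sum(score > threshold for score in legit_scores)
    return tp / len(tc_scores), fp / len(legit_scores)


def perfectly_separable(tc_scores: Sequence[float], legit_scores: Sequence[float]) -> bool:
    """True iff some threshold yields 100% TPR and 0% FPR simultaneously."""
    return min(tc_scores) > max(legit_scores)
```

For example, with hypothetical scores `tp_fp_rates([5.0, 6.0, 7.0], [0.5, 1.0, 8.0], 2.0)` gives a TPR of 1.0 and an FPR of one third, because one legitimate flow scores above the threshold.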

Conclusions and Future Directions
In this paper, we present a novel framework that addresses the open challenges of timing channel detection in an SDN cloud environment. Our framework dynamically configures SDN to enable/disable differential analysis against outbound network flows from different virtual machines (VMs). Coupled with a new metric, our framework is able to detect most existing and stealthy TCs without any legitimate traffic for modeling, even in the presence of noise and imprecise timekeeping mechanisms. We implemented our framework as a prototype system, OBSERVER, which can be dynamically deployed in an SDN environment. Empirical evaluation shows that our approach can efficiently detect TCs with a higher detection rate, lower latency, and negligible performance overhead.

This work is an initial exploration into the detection of network timing channels, and there are many avenues for future work. Future work will address some of the shortcomings of the current design and implementation and extend the ideas explored in this paper. First, our current TC detection approach focuses only on active and passive TCs that manipulate inter-packet delays (IPDs). However, in a multi-tenant virtualization environment, hostile tenants can leverage various covert channels to exfiltrate sensitive information from a victim that shares the same physical machine [28] [29]. Covert channels have also been observed on the Android platform [56]. As a future direction, we plan to extend our approach to detect various TCs on different platforms, such as virtual and mobile platforms. Second, our wavelet-based distance metric can still be improved in many ways. In the short term, we plan to refine the time discretization algorithm using dimensionality reduction via the PAA representation [57] and discretization based on different statistics [54]. In addition, we plan to use better virtual clock synchronization techniques among VMs at the Virtual Machine Manager (VMM) layer to increase the detection accuracy. Finally, although we have justified the use of vacant VMs in an SDN virtual network environment, this cost remains an unavoidable overhead. In an extreme case, the adversary can trigger many parallel outbound flows, only a negligible percentage of which contain a TC. This exploit could potentially affect the availability of the cloud by exhausting its resources. As another future direction, we will investigate other system architectures and detection metrics that are cost-efficient, robust, and use as few resources as possible. For example, instead of using heavyweight, high-cost vacant VMs, we will consider operating-system-level virtualization, namely containers [58]. The benefit of using containers is that they impose little to no overhead on the host machine. Moreover, a container can directly access the CPU clock of the host machine, which reduces the effect of time drifting in a virtual machine. In summary, in the short term we will refine the detection metric by applying new time discretization algorithms. In the longer term, we will investigate efficient system architectures and cost-efficient, robust detection metrics capable of detecting various TCs on different platforms.

Figure 1. The comparison of ECD between a legitimate flow and a JitterBug flow.

Figure 2. The effect of time drifting in virtual machines. (a) The similarity of IPDs between a legitimate flow and a TC flow (group size = 10); (b) a closer look at time drifting between two identical VMs without clock synchronization.

… are aggregated into one in O_2.
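… [45][46]. From an information-theoretic perspective, the KLD measures the expected number of extra bits required to code samples from a distribution P when using a code optimized for a distribution Q. It is defined as follows:

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log_2 \frac{P(x)}{Q(x)}$$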
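A minimal sketch of the KLD between two discrete symbol distributions (for instance, the discretized wavelet coefficients of two flows within one segment) is given below. The additive smoothing constant `eps` and the helper names are assumptions for illustration; smoothing keeps every Q(x) > 0 so the divergence stays finite. Note that the KLD is asymmetric: D(P‖Q) and D(Q‖P) generally differ.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence


def symbol_distribution(symbols: Sequence[Hashable], alphabet: Iterable[Hashable],
                        eps: float = 1e-6) -> Dict[Hashable, float]:
    """Smoothed empirical distribution over a fixed alphabet (symbols must come from it)."""
    alphabet = list(alphabet)
    counts = Counter(symbols)
    total = len(symbols) + eps * len(alphabet)
    return {a: (counts[a] + eps) / total for a in alphabet}


def kld_bits(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """D_KL(P || Q) in bits: sum over x of P(x) * log2(P(x) / Q(x))."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # P puts mass where Q has none
        total += px * math.log2(px / qx)
    return total
```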

Figure 3. The inbound and outbound flows.

Figure 4. The original IPDs and their wavelet coefficients at different scales.
…_{j,l} based on S_ι. The effectiveness of converting from V(i, j, l) to S_ι depends on a mapping function F that maps … to α_m … holds as follows: Therefore, we define the KLD for the j-th segment at scale l … called the baseline flow pair.

Figure 7. The WBD plots of different network flows for different κ and ω. (a) WBD plots of different network flows for different κ; (b) WBD plots of different network flows for different ω.

Figure 8. The ECD of IPDs for different TCs. (a) IPTC; (b) TRTC: the ECDs of legitimate IPDs and four versions of TRTC IPDs are almost identical due to the encoding mechanism of TRTC; (c) BTW: the ECDs of two legitimate flows and four BTW flows follow the same distribution; (d) JitterBug.

Table 1. The means and standard deviations of two flows (in seconds).
In response to I, virtual machine VM_1 generates O_1, which contains M packets … (i = 1, 2) were generated by VM_i in response to I, we can segment the packets in I and O_i based on their request/response relationship. Specifically, for the j-th inbound segment … to represent the timestamp of the k-th packet in the j-th segment of O_i. Since K > M and K > N, we can further aggregate …

Table 2 illustrates the equiprobable regions as defined under …

Table 3. Statistics of the IPDs of legitimate flows of different sizes.

Table 4. Timing channels used in evaluation.

Table 5. The test scores of legitimate and IPTC IPDs.

Table 6. The test scores of legitimate and TRTC IPDs.

Table 7. The test scores of legitimate and BTW IPDs.

Table 8. True positives (TPs) and false positives (FPs) of different detection approaches for BTW detection.

Table 9. The test scores of legitimate and JitterBug IPDs.

Table 10. True positives (TPs) and false positives (FPs) of different detection approaches for JitterBug detection.