Cognitive Congestion Control for Data Portals with Variable Link Capacity



Introduction
A data portal provides information from diverse sources in a unified way. It enables instant, reliable, and secure exchange of information over the web; in particular, a data portal focuses on providing centralized, robust access to specific data and supported manipulations. A portal offers a single web page that aggregates content from various servers.
There are different types of data portals, for instance, academic portals, including those for scientific data; commercial portals; and enterprise portals. A data portal can be considered an application-based network that consists of databases, servers, web-based application software, communication links, and computing clusters.
With regard to a data portal, congestion occurs when a link or node carries so much data that the portal's quality of service degrades. As an early effort to control network congestion, Jacobson's algorithm [1] was embedded into the Transmission Control Protocol (TCP) [2]. Although this protocol controls end-to-end congestion conveniently, it also deteriorates network performance due to unstable throughput, increased queuing delay, and restricted fairness. Furthermore, longer delays lead to weak link utilization, significant packet losses, and poor adaptation to changing link loads.
Conventional congestion control methods often cannot achieve both fairness and appropriate bandwidth utilization due to packet loss. To deal with this problem, various TCP parameters have been used to estimate the available link capacity and the Round-Trip Time (RTT) in order to predict congestion [3][4][5][6][7].
When the bandwidth-delay product grows, TCP-based networks exhibit oscillatory behavior under some congestion-control algorithms. Reference [8] explains that when the delay or capacity increases, Random Exponential Marking (REM) [9], Random Early Detection (RED) [10], the proportional integral controller [11], and the virtual queue [12] show oscillatory behavior. Because the bandwidth-delay product of a flow on a high-bandwidth link can amount to many packets, TCP can waste many RTTs ramping up to full utilization after a congestion burst.
The main obstacle in TCP is related to its reliance on scarce events that provide poor resolution information.
To improve adaptation to network conditions, achieve high utilization, attain stable throughput, and decrease standing queues in the network, several approaches have been proposed in the literature [13][14][15][16][17][18][19][20]. Explicit Congestion Control (XCC), one of the best-known congestion control approaches, informs sources about the network status and controls the bit rate in the network.
XCC uses a packet header to carry the throughput and Round-Trip Time (RTT) of the flow to which the packet belongs. While the throughput is used to adjust the bandwidth distribution, the RTT enables sources to control the speed of adaptation to network conditions. In XCC, routers play an important role in informing sources about the network status and in helping sources control their bit rate through accurate feedback. In fact, to determine the feedback for sources, a router should calculate the current spare bandwidth for outgoing links and compute the link capacity.
Some congestion control methods need explicit and precise feedback. As congestion is not a binary variable, congestion signaling should convey the degree of congestion. With precise congestion signaling, the network can tell the sender both the congestion state and how to react to it. In fact, senders can decrease their sending windows quickly when the bottleneck is extremely congested. However, these methods, which are based on a control loop with feedback delay, become unstable for long feedback delays. To deal with this effect, the system should slow down as the feedback delay increases. In other words, when the delay increases, the sources should change their transmission rates more slowly [8][21][22][23].
As one of the crucial issues related to network congestion, the robustness of a method should be independent of unknown and quickly changing parameters (e.g., the number of flows). Also, for methods such as XCC, convenient bandwidth sharing is difficult when the information inquiries and the capacity of links are variable. In other words, the unpredictability of the network creates a problem for XCC. This study focuses on a cognitive method to control congestion that performs well even when the link capacity and information inquiries are unknown or variable.

Cognitive Concept
A cognitive system is a complex system that has the ability for emergent behavior [24]. It processes data over the course of time by performing the following steps: 1) perceive defined situations; 2) learn from defined situations and adapt to their statistical variations; 3) build a predictive model on prescribed properties; and 4) control the situations, performing all of these procedures in real time for the purpose of executing prescribed tasks.
To optimally adapt the network parameters and to provide efficient communication services using a cognitive approach, learning the relationships between the parameters of the network is crucial. In the learning phase, it is possible to utilize the Bayesian Network (BN) model. A BN is a probabilistic graphical model that represents conditional independence relations between random variables by means of a Directed Acyclic Graph (DAG) [25]. The DAG is constructed with a set of vertices and directed edges, each edge connecting one vertex to another, such that there is no way to start at vertex i and follow a sequence of edges that eventually loops back to i [26][27][28].
The BN model can be used to represent the dependence relationships among network parameters and to adjust cognitive parameters to improve the network's efficiency. It is utilized here to deal with congestion, one of the challenging tasks in TCP, where there is no efficient mechanism to determine when congestion occurs in the network.

Variable Link Capacity
As mentioned earlier, to efficiently control network congestion and preserve stable throughput and low queuing delay, the critical parameters of the network can be defined and adjusted based on pre-defined criteria and the statistical variations of the network.
The available bandwidth, $F_{available}$, which is distributed among the different flows during a certain time period T, is defined by Equation (1), in which the two coefficients are constants, x(t) is the bandwidth utilized during the last period T, C is the estimated capacity of the data transmission link, and Q(t) is the minimum queue length that occurred during the last T seconds.
The parameter T is expressed in terms of the system base delay (that is, the delay excluding queuing delay) and $C_{real}$, the actual capacity of the data transmission link.
The capacity C is a function of various factors, such as the data rate of every link, the number of active links, failed transmissions, the number of collisions, and handshake procedures. The estimation error of the link capacity is defined in terms of $C_{real}$ and C. This error should be compensated up to a certain limit. To define the limit, the parameter C in (1) is expressed through $C_{real}$ and the estimation error. When the capacity of the data transmission link is fully utilized, i.e., $x(t) = C_{real}$, the available bandwidth is expected to be zero or close to zero, due to the error. Therefore, the limit of the estimation error is defined by Equation (2).
The value of $F_{available}$ depends on the available bandwidth and the standing queue in the router. In fact, if the link capacity changes, $F_{available}$ can be adjusted. The proposed method computes $F_{available}$ with no knowledge of the exact channel capacity. It can also adjust $F_{available}$ according to bandwidth variations.

Adjustment of Available Bandwidth
Typically, a router controls each of its output queues; therefore, the available bandwidth is computed for each of them. With the proposed method, the router does not need to be configured with a certain medium capacity in order to compute the available bandwidth. In addition, the proposed method can adapt to changing bandwidth conditions over time.
First, the effect of queue speed on the available bandwidth $F_{available}$ is considered. The queue speed can be defined as the difference between the capacity of the transmission channel and the utilized bandwidth during the time period T. On this basis, the available bandwidth is expressed by Equation (3), in which $\Delta Q / T$ is the queue speed. Due to queue variations, the queue length should be adjusted for $F_{available}$, so parameter α is defined. It is possible to conveniently tune $F_{available}$ using the parameter α during extreme queue variations. The parameter α is adjusted by the cognitive algorithm. In fact, the parameter α controls the target queue length at which the network stabilizes.

Cognitive Congestion Control
The schematic of the cognitive congestion control is illustrated in Figure 1. Each network parameter is periodically sampled and collected in the input matrix. The cognitive process is decomposed into four steps: 1) observation, 2) learning, 3) decision, and 4) action.
During the observation step, the required information is collected from the network. Then, during the learning step, the cognitive algorithm learns the relations between the parameters and their conditional independences, as well as the effect of controllable parameters on observable parameters.
During the decision step, the values to be assigned to the controllable parameters are calculated to meet pre-defined requirements. In other words, the values of the network parameters of interest are predicted based on the observations. This prediction is done by inference, using the Bayesian network.
In the action step, the controllable parameters are tuned, and the appropriate actions are taken in the network.

Observation
During the observation step, seven network parameters are examined. These parameters are: 1) the Round-Trip Time (RTT), that is, the time taken for a signal to be sent plus the time taken for an acknowledgment of that signal to be received; 2) the queue length; 3) the queue speed; 4) the throughput, that is, the total amount of successfully delivered data over a link; 5) the contention window size; 6) the congestion window size, that is, the total amount of unacknowledged data; and 7) the congestion window status. The congestion window status is 0 if the congestion window size at time t is 25% less than the congestion window size at time t − 1; otherwise the status is 1. The case in which the status equals zero is of interest, as the congestion is being decreased.
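The congestion window status rule can be sketched in a few lines. This is a minimal illustration, assuming that "25% less" means a drop of at least 25% relative to the previous sample; the function name, the `drop` parameter, and the sample values are hypothetical and not taken from the paper.

```python
def congestion_window_status(cwnd_now: float, cwnd_prev: float, drop: float = 0.25) -> int:
    """Return 0 if the window shrank by at least `drop` since the previous sample, else 1."""
    if cwnd_prev <= 0:
        # No meaningful previous window; treated as "no decrease" (assumption).
        return 1
    return 0 if cwnd_now <= (1.0 - drop) * cwnd_prev else 1


# Hypothetical congestion window samples (in packets), one per sampling period.
samples = [40.0, 42.0, 30.0, 31.0, 20.0]
statuses = [congestion_window_status(now, prev) for prev, now in zip(samples, samples[1:])]
print(statuses)
```

Each status compares a sample with the one immediately before it, so a series of n samples yields n − 1 status values.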
Here, the observed network parameters are considered as random variables ($x_1, \ldots, x_7$). It is assumed that the given variables have unknown dependence relations. The independent samples from every variable are gathered into the input matrix (of size n × 7). The construction of the input matrix is performed during the observation step.

Learning
The learning step is a key step in the cognitive algorithm. During this step, the BN is built to provide a structure representing conditional independence relations between parameters of interest in a DAG. To form the BN and demonstrate the relations in a DAG, learning from the qualitative relations between the variables and their conditional independences is considered.
A node in the DAG represents a random variable, while an arrow that joins two nodes represents a direct probabilistic relation between the two corresponding variables. For a variable $x_i$, if there is a direct arrow from node j to node i, node j is a parent of node i; all such nodes j together form the set of parents of node i. A complete DAG, in which all nodes are directly connected with each other, can represent all possible probabilistic relations among the nodes.
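As an illustration of the parent-set view of a DAG, the following sketch stores the graph as a mapping from each node to its set of parents and checks that no directed cycle exists. The node names and edges are hypothetical examples, not relations learned in the paper; the check relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def is_dag(parents: dict[str, set[str]]) -> bool:
    """Return True if the parent-set mapping describes an acyclic graph."""
    try:
        # A topological order exists only if there is no directed cycle.
        tuple(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


# Hypothetical structure: throughput -> queue_length -> rtt.
acyclic = {"rtt": {"queue_length"}, "queue_length": {"throughput"}, "throughput": set()}
# Adding rtt -> throughput closes a loop, so the graph is no longer a DAG.
cyclic = {"rtt": {"queue_length"}, "queue_length": {"throughput"}, "throughput": {"rtt"}}

print(is_dag(acyclic), is_dag(cyclic))
```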

     
During the learning phase, the dependencies among the variables, which are represented as nodes in a DAG, are extracted from the input matrix (Im). To build the DAG representing the probabilistic relations between the variables, two procedures are used: the selection of DAGs and the selection of parameters.

Selection of DAGs
For the selection of DAGs, the scoring approach and the constraint approach can be utilized [29,30].
In the constraint approach, a set of conditional independence statements is defined by a priori knowledge. Then, the given set of statements is utilized to build the DAG, following the rules of d-separation [29].
The scoring approach is generally utilized when a set of given conditional independence statements is not available [31,32]. The scoring approach is capable of inferring a sub-optimal DAG from a sufficiently large data set (i.e., Im). The scoring approach consists of two phases: 1) searching, to select the DAGs to be scored within the set of all possible DAGs; and 2) scoring each DAG according to how accurately it defines the probabilistic relations between the variables based on the Im.

1) Searching process
The searching process to select the DAGs (i.e., the first phase of the scoring approach) is required because it is not computationally efficient to score all the possible DAGs, since the scoring procedure generally takes a great deal of time. For instance, to find the DAG with the highest score for a set of m variables, the number of possible DAGs that would have to be scored is given by a formula in [33]. Most searching processes in scoring approaches are based on heuristics that find local maxima reasonably well. However, the heuristics do not generally guarantee that the global maximum is obtained [30].
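To give a sense of how fast the search space grows, the sketch below counts labeled DAGs with Robinson's recurrence, a standard result for this quantity. It is an assumption that this is the formula intended by [33]; the function name is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(m: int) -> int:
    """Number of labeled DAGs on m nodes (Robinson's recurrence)."""
    if m == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * count_dags(m - k)
        for k in range(1, m + 1)
    )


for m in range(1, 8):
    print(m, count_dags(m))
```

With the seven observed parameters used here, the count already exceeds one billion, which is why exhaustive scoring is impractical.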
There are two classical searching procedures in literature [34]: 1) Hill Climbing and 2) Markov Chain Monte Carlo.
Hill Climbing is an iterative algorithm that starts from an arbitrary initial solution to a problem. The algorithm then searches for a better solution by incrementally changing a single element of the solution. If the change generates a better solution, the next incremental change is made to the new solution; this is repeated until no further improvement can be reached [34,35].
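A minimal sketch of hill climbing over DAG structures is given below, assuming that a single "element" of the solution is one directed edge that can be added or removed. The score function is supplied by the caller; the toy score used in the example (closeness to a hypothetical target edge set) only stands in for a real criterion such as the one used in the scoring phase.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import permutations


def is_acyclic(nodes, edges):
    """Check that the directed edges (src, dst) contain no cycle."""
    parents = {n: set() for n in nodes}
    for src, dst in edges:
        parents[dst].add(src)
    try:
        tuple(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


def hill_climb(nodes, score, start=frozenset()):
    """Toggle one edge at a time and keep every change that improves the score."""
    current = frozenset(start)
    best = score(current)
    improved = True
    while improved:
        improved = False
        for edge in permutations(nodes, 2):
            candidate = current ^ {edge}  # add the edge if absent, remove it if present
            if not is_acyclic(nodes, candidate):
                continue
            candidate_score = score(candidate)
            if candidate_score > best:
                current, best, improved = candidate, candidate_score, True
    return current, best


# Toy example: the score rewards structures close to a hypothetical target.
target = frozenset({("a", "b"), ("b", "c")})
best_edges, best_score = hill_climb(["a", "b", "c"], lambda e: -len(e ^ target))
print(sorted(best_edges), best_score)
```

Because only improving moves are accepted, the search stops at a local maximum, which is not necessarily the global one.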
Markov Chain Monte Carlo (MCMC) is a category of algorithms for sampling from probability distributions, based on constructing a Markov chain that has the distribution of interest as its equilibrium distribution. After a sufficient number of steps, the state of the chain is used as a sample of the distribution of interest [36,37].
The searching process yields a set of candidate DAGs.

2) Scoring
The Bayesian information criterion (BIC) is selected for scoring; it is based on the maximum likelihood criterion. The BIC is expressed as follows [32]:

$$\mathrm{BIC}(A \mid Im) = \log P\left(Im \mid A, \hat{\theta}_A\right) - \frac{\mathrm{size}(A)}{2} \log n \qquad (5)$$

where Im is the dataset (i.e., the input matrix), A is the DAG to be scored, $\hat{\theta}_A$ is the maximum likelihood estimate of the parameters of A, $\mathrm{size}(A)$ is the number of free parameters of A, and n is the number of observations for every variable in the dataset. When all random variables are multinomial, the BIC is formulated as follows [30][31][32][33][38]:

$$\mathrm{BIC}(A \mid Im) = \sum_{i=1}^{m} \sum_{j=1}^{C_i} \sum_{k=1}^{O_i} N_{ijk} \log \frac{N_{ijk}}{N_{ij}} - \frac{\log n}{2} \sum_{i=1}^{m} (O_i - 1)\, C_i \qquad (6)$$

where $O_i$ is the number of outcomes in the finite outcome set of variable $x_i$; $C_i$ is the number of different combinations of outcomes for the parents of $x_i$; $N_{ijk}$ is the number of cases in the input matrix in which the variable $x_i$ took its kth value (k = 1, 2, ..., $O_i$) while its parents were instantiated in their jth configuration (j = 1, 2, ..., $C_i$); and $N_{ij}$ is the total number of observations of variable $x_i$ in the input matrix (Im) with parent configuration j (i.e., $N_{ij} = \sum_{k=1}^{O_i} N_{ijk}$).
Therefore, based on Equation (6), the scoring approach is computationally tractable. More details about the Bayesian information criterion are presented in [38]. Now, the DAG with the highest score can be selected.
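The multinomial score can be sketched from the counts defined above. The code below assumes the standard multinomial form of the BIC, namely the log-likelihood built from the counts N_ijk and N_ij minus a penalty of (log n)/2 per free parameter, and it counts C_i as the number of possible parent configurations. The function name, the data layout (a list of dictionaries), and the toy data are hypothetical.

```python
from collections import Counter
from math import log, prod


def bic_score(rows, parents, cardinality):
    """BIC of a discrete Bayesian network structure.

    rows: list of dicts mapping variable name -> observed outcome
    parents: dict mapping variable name -> tuple of parent names
    cardinality: dict mapping variable name -> number of outcomes (O_i)
    """
    n = len(rows)
    log_likelihood = 0.0
    free_parameters = 0
    for var, pa in parents.items():
        # N_ijk: count of (parent configuration j, outcome k); N_ij: count of j.
        n_ijk = Counter((tuple(r[p] for p in pa), r[var]) for r in rows)
        n_ij = Counter(tuple(r[p] for p in pa) for r in rows)
        log_likelihood += sum(c * log(c / n_ij[j]) for (j, _k), c in n_ijk.items())
        c_i = prod(cardinality[p] for p in pa)  # equals 1 when there are no parents
        free_parameters += (cardinality[var] - 1) * c_i
    return log_likelihood - 0.5 * log(n) * free_parameters


# Toy data: b always equals a, so the structure a -> b should score higher.
rows = [{"a": v, "b": v} for v in (0, 0, 1, 1)]
card = {"a": 2, "b": 2}
with_edge = bic_score(rows, {"a": (), "b": ("a",)}, card)
without_edge = bic_score(rows, {"a": (), "b": ()}, card)
print(with_edge > without_edge)
```

Only observed combinations contribute to the log-likelihood sum, which avoids evaluating the logarithm of zero.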

Selection of Parameters
During the selection of parameters, the best set of controllable parameters is chosen and estimated, based on the observed parameters and their independence relations.
Based on the Bayesian network definition, every variable depends directly on its parents; thus, the estimation of the parameters for every variable $x_i$ is performed according to the set of its parents in the DAG selected during structure learning. The Maximum Likelihood Estimation (MLE) technique is used to build a predictive model and to estimate the appropriate set of parameters describing the conditional dependencies among the variables. The MLE technique, Equation (7), estimates the parameter $\theta_{ijk}$, which describes the case in which the parents of node i are in the configuration of type j and the variable $x_i$ takes its kth value, that is, the distribution of $x_i$ given the evidence j (i.e., the parents of node i in the configuration of type j). Therefore, Equation (7) can be rewritten as follows [30]:

$$\hat{\theta}_{ijk} = \frac{N_{ijk}}{N_{ij}} \qquad (8)$$
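For multinomial variables, the maximum likelihood estimate of each conditional probability is the ratio of the counts N_ijk and N_ij. The sketch below builds such a conditional probability table under that assumption; the function name, the variable names, and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_cpt(rows, var, pa):
    """Estimate P(var = k | parents = j) as N_ijk / N_ij from discrete samples."""
    n_ijk = Counter((tuple(r[p] for p in pa), r[var]) for r in rows)
    n_ij = Counter(tuple(r[p] for p in pa) for r in rows)
    cpt = defaultdict(dict)
    for (j, k), count in n_ijk.items():
        cpt[j][k] = count / n_ij[j]
    return dict(cpt)


# Hypothetical samples: queue level (0 = low, 1 = high) and congestion window status.
rows = [
    {"queue": 0, "status": 1},
    {"queue": 0, "status": 1},
    {"queue": 0, "status": 1},
    {"queue": 0, "status": 0},
    {"queue": 1, "status": 0},
    {"queue": 1, "status": 0},
]
cpt = estimate_cpt(rows, "status", ("queue",))
print(cpt)
```

Parent configurations that never occur in the samples have no entry in the table, so a practical implementation would need a rule for them.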

Decision
The Bayesian network is completed after the selection of DAGs and parameters in the learning step. The completed BN provides the probabilistic relations among the selected parameters from the selected DAG. In this step, the future values of the queue length and queue speed (that is, the unobserved parameters) are predicted based on the selected observed parameters. The estimated value of an unobserved parameter $x_i$ is defined as the expectation of the given parameter, using the probability function represented in Equation (8). Therefore, the estimate $\hat{x}_i^{(t)}$ of $x_i$ at time t is calculated as follows:

$$\hat{x}_i^{(t)} = E\left[x_i^{(t)} \mid \text{evidence}\right] \qquad (9)$$

where $x_i^{(t)}$ is the actual value of the unobserved parameter $x_i$ at time t, and evidence is the set of selected observed parameters.
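The expectation used for prediction can be illustrated with a discrete conditional distribution. In the sketch below, the distribution over queue-length levels given the evidence is a made-up example; in the method it would come from inference on the learned Bayesian network. The function name is hypothetical.

```python
def expected_value(distribution: dict[float, float]) -> float:
    """Expectation of a discrete variable given as {value: probability}."""
    total = sum(distribution.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Dividing by the total tolerates distributions that are not exactly normalized.
    return sum(value * prob for value, prob in distribution.items()) / total


# Hypothetical conditional distribution of the queue length (packets) given the evidence.
queue_given_evidence = {10.0: 0.2, 30.0: 0.5, 50.0: 0.3}
predicted_queue = expected_value(queue_given_evidence)
print(round(predicted_queue, 3))
```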

Action
To calculate α in Equation (3), the predicted values of the unobserved parameters (i.e., queue length and queue speed) are considered. In fact, the fluctuations of the predicted values for the queue length and queue speed are utilized to set the parameter α; then, parameter α adjusts the available bandwidth, $F_{available}$.
As mentioned earlier, α represents the target queue length at which the network stabilizes. When no queue has built up (underutilization), α determines how much bandwidth is distributed in every control interval. During full utilization or overutilization, α controls how much queuing delay is introduced.
During underutilization, the bandwidth is maximally distributed; if a link is saturated, the queuing delay is significantly decreased. Generally, α is high during underutilization and low during full utilization.

Result
The base scenario used in the simulation is a dumbbell network topology, in which a number of nodes are connected to a single router. This router is connected to another router over a serial link. A second group of nodes is connected to that second router, completing the dumbbell topology.

The network traffic consists of flows between the client and server nodes in both directions. It is assumed that the flows traversing the network from server nodes to client nodes are downloads, while flows in the opposite direction are considered uploads.
The simulations were performed using the ns-3 network simulator [39]. During the simulation, the parameters of interest were sampled for each flow at fixed sampling periods (i.e., every 30 ms and 60 ms). The queue length at the bottleneck was set to 50 packets. The size of the data packets was 1200 bytes, and the size of the acknowledgment packets was 50 bytes.

Variable Capacity
In this part of the procedure, the response of the proposed method to unexpected variations of link capacity was examined. The variable capacity was simulated by changing the data rate, as shown in Figure 2. The simulation started with a data rate of 56 Mbps, and the data rate changed every 20 seconds: 56 Mbps at t = 0, 21 Mbps at t = 20, 5 Mbps at t = 40, 21 Mbps at t = 60, and 56 Mbps at t = 80.
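The capacity schedule of this experiment can be written as a small lookup, which is convenient when reproducing the scenario in a script. The schedule values are those listed above (times in seconds); the function and constant names are hypothetical, and the sketch is independent of any simulator API.

```python
from bisect import bisect_right

# (start time in seconds, data rate in Mbps), as listed in the text.
SCHEDULE = [(0, 56), (20, 21), (40, 5), (60, 21), (80, 56)]


def data_rate_mbps(t: float) -> int:
    """Return the link data rate in effect at time t (seconds)."""
    if t < 0:
        raise ValueError("time must be non-negative")
    starts = [start for start, _rate in SCHEDULE]
    # Index of the last schedule entry whose start time is <= t.
    return SCHEDULE[bisect_right(starts, t) - 1][1]


print([data_rate_mbps(t) for t in (0, 19.9, 20, 45, 60, 100)])
```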
Due to the sudden bandwidth reductions, there are queue spikes in Figure 2. When the queue length significantly increases, the parameter α increases. Thus, based on Equation (3), the difference between the queue length and α does not change significantly. Therefore, these spikes were compensated by the method, and the available bandwidth was conveniently utilized.

Dynamics of Parameter α
To demonstrate the responsiveness of the proposed method to arriving and departing flows, a 40-second simulation was performed with the RTT set to 60 ms. The average queue length and the parameter α over time are illustrated in Figure 3. It was observed that the proposed method responded conveniently to the queue fluctuation. In fact, the dynamics of parameter α were assessed by observing the queue fluctuation.
A reduction of the queue is a sign of underutilization, and α is increased in response. While α increases, more bandwidth is distributed among servers to quickly reach full utilization.
To match the variation of the queue, parameter α was decreased when the queue length increased. Generally, there was a low latency caused by queue buildup. To prevent high queue spikes, the maximum value of α should be less than the maximum value of the queue length (i.e., channel capacity). Parameter α can tune the variation of bandwidth as it affects the queue.

Different Data Rates
In this part of the procedure, the response of the method was assessed while different data rates were used in the network. It is assumed that one part of the network has a data rate of 10 Mbps and the rest of the network has a data rate of 56 Mbps. In other words, new flows enter the network with a data rate of 10 Mbps while other flows with a data rate of 56 Mbps leave the network, or vice versa. This causes oscillatory behavior of the bandwidth. The proposed method provides a stable queue under the given situation (Figure 4).

Efficiency
Now, the efficiency of the method is evaluated in terms of network utilization. It is demonstrated that an increase of the bandwidth-delay product of the network negatively affects the efficiency of TCP; however, it has only a trivial influence on the efficiency of the proposed method.
To simulate a traffic pattern, two kinds of flows are considered: 1) flows with exponentially distributed duration, with a minimum value of 1 s and a mean value of 10 s; and 2) other flows that are active throughout the simulation.
Each wired path between the end-system and the router was configured with a specific latency; the latencies of the wired paths were between 20 ms and 120 ms. The growth of the bandwidth-delay product of the network was simulated by increasing the path delay.
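As a rough worked example of how the bandwidth-delay product grows with the path delay, the sketch below converts a rate and a delay into a number of data packets. The 56 Mbps rate is borrowed from the variable-capacity experiment and the delay is taken directly as the configured path latency; both are illustrative assumptions, since the link rate used in the efficiency experiment is not stated here. The function name is hypothetical.

```python
def bdp_packets(rate_mbps: float, delay_ms: float, packet_bytes: int = 1200) -> float:
    """Bandwidth-delay product expressed in packets of `packet_bytes` bytes."""
    bits_in_flight = rate_mbps * 1e6 * delay_ms / 1e3
    return bits_in_flight / (8 * packet_bytes)


for delay_ms in (20, 60, 120):
    print(delay_ms, round(bdp_packets(56, delay_ms), 1))
```

Under these assumptions, moving from 20 ms to 120 ms multiplies the amount of data in flight by six, which is the kind of growth the experiment stresses.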
The result of the simulation is shown in Figure 5. As shown in the figure, the efficiency changes as the link capacity increases.
TCP was not able to scale with the bandwidth-delay product of the network because of its fixed dynamics. Depending on the traffic pattern and the number of flows, TCP was not able to fully utilize the network resources beyond a specific bandwidth threshold.
Overall, the proposed method was able to maintain convenient utilization at all times.

Accuracy of Learning Process
To predict, at time t, the status of congestion in the future (i.e., at time t + k), the current values of all parameters of interest were considered. This makes it possible to predict when congestion will happen and to act before it affects the network.
To analyze the accuracy of the learning process for predicting congestion at time t, the value of Status(t + k), that is, the presence or absence of congestion at time t + k, with k ≥ 1, was considered.
The performance of the learning process is assessed as a function of the size (i.e., the number of samples) of the training set utilized to learn the relations between the desired parameters. The parameters are stored during the training, and the stored values become the input for the inference phase. Here, $\widehat{Status}(t + k)$ denotes the predicted value of the congestion status at time t + k. When congestion is present, the variable Status is zero; otherwise it is one. The accuracy is expressed as the frequency of errors in the prediction process. In Figures 6 and 7, two cases are assessed separately.
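The error frequency can be computed as the fraction of time steps at which the predicted status differs from the observed status. The sketch below assumes this simple definition; the function name and the sample sequences are hypothetical.

```python
def prediction_error_rate(predicted: list[int], actual: list[int]) -> float:
    """Fraction of samples where the predicted status differs from the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical status sequences (0 = congestion present, 1 = absent).
predicted_status = [0, 1, 1, 0, 1]
actual_status = [0, 1, 0, 0, 1]
error_rate = prediction_error_rate(predicted_status, actual_status)
print(error_rate)
```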
In Figure 6, the average prediction error varies with the size of the training set collected from the RTT and the queue length. As shown in the figure, if more information about the RTT and queue length is available, the average prediction error decreases. Therefore, the number of collected samples from each parameter of interest should be more than 300.
In Figure 7, the average prediction error changes with the training set size for different values of the sampling period. If enough data (i.e., input samples) is available, the learning phase is conveniently performed, and the prediction is more accurate.

Conclusions
In this paper, a cognitive method is proposed to improve bandwidth sharing and deal with congestion in a data portal. For example, when the data portal serves climate change data, congestion control becomes more important because scientific climate data is voluminous and there is high traffic to and from the data portal from the scientific community, research groups, and general readers. In fact, this study was performed to improve congestion control in data portals such as the climate change portal.
Here, the data portal is considered as an application-based network. The proposed method was able to adjust the available bandwidth in the network when the link capacity and information inquiries were unknown or variable. In fact, it was possible to conveniently adjust the available bandwidth, using the cognitive method, during extreme queue variations.
The variation of link capacity has an influence on the queue. In fact, α changes dynamically over time and helps the queue to behave more smoothly, while guaranteeing that it is set based on pre-defined operating conditions.
The learning phase is a key step in the cognitive method. During this step, the information collected in the observation phase is used by the Bayesian network model to build a probabilistic structure to predict variations of queue length.
The efficiency of the proposed method was tested with a network simulator. Based on the results, the available bandwidth during extreme queue variations can be conveniently adjusted by the proposed method. Unlike TCP, whose efficiency is negatively affected by the growth of the bandwidth-delay product of the network, the proposed method is able to maintain convenient utilization at all times.
In the searching process, the formula from [33] gives the number of possible DAGs for m variables. When m increases, the number of possible DAGs increases significantly, and the scoring procedure takes more time. Therefore, a searching process is required to choose a small, and possibly representative, subset of the space of all DAGs.

Figure 2. The fluctuation of average queue length and parameter α throughout time during variable capacity.

Figure 4. Fluctuation of average queue length and parameter α throughout time, while different data rates are used in the network.

Figure 5. Efficiency versus capacity. The growth of the bandwidth-delay product of the network decreases TCP utilization, whereas the proposed method is able to maintain utilization.