P 2 P Traffic Optimization Based on Dynamic Network Information Aware

Peer-to-Peer (P2P) service may damage the interests of Internet Service Provider (ISP) because P2P traffic usually takes a lot of network link bandwidth and even overwhelms some network links. Aimed at the problem, mainstream solutions are usually optimizing P2P traffic through the interaction between applications and underlying network. However, current solutions still have two aspects of defects: one is that the interacted underlying network status information is immutable and can’t reflect the real-time dynamic changes because it is usually configured by ISP. The other is that some solutions may cause excessive traffic localization, which may greatly influence other services in the local network. In order to improve the above two defects and provide P2P users with better service experience, we propose an enhanced application layer traffic optimization scheme, in which more valuable network status information of underlying network is dynamically calculated and provided to P2P application. Extensive simulations demonstrate that our P2P traffic optimization scheme is superior to other solutions in terms of available bandwidth, resource transmission delay and user service experience.


Introduction
Peer-to-Peer (P2P) is a technology that users share resource without relay devices.P2P has many advantages, such as extensible, robust and low cost.However, a large number of P2P services swallow a lot of underlying network link bandwidth, which seriously affects the providing of other Internet services, and places a lot of traffic burden on Internet Service Provider (ISP).This problem is mainly caused by two reasons.One is that sharing resource between users using P2P technology produces a large amount of network traffic.The other is that the mismatch between P2P overlay network and underlying network results that a lot of P2P traffic across underlying backbone network.Therefore, it is necessary to optimize P2P traffic to reduce the burden of underlying network while providing P2P users with good service experience.
Existing P2P traffic optimization schemes include two major classes [1].One is overlay schemes [2] [3], which use reverse engineering to infer underlying network topology and status information in P2P overlay.The main idea is to assign coordinates for P2P nodes, and estimate neighbor nodes according to the coordinates.These schemes can localize P2P traffic based on the inferred underlying network information to reduce inter-domain P2P traffic.However, there are some problems that the detection methods are sometimes not accurate and reliable, and the acquired network information is incomplete.So the optimization is rough, the service performance and user service experience are limited.
The other kinds of P2P traffic optimization schemes are called collaborative schemes, including Oracle [4], P4P [5], and ALTO [6] [7], which obtain underlying network topology information and link status information through the interaction between P2P overlay and underlying network to provide optimization guidance for P2P traffic.These schemes represent the mainstream of P2P traffic optimization.However, there are still some defects in these schemes.Firstly, the information obtained through underlying network protocol, such as OSPF LSA Metric and BGP MED, are usually configured by ISP, which cannot truly reflect the dynamic changes of underlying network.Secondly, although selecting neighbor peers based on routing hops or IP addresses can significantly reduce inter-domain P2P traffic, it may lead to excessive P2P traffic localization and reduce the performance of other services in local network.Finally, some neighbor peers may be excessively selected, which will increase the load of some peers and decrease user service experience.
In order to optimize P2P traffic while ensuring user service experience, we propose a P2P traffic optimization scheme based on dynamic network information aware.In which P2P traffic is guided through the collaboration between P2P overlay and underlying network.The underlying network is divided into a number of areas according to network topology and routing information, which are called as provider-defined network identifier (PID).P2P traffic is optimized by selecting appropriate neighbor peers based on dynamic underlying network status information obtained from these PIDs.

Related Work
The mismatch between P2P overlay and underlying networks causes a lots of P2P traffic traversing the underlying backbone network, which aggravates the traffic burden of ISP.Some collaborative P2P traffic optimization schemes have been proposed to improve the situation.
Oracle [4] aims to achieve P2P traffic localization through providing peers matching suggestions for P2P applications according to topology information acquired from ISP.It made that ISP benefits from neighbor distribution based on Autonomous System (AS) hops and P2P users benefit from the choice of the last hop.However, Oracle didn't consider the routing policy information of ISP and the dynamic properties of network status.Meanwhile, it didn't consider how to realize Oracle service in an extensible and efficient way.P4P [5] is quite similar to Oracle, but more comprehensive and complex.It considered that unilaterally increasing the network efficiency of ISP or P2P application as the former solutions is less effective and there may be problem if solely pursuing P2P traffic localization.P4P hoped to solve the problem by acquiring more comprehensive network information, such as link cost, ISP routing policy, and so on.P4P architecture deployed a Tracker service in ISP network to collect network information and provides an interface to interact with P2P applications.Although P4P have been verified initially, it still faces some challenges, such as whether network provider and P2P venders have enough motivation to cooperate.What's more, P4P also didn't consider how to realize the scheme in an extensible and efficient way, and it is faced with the problem of privacy exposed.
ALTO [6] [7] was proposed and standardized by IETF ALTO Working Group, which deployed an ALTO server to convey network information for P2P application and guide applications to retrieve desired resources from several candidate providers.ALTO includes two parts.One is a discovery mechanism used by P2P application to find reliable resource, the other is a protocol used by P2P application to query such resources.ALTO not only considered the static information of underlying network, but also the underlying network status information during a relative long period.So it is able to provide better guidance for P2P traffic [8].Yet, ALTO just proposed a traffic optimization framework, it did not provide specific proposals on how to use ALTO in the context of dynamic network information-aware P2P traffic optimization.
With respect to existing solutions, we hope to propose an ALTO-enhanced P2P traffic optimization scheme.We mainly enhance the ALTO in three aspects: 1) propose a PID division method, which is based on real network topology and routing protocols to match large scale AS; 2) propose the methods to calculate link cost and construct network map and link cost map periodically; 3) put forward a neighbor peers selecting method to avoid excessively localizing P2P traffic while improving user service experience.

Problem Statement
P2P overlay network consists of peers, while its topology is usually inconsistent with the underlying network.We suppose the following scenario in Figure 1(a), P2P resource peers disperse in three different ASes.When peer A requests resource, peers B, E, G, H and K are randomly selected as its neighbor peers.The five selected peers are directly connected to A from the perspective of P2P overlay network.However, in underlying network, they may be connected via multi- Suppose A needs to download 10 Mb resource from each neighbor peer, each resource size is 50 Mb, the download rate is 200 Kb/s and the bandwidth of the link between every two ASes (such as M-N and S-T) is 1 Mb.In Figure 1(a), the hops between A and B, E, G, H, K, respectively are 2, 5, 6, 9 and 11.It is easy to get that the total traffic carried in the paths from these neighbors to A is 330 Kb.The traffic from E, G, H, K are carried by link M-N, so the bandwidth of link M-N is occupied by 800Kb, which is 80% of total link bandwidth.So topology mismatch not only causes large traffic in underlying network, but also occupies inter-domain bandwidth, which will seriously affect other Internet services.
If A selects neighbor peers locally, then C, D, E and F may be chosen as the neighbor peers of A, as shown in Figure 1(b).Then the total traffic generated in the paths from the neighbor peers to A is 3000 Kb, and the bandwidth used in link M-N is 400 Kb, which is only 40% of the total link bandwidth.Obviously, the total traffic and inter-domain traffic are greatly reduced.Therefore, the fundamental principle of P2P traffic optimization is realizing P2P traffic localization by selecting neighbor nodes based on underlying network topology.
Besides reducing inter-domain traffic over the underlying network, the other goal of P2P traffic optimization is to guarantee user service experience.Assuming T K represents the traffic generated by node i in session K, , K i j l represents the link between node i and j, which is mapped to N paths in the underlying net- represents the resources transmitted between node i and j, V is the neighbor nodes set.Then the traffic generated by node i in session K is: (1) The total traffic generated by P2P service is: (2) The goal of P2P traffic optimization is to get minimal possible T defined in (2) and minimal possible resource transmission time.

System Model
In order to optimize P2P traffic while ensuring good user service experience, we propose a P2P traffic optimization scheme based on dynamic network information awareness, which guides P2P traffic through the cooperation of P2P overlay and underlying network.In which, network status information is obtained to calculate link cost, then network map and link cost map is constructed and are used for selecting neighbor nodes.In order to provide the above capabilities, we set up a system model including network state information detecting entity, network information serving entity and neighbor nodes selecting entity in underlying network, and extend the resource information storing entity in P2P overlay (shown in Figure 2).

Network information serving entity obtains real-time network information
from ISP or network status information detecting entity, then constructs network map and cost map using the obtained network information.The neighbor nodes selecting entity is responsible for sorting and selecting neighbor nodes according to the network map and cost map received from network information serving entity and the candidate nodes list received from resource information storing entity.At last, neighbor nodes selecting entity sends the selected neighbor nodes to the requesting P2P nodes.Resource information storing entity acts as a tracker server in the hybrid P2P network or a responsible node in structured P2P network.Based on resource information storage and searching function, resource information storing entity no longer sends candidate nodes directly to the requesting P2P node, but to neighbor nodes selecting entity.Network status information detecting entities are deployed in the edge of PIDs and ASes for detecting real-time network status information, including routing hops, available bandwidth, delay between PIDs, and IP address summary information, etc.They will also advertise network status information to network information serving entity to construct network map and link cost map.

PID Division
In order to effectively optimize P2P traffic based on underlying network status information, some researchers [9] [10] divided P2P nodes into multiple areas based on clustering algorithms.However, with the change of the network information between nodes, the clusters may vary and the network information between clusters is not easy obtained.
Therefore, we propose a method to divide underlying network nodes into a number of areas, which are called as provider-defined network identifiers (PIDs), according to real underlying network topology and routing information,.In the underlying network, interior gateway protocol (IGP, such as OSPF and IS-IS) is running within each AS, and IGP divides AS into multiple areas which generally corresponds to certain administrative regions.So we take each IGP area within AS as a PID.One division example is shown in Figure 3. Compared with other schemes, our PID division scheme has some advantages: PIDs are naturally isolated with reasonable scale and stable structure, which can not only reduce inter-domain traffic but also have sufficient internal nodes to support traffic localization.Furthermore, the underlying network information can be easily detected using the interface between PID and backbone network.

Network Map Construction
Network map is constructed by network information serving entity according to the PID division scheme and the aggregating address information of PIDs obtained from network status information detecting entity, with which neighbor nodes selecting entity can learn the PIDs and ASes where each P2P node belongs to.Therefore, network map is consisted of the relationships among AS, PID, IP address and mask of each PID.The structure of network map is shown in Table 1.

Link Cost Map Construction
Link cost represents the priority that a P2P node is selected as a neighbor node for other nodes.The higher the node priority, the higher possibility is selected as a neighbor node.That is, link cost is the basis of neighbor node selection in P2P network.In this paper, link cost is calculated according to the real-time network status information.Therefore, we first need to obtain the appropriate network status information.
We consider that traffic optimization is related with the region where nodes locate, and traffic localization can be achieved by limiting routing hops and AS hops of resource transmission.
Assuming that represents the path between node i and j in P2P session K, then which can be consist of three parts: the path in the PID where node i locates, the path in the AS where node i locates, and the path in other AS where node i does not locate, as shown in (3).
(3) So the total traffic in (1) can be expressed as following, (4) In P2P network, the path between the resource requester node and its neighbor nodes locating on the outside PID must include PID internal links.So in order to reduce network traffic, we should reduce the number of links outside the PID where the resource requester node locates, i.e., reduce the value of , which can be achieved by selecting nearby neighbor nodes.The distance between P2P nodes is determined by routing hops.The more hops, the smaller probability that two nodes locate in the same PID.So we should select local nodes according to the routing hops between P2P nodes.On the other hand, AS hops reflects the AS number between two P2P nodes, which can be considered as a punitive factor to reduce the probability of selecting nodes locating in other ASes away from the requester node.So routing hops and AS hops are the key to achieve P2P traffic localization.
Assuming delay i,j and available_bandwidth i,j are respectively the total delay and the available bandwidth on path , then we can calculate link cost as following: (5) where, is an adjustment factor for path available bandwidth, which is used to adjust the sensitivity of path available bandwidth and selecting neighbor nodes with high available bandwidth.
is an adjustment factor for path delay, which can be increased to prevent selecting the nodes with longer delay.
is an punitive factor, which is used to limit selecting neighbor nodes crossing multiple ASes.The value of the above three factors can be determined according to the optimization strategy of ISP. and are respectively the number of hops and ASes that path contains.In order to preferentially select the nodes locating in the same PID with the requester node, the link cost between these nodes should be set to maximum, that is, Based on above analysis, the link cost map can be constructed according to link cost between PIDs, as shown in Table 2.
However, the calculation of link cost is periodical (e.g. 5 minutes), and session state between nodes usually varies frequently, so the link cost stored in the cost map during the calculation cycle may not exactly reflect the current network status.In order to solve this problem, we propose a dynamic adjustment method for link cost.When nodes in one PID is selected as neighbor nodes, then the link costs between the PID where the selected nodes locate and other PIDs will be decreased by P%, thereby reducing the chances those nodes are selected again in that PID.In contrast, when the nodes complete the resource uploading, the link costs between the PID where the selected nodes locate and other PIDs will be increased by Q% (Q < P).Because node selection is usually frequently executed, in order to reflect the effect of dynamic adjustment and prevent link cost being decreased too quick, we recommend .

Neighbor Node Selection
Neighbor node selection is the key for traffic optimization.Selecting neighbor nodes according to link cost map can partially achieve traffic optimization.However, it is also necessary to take into account the influence of node capacity on traffic optimization.There are two main reasons: the one is that, when multiple nodes request resource from the same node, the transmission rate of each session must be affected.The other is that, if the session from a node is not limited, then there may appear uneven session distribution, which will indirectly affect user experience.So, we assign an initial session number for each node, named as node_sessionNum, which is determined by real-time node status.
When a node is requested to transmit resource, the current session number of that node, named as available_sessionNum, will be automatically decreased by 1.
On the contrary, when a node session is released, available_sessionNum of that node will be increased by 1.In case of available_sessionNum ≤ 0, the node will not be selected as resource provider.
For node i in P2P network, N i represents all candidate nodes of node i, C i represents the optimized subset of N i , node_sessionNum i represents the maximum session number of node i, and available_sessionNum i represents the current available session number of node i.Assuming that ( ) , i j i P j N ∈ represents the priority that node i chooses node j to establish session, then node selection problem can be expressed as: (6) where, .To prevent resource isolation caused by selecting all neighbor nodes from the same region where the requester locates, we use the idea of Biased Neighbor Selection method [11] as reference, which selects certain number of nodes from different regions.We divide the nodes of underlying network into three different types of clusters: intra-PID, inter-PID (the same AS where requester locates) and inter-AS, which are respectively presented as , and , and select optimized nodes subset from each cluster, which are respectively presented as , and .Then we can get the following: where, .Finally, the final required optimized neighbor nodes list is obtained by merging the above three parts optimized nodes.

Simulation Environment
OMNeT++ [12] is used to verify the above P2P traffic optimization scheme.
INET based on OMNet++ is used to simulate TCP/IP protocol stack and Over-Sim is used to construct P2P overlay network in application layer.
In order to clearly observe the impact of P2P traffic on link bandwidth, the bandwidth between gateway routers and edge routers (i.e.PID export bandwidth) is set to 200 Mbps, the bandwidth between core routers and gateway routers is set to 500 Mbps, and the bandwidth between core routers (i.e.AS export bandwidth) is set to 1000 Mbps.PID hops, AS hops, path available bandwidth, path delay, and the aggregate IP address of PID and AS in PID export routers are monitored.

Simulation Method and Comparing Schemes
In order to effectively evaluate the performance of P2P traffic localization and user experience achieved in our scheme, the simulation makes statistical analysis on four parameters, including available bandwidth of PID export routers, available bandwidth of AS export routers, delay between two P2P nodes, and download time of certain resource.Through analyzing available bandwidth of PID export routers and AS export routers, we can evaluate whether the scheme can effectively localize P2P traffic.
Through analyzing transmission delay between two P2P nodes and download time, we can evaluate whether the scheme can guarantee user experience.The above five schemes is simulated for 50 times respectively, and the statistics average performance index are obtained.
In our simulation, NEWOP represents our proposed scheme.Four other schemes are selected to be compared, including UNOP, OP1 [13], OP2 and OP3 [14].UNOP represents traditional P2P application without traffic optimization, where neighbor nodes selection is random.OP1 achieves P2P traffic optimization by taking routing hops as the basis of node selection.OP2 and OP3 are based on ALTO protocol, which achieve both P2P traffic optimization and user experience by considering routing hops and upload bandwidth of node.OP2 prefers user experience while OP3 prefers traffic localization.

Performance Evaluations
1) Available Bandwidth of PID Export Router Figure 5 shows the available bandwidth of PID export router.At the initial stage of simulation, P2P nodes join the network through Chord protocol, which only exchange few Chord messages and routing messages.So little band-width is occupied and the available bandwidth of PID export router maintains about 200 Mbps.When P2P nodes start to request resources and exchange resource with each other, the available bandwidth is quickly reduced.
In Figure 5, there is 80% of available bandwidth and the jitter of available bandwidth is very small in NEWOP.One reason is that NEWOP generates little crossing PID traffic and achieves good traffic localization.The other reason is that NEWOP considers the available connections of P2P nodes and dynamically adjusts link cost based on biased neighbor selection.Similar to NEWOP, the traffic localization effect of OP1 is also perfect.However, OP1 selects nodes mainly based on routing hops, which could cause excessive traffic localization 2) Available Bandwidth of AS Export Router Figure 6 shows the average available bandwidth of AS export router.There is approximate 95% of available band-width in AS export router in the four P2P traffic optimization scheme.That proves that P2P nodes acquire little number of resources from other ASes, so these schemes are able to guarantee good AS-level traffic localization.In addition, NEWOP considers the number of available connections and dynamically adjusts link cost, its node selection inside AS will not change suddenly.Therefore, NEWOP also have advantage over other AS-level P2P traffic optimization.
3) Transmission Delay between Nodes Figure 7 gives the link delay for resource transmission between two selected P2P nodes.At the beginning of the simulation, there is no data transmission, resource transmission delay is mainly composed of forwarding delay of routers.When inter-node resource transfer begins, link congestion caused by large amounts of resource transmission becomes the main part of link delay.
As shown in Figure 7, the link delay in UNOP keeps around 80 ms and the jitter is large.However, the transmission delay keeps around 60 ms in OP1, OP2 and OP3, which can achieve traffic optimization.In contrast, the transmission delay keeps around 50 ms in NEWOP, which has obvious advantages over other  schemes.Why NEWOP has a lower resource transmission delay?There are two reasons.One reason is that P2P traffic localization can reduce the occupation of cross-domain link bandwidth, which reduces the router forwarding delay.The other reason is that NEWOP considers the path delay between nodes when calculating link cost, which reduces the probability that long delay nodes are selected.
4) Resource Downloading Time Resource downloading time is an important factor for user experience.Figure 8 shows the average resource down-loading time of different schemes.It shows that UNOP has lower downloading time compared with other P2P traffic optimization schemes.It is because that all P2P traffic optimization schemes must sort and select nodes before downloading resource, and in order to achieve traffic localization, partial traffic optimization schemes prefer to select local nodes rather than the outside nodes with higher access bandwidth and smaller path delay.Figure 8 shows that the resource download time in NEWOP is 7.9% more than UNOP, but respectively less 24%, 11.8% and 21.7 compared with OP1, OP2 and OP3.So NEWOP has greater advantage than other traffic optimization schemes.That is because NEWOP considers both path available bandwidth and delay which could have impact on downloading time, and it avoids the situation that multiple sessions occupies one link by means of limiting the total session number.

Conclusions
P2P is acting a more and more important role in Internet services, how to guarantee the stable running of Internet service when introducing P2P technology is particularly concerned by ISP.This paper makes comprehensive analysis on the advantages and disadvantages of existing P2P traffic optimization schemes, and proposes a P2P traffic optimization scheme based on dynamic network information awareness.Experimental results show that the scheme can guarantee the localization of P2P traffic, reduce crossing PID and AS traffic, and guide the P2P traffic ideally.Moreover, the scheme achieves lower latency and lower downloading time compared with other traffic optimization schemes, which maximally guarantee user experience.
However, our scheme still has some limitations.Firstly, the scheme does not take network cache into consideration to achieve traffic localization.Secondly, the scheme is only verified by simulation, and its performance in large-scale network remains to be tested.Therefore, it needs more study on the two points in the future work

Figure 2 .
Figure 2. System model of P2P traffic optimization scheme is based on dynamic network information awareness.

Figure 3 .
Figure 3.One example of PID division.

Five
ASes and 700 P2P nodes are built in the simulation environment.The simulation topology is shown in Figure 4. AS0-AS3 simulate four province ASes in the real network and AS4 simulates the backbone network.AS0-AS3 are connected to AS4 through core routers.700 P2P nodes are equably attached to the edge routers in AS0-AS3, which are constructed as a structured P2P overlay network through Chord protocol, storing and requesting resources randomly.What's more, AS0 and AS1 contain only one OSPF routing domain, which is divided into multiple areas by edge routers.AS2 and AS3 are divided into multiple OSPF routing domains by internal gateway routers, and each OSPF routing domain contains multiple areas.In simulation environment, each area connecting to edge router is regarded as a PID, so there are 11 PIDs in total.

Figure 5 .
Figure 5. Available bandwidth of PID export router.

Figure 6 .
Figure 6.Available bandwidth of AS export router.

Figure 8 .
Figure 8.Average downloading time of 200 Mb resource.

Table 1 .
Structure of network map (network region-aggregate address)

Table 2 .
Structure of link cost map.