Connectivity-Based Data Gathering with Path-Constrained Mobile Sink in Wireless Sensor Networks

The design of an effective and robust data gathering algorithm is crucial to the overall performance of wireless sensor networks (WSNs). However, using traditional routing algorithms for data gathering is energy-inefficient for sensor nodes with limited power resources and multi-hop communication protocols. Data gathering with mobile sinks provides an effective solution to this problem. The major drawback of this approach is the time and path constraints of the mobile sink, which prevent it from collecting data directly from all sensor nodes, so that data routing is still required for the parts of the network that the mobile sink cannot reach. This paper presents a new data gathering algorithm called Connectivity-Based Data Collection (CBDC). The CBDC algorithm utilizes the connectivity between sensor nodes to determine the trajectory of the mobile sink whilst satisfying its path constraint and minimizing the number of multi-hop communications. The presented results show that CBDC, in comparison with the LEACH-C algorithm, prolongs the network life time at different connectivity levels of sensor networks, for varying numbers of sensor nodes and for different path constraints of the mobile sink.


Introduction
As data aggregation is one of the primary tasks of Wireless Sensor Networks (WSN), sensor nodes with limited power resources and wireless communication range necessitate energy-efficient data collection.

Literature Review
Recent research in wireless sensor networks has highlighted the importance of using mobile sink techniques for data collection instead of traditional data routing algorithms [5] [6]. Based on the type and number of mobile sinks, data gathering techniques can be applied using either random, constrained or controllable mobile sinks [3] [10] [11]. With random mobile sinks, sink nodes collect data from random locations in the sensor field. As a result of the random mobility of the sink node, data collection suffers from a low delivery ratio and high transfer latency [12] [13]. Mounting sensor nodes on wild animals for behavior monitoring, where these animals move in an unpredictable manner, is an example of this type of technique.
More recently, with the advancement of embedded system technologies, sink nodes are equipped with control units and, sometimes, with Global Positioning System (GPS) services. This enables the sink nodes to move along specific and controllable paths. In this approach, sink nodes are required to move periodically through the sensor field and collect the data gathered by the stationary sensor nodes. An example of this type of approach occurs when the mobile sink is placed on a public transportation shuttle [14].
Different approaches for mobile sinks with a deterministic movement pattern have been studied in the literature [15] [16]. The path length and trajectory of the mobile sink are constrained by the energy resources as well as by the data collection latency imposed by the underlying application or users. In the literature, controlling the movement patterns of the mobile sink paves the way for a more flexible design and more efficient data collection [3]. For example, traversing the whole network represents the most energy-efficient solution. However, this option is restricted by the data collection latency deadlines. In this case, a fixed trajectory is determined with lower latency at the cost of higher energy consumption, since an increased number of multi-hop communications is required. The algorithm presented in this paper adopts a controllable and fixed trajectory of the mobile sink.
Communication paths between source nodes and mobile sinks are either single-hop [17] [18], when the source nodes are located within the communication range of the mobile sink, or multi-hop, when the source node is located out of range of the mobile sink [10] [19]. In the former case, no data routing is required. However, an energy-efficient data routing algorithm is needed for the nodes with multi-hop communication. In addition, the issue of when and how the multi-hop nodes can communicate with the mobile sink should also be addressed [3]. To this end, we use the well-known Stop to Collect Data (SCD) scheme [20]. In this method, the mobile sink stops at a pre-determined location and waits for data collection. Moreover, a proxy-based routing algorithm referred to as Maximum Amount Shortest Path (MASP) [10] is implemented. The single-hop nodes are selected as gateway or proxy nodes to store data from source nodes and pass them to the mobile sink at an appropriate time.
On the other hand, considering data routing from source nodes to the mobile sink, many mobile sink communication algorithms have been investigated [20] [21]. These algorithms aim to prolong the life time of sensor nodes as well as to improve the data throughput of the sensor network. The problem with these protocols is the imbalanced energy consumption of sensor nodes caused by the adopted Shortest Path Tree (SPT) algorithm. In [22], MobiRoute is suggested as a routing protocol for networks with a path-predictable mobile sink. With MobiRoute, the mobile sink pauses for a certain time at anchor points (i.e., collection points) to collect data from sensor nodes [22].
Other solutions to the data collection problem are based on using multiple mobile sinks (or mobile elements) [8] [19] [23]. The authors in [23], for instance, suggested the Area Splitting Algorithm (ASA), which partitions the network into areas with a nearly equal number of sensor nodes. One mobile element is then assigned to each partition and entrusted with the data collection task in that partition. The Rendez-vous Design (RD) approach is another example of using multiple mobile elements. This design was reformulated in [8] with more practical scenarios. For instance, assumptions of having several data collection deadlines with constrained and fixed trajectories of mobile elements were considered. Although methods based on multiple mobile elements decrease the network latency, they introduce additional cost and network design complexity. Hence, the current research addresses the data collection problem with a single mobile sink and restricted paths.
This paper presents a clustering-based data collection algorithm, called CBDC. The proposed algorithm exploits the available information about node connectivity in order to maximize the number of single-hop sensor nodes at each collection point of the mobile sink, and therefore extends the life time of sensor nodes. Moreover, the proposed CBDC algorithm supports multi-hop communication in sensor networks using the Dijkstra algorithm with a new energy balancing scheme.

Problem Formulation
In this paper, data from $N$ stationary sensor nodes need to be collected by a single node with unlimited buffer size, known as the mobile sink. As shown in Figure 1(a), sensor nodes are randomly and uniformly distributed in the sensor field with known locations. The mobile sink must follow a limited path through a set of collection points in order to satisfy its energy and time constraints. Nodes located within the range of the mobile sink, i.e., single-hop nodes, can send their data directly to the mobile sink, while other nodes, i.e., multi-hop nodes, must send their data to the mobile sink by using one of the existing routing techniques, such as the Dijkstra algorithm. In addition, the single-hop nodes are used to forward the data of other sensor nodes to the mobile sink, and hence we also refer to these nodes as gateway nodes. Figure 1 illustrates an example of (a) sensor node distribution within a sensor field including a sink node; (b) sensor node clustering showing single-hop nodes, multi-hop nodes and collection points (CP); and (c) a path of the mobile sink passing through all collection points. The main aspect that should be considered while designing data gathering algorithms is the network life time, which can be defined as the time elapsed until the energy of the first node runs out [10]. In this paper, the energy model for each sensor node is given as

$$E = (P_{tx} + P_{rx}) \frac{A}{R} \qquad (1)$$

where $P_{tx}$ and $P_{rx}$ are the transmitting and receiving power, respectively, $A$ is the data size and $R$ is the data rate.
Notice that $P_{rx}$ is zero if the node is the source node. $E$ thus represents the energy consumed by a sensor node for one hop of data forwarding. In total, sensor node $i$ consumes an energy given by Equation (2), where $l_i$ denotes the number of sensor nodes that send their data through sensor $i$. Intuitively, increasing the ratio of the number of single-hop nodes to the number of multi-hop nodes increases the network life time, since the number of nodes that must adopt a routing technique is reduced. It is clear from Equation (2) that the smaller the value of $l_i$, the lower the energy consumption. Furthermore, as shown in Figure 1, for a node that is close to the path of the mobile sink, the value of $l_i$ is larger than that for distant nodes. As a consequence, in addition to using the shortest path algorithm, an adaptive energy load balancing scheme is required to reduce the data forwarding overhead of such nodes and thereby reduce the total energy consumption.
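The energy bookkeeping described above can be illustrated with a minimal Python sketch. It assumes that one hop costs $(P_{tx} + P_{rx}) A / R$, with $P_{rx} = 0$ at a source node, as the symbol definitions suggest. The total per-node energy used here (the node's own transmission plus $l_i$ relayed packets) is only one plausible reading of the total-energy expression and is an assumption of this sketch, as are the function names. The power and rate values are those quoted in the simulation scenario; the 100-byte packet is illustrative.

```python
def hop_energy(p_tx, p_rx, data_bits, rate_bps, is_source=False):
    """Energy in joules for one hop: (P_tx + P_rx) * A / R.

    A source node does not receive the packet first, so P_rx is taken as 0.
    """
    p_rx_eff = 0.0 if is_source else p_rx
    return (p_tx + p_rx_eff) * data_bits / rate_bps


def node_energy(p_tx, p_rx, data_bits, rate_bps, relayed_nodes):
    """ASSUMED total: own packet (as a source) plus one relayed packet
    for each of the l_i nodes that route through this node."""
    own = hop_energy(p_tx, p_rx, data_bits, rate_bps, is_source=True)
    forwarded = relayed_nodes * hop_energy(p_tx, p_rx, data_bits, rate_bps)
    return own + forwarded


P_TX, P_RX = 21e-3, 15e-3   # watts (21 mW and 15 mW)
RATE = 200e3                # bits per second (200 Kbps)
PACKET = 100 * 8            # bits (illustrative 100-byte packet)

for l_i in (0, 3, 10):
    e = node_energy(P_TX, P_RX, PACKET, RATE, l_i)
    print(f"l_i = {l_i:2d}: {e * 1e6:.1f} microjoules per round")
```

Under these assumptions the cost grows linearly with $l_i$, which is why nodes near the sink path, carrying the largest relay load, determine the network life time.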

The Proposed Algorithm
This paper presents a data gathering algorithm that attempts to reduce the energy consumption by maximizing the number of gateway nodes and appropriately balancing the load of data routing among all sensor nodes. Our algorithm, which is referred to as connectivity-based data collection (or CBDC for short), consists of three main phases: clustering, path determination and data collection.

Clustering Phase
Initially, the sensor nodes are grouped into clusters based on their connectivity to each other. In this type of clustering, each cluster represents a mesh network in which each node must be located within the communication range of all other nodes in the same cluster. In other words, let $G_k$ represent a set of nodes such that $w_{i,j} = 1$ for all $i, j \in G_k$ with $i \neq j$, where $N_k$ denotes the number of sensors in the set $G_k$, $M$ is the number of clusters in the network and $W$ is the connectivity matrix (such that $w_{i,j} = 1$ if nodes $i$ and $j$ are connected; otherwise $w_{i,j} = 0$). Figure 1(b) illustrates an example of sensor node clustering. A sensor node may appear in more than one cluster. To avoid this scenario in the CBDC algorithm, the sensor node is removed from the clusters of low $N_k$ and kept in the cluster of high $N_k$. This step is important for phase three in order to maintain clusters with unique and large numbers of sensor nodes.
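A minimal sketch of such a fully connected (clique) clustering is given below. The paper does not state how the clusters are enumerated, so the greedy strategy here is an assumption: it grows one cluster at a time from the highest-degree unassigned node and therefore produces disjoint clusters directly, instead of first building overlapping clusters and then removing duplicates. The function name `greedy_clique_clusters` is hypothetical.

```python
def greedy_clique_clusters(w):
    """Partition nodes into disjoint, fully connected clusters.

    w is a symmetric 0/1 connectivity matrix (list of lists) with
    w[i][j] == 1 when nodes i and j are within range of each other.
    Greedy heuristic (an assumption, not the paper's exact procedure):
    seed each cluster with the highest-degree unassigned node, then add
    every unassigned node that is connected to all current members.
    """
    n = len(w)
    unassigned = set(range(n))
    order = sorted(range(n), key=lambda i: -sum(w[i]))
    clusters = []
    for seed in order:
        if seed not in unassigned:
            continue
        cluster = [seed]
        unassigned.discard(seed)
        for cand in order:
            if cand in unassigned and all(w[cand][m] for m in cluster):
                cluster.append(cand)
                unassigned.discard(cand)
        clusters.append(cluster)
    return clusters


# Nodes 0, 1, 2 are mutually connected; node 3 only reaches node 2.
W = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(greedy_clique_clusters(W))
```

Node 3 ends up in its own cluster because it is not within range of nodes 0 and 1, which is exactly the mesh condition on $G_k$.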

Path Determination Phase
As already mentioned, the primary task of the mobile sink is to collect data from sensor nodes. Since the path length of the mobile sink is restricted by its limited energy resources, it is difficult to visit all sensor nodes in the network. Therefore, the mobile sink must follow a specified path that satisfies the path constraint and maintains the maximum number of gateway nodes. For this reason, a set of collection points (CP) is considered. The CP of each cluster could be any physical location within it, since each cluster represents a mesh, i.e., a fully connected network. In this algorithm, the centroid of each cluster is computed and used as the CP for that cluster. Hence, the CP of cluster $G_k$ is given as

$$c_k = \frac{1}{N_k} \sum_{i \in G_k} (x_i, y_i)$$

where $(x_i, y_i)$ is the location of sensor node $i$. The advantage of selecting the centroid of each cluster as its CP is that it maximizes the contact time between the mobile sink and the source nodes. The Travelling Salesman (TSM) algorithm can now be applied to determine the trajectory of the mobile sink, starting at the initial location, passing through all collection points and ending at the initial location, as shown in Figure 1(c). Visiting all clusters in the network may result in a tour length that exceeds the prespecified path constraint. In this case, a tour reduction mechanism is used, as described below.

Reducing the tour length of the mobile sink is necessary when the computed path length is longer than the prescribed path length of the mobile sink. The tour length is reduced by removing one or more CPs from the tour. All sensor nodes in the cluster to which a removed collection point belongs are then treated as multi-hop nodes. Hence, the question of which CP should be removed is answered with two main considerations in mind. The first is that the removed CP should reduce the tour length more than other CPs would. The second is to increase the total number of single-hop nodes by keeping the clusters with larger numbers of sensor nodes. Both considerations are integrated in the CBDC algorithm in a single benefit function, which is applied to all CPs, and the CP with the maximum benefit value is then removed. This is repeated until the path constraint of the mobile sink is satisfied. The benefit function $\delta_k$ used for this purpose depends on three quantities: $\alpha_k$, the difference between the tour length before and after removing the CP $c_k$; $\beta_k$, the minimum of the distance between $c_k$ and $c_{k+1}$ and the distance between $c_k$ and $c_{k-1}$; and $N_k$, the number of members of cluster $k$. The optimal path is assumed to be $c_0, c_1, c_2, \ldots, c_M, c_0$, where $c_0$ is the location of the sink node.
The benefit function increases for a small value of $N_k$ or a large value of $\alpha_k$. Recall that a small value of $N_k$ means a small number of cluster members. Removing the CP with the maximum benefit value $\delta$ from the optimal path therefore reduces the total tour length of the mobile sink while sacrificing few gateway nodes. The flowchart of the proposed algorithm is shown in Figure 2.
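The path determination phase can be sketched as follows. Two parts of this sketch are assumptions rather than the paper's method: a nearest-neighbour heuristic stands in for the travelling salesman solver, and the benefit value is taken as the hypothetical form $\alpha_k / N_k$, chosen only because it grows with $\alpha_k$ and shrinks with $N_k$ as the text requires (the $\beta_k$ term is omitted). All function names are illustrative.

```python
import math


def centroid(points):
    """Collection point of a cluster: the mean of its node coordinates."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def tour_length(sink, cps):
    """Length of the closed tour sink -> cps in order -> sink."""
    stops = [sink] + list(cps) + [sink]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))


def nearest_neighbour_order(sink, cps):
    """Stand-in for a TSP solver (assumption): always visit the closest CP."""
    remaining = list(range(len(cps)))
    order, here = [], sink
    while remaining:
        nxt = min(remaining, key=lambda k: math.dist(here, cps[k]))
        remaining.remove(nxt)
        order.append(nxt)
        here = cps[nxt]
    return order


def reduce_tour(sink, clusters, max_len):
    """Drop CPs until the tour fits max_len; returns kept cluster indices.

    HYPOTHETICAL benefit: alpha_k / N_k, where alpha_k is the tour length
    saved by removing CP k and N_k is the size of cluster k.
    """
    cps = [centroid(c) for c in clusters]
    sizes = [len(c) for c in clusters]
    kept = nearest_neighbour_order(sink, cps)
    while kept:
        full = tour_length(sink, [cps[k] for k in kept])
        if full <= max_len:
            break

        def benefit(k):
            rest = [cps[j] for j in kept if j != k]
            return (full - tour_length(sink, rest)) / sizes[k]

        kept.remove(max(kept, key=benefit))
    return kept


clusters = [
    [(10, 0), (12, 0), (11, 0)],   # three nodes close to the sink
    [(0, 100)],                    # one distant node
]
print(reduce_tour((0, 0), clusters, max_len=50))
```

In this small case the distant single-node cluster is dropped first: removing it saves the most tour length and costs only one gateway node. The visiting order is not re-optimized after each removal, which a fuller implementation would probably do.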

Data Collection Phase
As already mentioned, each multi-hop node needs to find its path to the closest CP via one of the gateway nodes. In this paper, this is achieved using the well-known Dijkstra algorithm [24]. With this algorithm, the optimal route from a single node (i.e., the source node) to a single destination (i.e., the gateway node) is found. Nevertheless, some gateway nodes would be loaded with relaying far more packets than their neighbouring gateway nodes. Taking advantage of having a set of gateway nodes within each cluster instead of only one, this load can be fairly distributed among these nodes when an energy balancing technique is implemented. In this regard, a threshold function is introduced in the CBDC algorithm that restricts the number of packets forwarded by one gateway node. The threshold function $h_k$ of cluster $k$ is given by Equation (5), where $N_G$ represents the total number of gateway nodes. It is worth noting that Equation (5) represents an adaptive threshold value which varies from cluster to cluster based on the number of gateway nodes in the cluster itself. For example, a cluster with a large number of gateway nodes will have a smaller $h_k$ than a cluster with a smaller number of gateway nodes. Figure 2 shows the flowchart of the CBDC algorithm proposed in this paper.
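The combination of shortest-path routing with a per-gateway cap can be sketched as below. This is an illustration under stated assumptions, not the paper's scheme: the fixed parameter `cap` is a hypothetical stand-in for the adaptive threshold $h_k$, whose formula is not reproduced here, and the greedy assignment order is also an assumption.

```python
import heapq


def dijkstra(adj, src):
    """Shortest distances from src over adj: {node: {neighbour: weight}}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def assign_gateways(adj, sources, gateways, cap):
    """Give each multi-hop source the nearest gateway that is not yet full.

    cap is a HYPOTHETICAL fixed limit standing in for the threshold h_k.
    If every reachable gateway is full, the nearest one is used anyway.
    """
    load = {g: 0 for g in gateways}
    choice = {}
    for s in sources:
        dist = dijkstra(adj, s)
        reachable = [g for g in gateways if g in dist]
        if not reachable:
            continue  # isolated node: no route to any gateway
        open_gw = [g for g in reachable if load[g] < cap] or reachable
        best = min(open_gw, key=lambda g: dist[g])
        choice[s] = best
        load[best] += 1
    return choice


adj = {
    "a": {"g1": 1, "g2": 2},
    "b": {"g1": 1, "g2": 2},
    "g1": {"a": 1, "b": 1, "g2": 1},
    "g2": {"a": 2, "b": 2, "g1": 1},
}
print(assign_gateways(adj, ["a", "b"], ["g1", "g2"], cap=1))
print(assign_gateways(adj, ["a", "b"], ["g1", "g2"], cap=2))
```

With `cap=1` the second source is diverted to the slightly more distant gateway `g2`, which is the load-spreading effect the threshold is meant to achieve; with `cap=2` plain shortest-path routing sends both sources to `g1`.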
Once the data collection round starts, the mobile sink begins to move from the sink node location, passing through the selected collection points at a speed of $q$ m/s. Hence, $l/q$ seconds are required to complete a single round, where $l$ is the tour length. Two approaches are suggested to organize data collection and to prevent the out-of-synchronization problem.
In the first approach, sensor nodes are designed to send their data every $l/q$ seconds. The out-of-synchronization problem still occurs when, for instance, the mobile sink is slowed down or delayed somewhere in the network. In the second approach, the Stop to Collect Data (SCD) algorithm [20] is used. Once the mobile sink arrives at the CP of a cluster, it sends a data request to the sensor nodes through one of the gateway nodes belonging to this cluster. This request can pass through the route designed for data collection. The nodes then reply with the requested data through the same route.

Performance Evaluation
In this section, we study the performance of the connectivity-based data collection algorithm (CBDC).The network life time of this algorithm was evaluated for different path constraints of the mobile sink and at varying number of sensor nodes.In addition, the efficiency of our algorithm is examined for networks of different levels of connectivity.

Simulation Scenario
In this simulation, sensor nodes are randomly and uniformly deployed within a 400 × 400 m square area. The transmission rate of all sensor nodes, including gateway nodes, is set to 200 Kbps. The mobile sink moves at a speed of 1 m/s. The initial energy of each node is assumed to be 10 Joules. The transmission power and reception power are set to 21 and 15 milliwatts, respectively. In this experiment, energy consumption is considered for data transmission and reception only; during other time periods, sensor nodes are assumed to be in sleep mode, for which no energy consumption is assumed.
In our simulation scenario, the connectivity of sensor nodes is measured, following [25], as a percentage computed from the entries $w_{i,j}$ of the connectivity matrix. Notice that $w_{i,j}$ mostly depends on the Received Signal Strength (RSS) sensitivity threshold $RSS_{th}$ of the sensor node, or equivalently on the communication range $d_{th}$. Since network connectivity is a strictly RSS-dependent parameter, it cannot be readily controlled. For this reason, in this simulation we used a range of $RSS_{th}$ values and recorded the corresponding network connectivity values. Table 1 lists the $RSS_{th}$ values, the corresponding $d_{th}$ and the network connectivity.
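The relationship between $RSS_{th}$, $d_{th}$ and connectivity can be sketched as follows. The range conversion assumes the standard log-distance path loss model, with $d_0 = 1$, $P_0 = -40$ dB and $\nu = 2.3$ as quoted in the paper. The connectivity percentage is computed here as the fraction of node pairs that are directly linked; this normalization is an assumption of the sketch, and the exact expression from [25] may differ. Function names are illustrative.

```python
import math
from itertools import combinations


def rss_to_range(rss_th_db, d0=1.0, p0_db=-40.0, nu=2.3):
    """Communication range under a log-distance path loss model:
    d_th = d0 * 10 ** ((P0 - RSS_th) / (10 * nu))."""
    return d0 * 10 ** ((p0_db - rss_th_db) / (10 * nu))


def connectivity_matrix(points, d_th):
    """w[i][j] = 1 when nodes i and j are at most d_th apart."""
    n = len(points)
    w = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if math.dist(points[i], points[j]) <= d_th:
            w[i][j] = w[j][i] = 1
    return w


def connectivity_percent(w):
    """ASSUMED metric: linked node pairs as a percentage of all pairs."""
    n = len(w)
    if n < 2:
        return 0.0
    links = sum(w[i][j] for i, j in combinations(range(n), 2))
    return 100.0 * links / (n * (n - 1) / 2)


d_th = rss_to_range(-86.0)          # a 46 dB budget gives about 100 m
nodes = [(0, 0), (50, 0), (300, 0)]
w = connectivity_matrix(nodes, d_th)
print(round(d_th, 1), round(connectivity_percent(w), 1))
```

Lowering $RSS_{th}$ (a more sensitive receiver) lengthens $d_{th}$ and raises the connectivity, which is how the simulation sweeps connectivity indirectly.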

Scalability of Sensor Nodes
An important factor that should be considered while designing a sensor network is scalability, which measures how efficient a WSN is for small and large numbers of sensor nodes. The scalability of the CBDC algorithm is evaluated by testing the algorithm with a varying number of sensor nodes (from 100 to 500 nodes). The simulation is repeated for mobile sinks with different path constraints (i.e., $P_{th}$ = 800, 1000, 1200 and 1400 m). The number of collection points (CPs), the percentage of gateway nodes and the network life time are shown in Figures 3(a)-3(c), respectively. The network life time is computed as the time elapsed until the energy of the first node in the network runs out.
Generally speaking, when the number of sensor nodes increases, a larger number of clusters with fewer members (i.e., gateway nodes) is formed. This, in turn, increases the number of CPs and decreases the percentage of gateway nodes in the network, particularly for mobile sinks with long path constraints (e.g., 1200 and 1400 m), as shown in Figure 3(a) and Figure 3(b), respectively. As a result of the reduced percentage of gateway nodes, the percentage of multi-hop nodes increases and therefore the network life time is shortened, as shown in Figure 3(c). It is also clear from this figure that increasing the path constraint of the mobile sink prolongs the life time of sensor nodes. This is because a longer path constraint allows the mobile sink to traverse a larger area of the sensor field and hence reduces the number of multi-hop sensor nodes. Another trend that can be seen from this figure is that, for longer path constraints, the network life time depends more strongly on the number of sensor nodes than it does for shorter path constraints. Consequently, these results emphasize the scalability of the CBDC algorithm in spite of using a mobile sink with a limited path constraint.

A packet whose RSS is less than the $RSS_{th}$ value is discarded. The corresponding communication range $d_{th}$ is computed as

$$d_{th} = d_0 \cdot 10^{(P_0 - RSS_{th})/(10\nu)}$$

where $d_0$ is the reference distance, $P_0$ is the reference power and $\nu$ is the path loss exponent (in this paper we used $d_0$ = 1, $P_0$ = −40 dB and $\nu$ = 2.3 [26]).

Connectivity of Sensor Nodes
Next, we study the performance of the CBDC algorithm as a function of network connectivity. As already mentioned, the network connectivity percentage varies with the selected value of $RSS_{th}$, as given in Table 1. In contrast to the previous results, it is apparent from Figure 4(a) that sensor networks of high connectivity include fewer CPs than sensor networks of low connectivity. This, in effect, increases the number of gateway nodes in these networks, as shown in Figure 4(b). As the path constraint increases, the percentage of gateway nodes (single-hop sensor nodes) increases and hence the network life time increases exponentially, as shown in Figure 4(c). In the same manner, the extension of the network life time is restricted by the path constraint of the mobile sink. It can also be observed that networks of weak connectivity have almost the same life time irrespective of the path constraint, even when long path constraints, e.g., 1400 m, are used. Unsurprisingly, the effectiveness of using long path constraints is much clearer at high network connectivity, which results in networks with extremely long life times. Notice that a connectivity of 100% yields a network with only a single CP located within the communication range of all sensor nodes and, hence, no multi-hop routing protocols are needed.

Energy Balancing Technique
The importance of the energy balancing technique considered in this paper for the CBDC algorithm is assessed through Monte Carlo simulation. The network life time as a function of an increasing number of sensor nodes, with and without the adopted energy balancing technique, is shown in Figure 4(d). The performance of CBDC with the proposed energy balancing scheme is substantially improved at different numbers of sensor nodes. This is because, with this scheme, the energy consumption is appropriately distributed among the gateway nodes, rather than relying solely on the Dijkstra algorithm for multi-hop data routing.

In this section, the performance of the CBDC algorithm is compared with that of the LEACH-C algorithm [9]. In this experiment, each sensor node is required to send a total of one MByte of data to the sink node. The maximum message size is set to 100 Bytes, and one message is transmitted every 6 seconds. The communication energy parameters, as given in [9], are: $E_{elec}$ = 50 nJ/bit, $\epsilon_{fs}$ = 10 pJ/bit/m$^2$, $\epsilon_{mp}$ = 0.0013 pJ/bit/m$^4$, and the energy for data aggregation is set as $E_{DA}$ = 5 nJ/bit/signal. For a consistent comparison, the simulation parameters listed above were used for both algorithms. Thereafter, 200 independent simulation runs were executed for both algorithms. At each simulation run, the network density is kept constant at 0.25 node/m$^2$ and the sensor nodes are randomly and uniformly distributed in the sensor field. The network life time, shown in Figure 5(a), emphasizes the efficiency of the CBDC algorithm at small numbers of sensor nodes, while at high numbers of sensor nodes the two algorithms obtain almost the same performance. On the other hand, the total number of dead nodes after $10^4$ seconds of simulation time is shown in Figure 5(b). Each sensor node in this experiment is required to send one MByte of data to the sink node. It is apparent from this figure that a larger number of sensor nodes died with the LEACH-C algorithm in order to send this amount of data. This number of dead nodes is approximately linear with respect to the number of nodes. In contrast, CBDC completed its task with a lower number of dead nodes. Furthermore, the number of dead nodes increases only slightly with the number of sensor nodes. Next, we studied the performance of both algorithms in terms of the total energy dissipation in the system; the result is shown in Figure 5(c). A significant saving of the total energy is achieved with the CBDC algorithm in comparison with the LEACH-C algorithm. Moreover, with the CBDC algorithm, networks with a larger number of sensor nodes achieve a further saving of total system energy. For example, 50% of the total energy is consumed at 100 sensor nodes, while only 35% is consumed at 500 sensor nodes. On the other hand, LEACH-C consumed about 80% and 70% of its total energy for 100 and 500 sensor nodes, respectively. The high energy consumption of the LEACH-C algorithm is due to the dynamic transmission scheme used with this algorithm, since its transmission energy grows with the transmitter-receiver separation distance [9]. With the CBDC algorithm, however, a constant transmission power is used regardless of the transmitter-receiver separation distance. If a receiver node is not located within the direct communication range of the transmitter node, then multi-hop communication is used with the energy balancing scheme discussed in this paper.
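The distance dependence mentioned above can be made concrete with a short sketch of the first-order radio model commonly used with LEACH-type protocols. It is assumed here that [9] uses this model, with a free-space term proportional to $d^2$ below a crossover distance and a multipath term proportional to $d^4$ above it, the multipath coefficient being taken in pJ/bit/m$^4$ as that model defines it. The function names and the 100-byte message are illustrative.

```python
import math

E_ELEC = 50e-9       # J/bit, electronics energy
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier coefficient
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier coefficient
D_CROSS = math.sqrt(EPS_FS / EPS_MP)  # crossover distance, about 87.7 m


def tx_energy(bits, distance_m):
    """Transmit energy: d^2 loss below the crossover distance, d^4 above."""
    if distance_m < D_CROSS:
        return bits * (E_ELEC + EPS_FS * distance_m ** 2)
    return bits * (E_ELEC + EPS_MP * distance_m ** 4)


def rx_energy(bits):
    """Receive energy depends only on the number of bits."""
    return bits * E_ELEC


MESSAGE_BITS = 100 * 8
for d in (10, 50, 100, 200):
    e = tx_energy(MESSAGE_BITS, d)
    print(f"d = {d:3d} m: {e * 1e6:8.2f} microjoules per message")
```

Under this model the cost of a long direct transmission rises steeply with distance, which is consistent with the higher total energy reported for LEACH-C compared with the constant-power, multi-hop transmissions of CBDC.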

Conclusions and Future Work
In this paper, an energy-efficient data gathering technique, referred to as the connectivity-based data collection (CBDC) algorithm, is presented. The CBDC algorithm focuses on reducing the number of multi-hop communications without violating the path constraints of mobile sinks. In addition, an energy balancing scheme is suggested for gateway nodes in order to distribute the energy consumption over these nodes. The simulation results presented in this paper show that the proposed algorithm provides a desirable data gathering solution for networks with varying numbers of sensor nodes. Moreover, CBDC performs well for mobile sinks with limited path constraints and energy resources. In comparison with the LEACH-C algorithm, CBDC conserved about 35% of the total system energy and hence extended the network life time substantially.
In future work, it would be worthwhile to use multiple mobile elements rather than using only a single mobile sink, in order to address the problems of buffer overflow, data latency and improve the data delivery ratio.Meanwhile, it is also necessary in this case to explore whether the potential benefit obtained would justify its additional requirements.

Figure 1. Examples of (a) sensor node distribution within a sensor field including a sink node; (b) sensor node clustering showing single-hop nodes, multi-hop nodes and collection points (CP); (c) a path of the mobile sink passing through all collection points.

Figure 2. Flow chart of the connectivity-based algorithm discussed in this paper.

Figure 3. The impact of an increasing number of sensor nodes on the performance of the CBDC algorithm for different path constraints, in terms of (a) number of collection points; (b) percentage of gateway nodes out of the total number of sensor nodes; (c) network life time in minutes.

Figure 4. The performance of the CBDC algorithm as a function of network connectivity for different path constraints, in terms of (a) number of collection points; (b) percentage of gateway nodes; (c) network life time in minutes; (d) the network life time of the CBDC algorithm as a function of an increasing number of sensor nodes, with and without the load balancing scheme proposed in this paper.

Figure 5. Performance comparison between the LEACH-C and CBDC algorithms as a function of an increasing number of sensor nodes at a constant density of 0.25 node/m$^2$. The performance is evaluated in terms of (a) network life time; (b) number of dead sensor nodes; and (c) total energy dissipation in the system.

Table 1. Typical RSS sensitivity values $RSS_{th}$, the corresponding communication range $d_{th}$ and the connectivity percentage used in the simulation experiments. RSS sensitivity is a threshold value specified by the transceiver circuit of the sensor node.