A New Clustering Protocol for Wireless Sensor Networks Using Genetic Algorithm Approach

This paper examines the optimization of the lifetime and energy consumption of Wireless Sensor Networks (WSNs). These two competing objectives strongly influence the quality of service of the network, and according to recent studies, cluster formation is an appropriate way to achieve them. To transmit aggregated data to the Base Station (BS), logical nodes called Cluster Heads (CHs) are required to relay data from the fixed-range sensing nodes located on the ground to high-altitude aircraft. This study investigates the Genetic Algorithm (GA) as a dynamic technique for finding optimum states. The proposed framework is simple and includes a mathematical formula in which gains in coverage are benchmarked against lifetime. Finally, the implementation of the proposed algorithm shows better efficiency than other simulated works.


Introduction
Currently, sensor networks are employed in several areas, including military, medical, environmental and household applications. In all of these fields, however, energy is the determining factor for the performance of wireless sensor networks [1]. Because sensor nodes run on battery power and the energy available to them is limited, the methods used to route and transfer data to the base station are very important. A routing method with optimum energy consumption and shortest-path selection for data transfer in wireless sensor networks is therefore desired [2]. The main applications are habitat monitoring, target tracking, surveillance and security [3,4]. A WSN consists of a number of small sensor nodes used to cover an entire environment; hence, the sensor nodes should be low-cost and low-power, with limited energy use. These nodes can communicate with each other over short distances. WSNs may be deployed either randomly or deterministically, depending on the application [5]. Deployment in a non-hazardous area is generally deterministic, whereas random placement is preferred in hazardous or battlefield environments. In general, random deployment requires more sensor nodes than deterministic deployment [6].
Generally, cluster-based approaches are appropriate for monitoring applications that require a continuous stream of sensor data [3]; routing protocols are therefore applied to lower the cost of delivering data packets on time. For instance, Heinzelman et al. [7] describe the LEACH protocol, a hierarchical, self-organized, cluster-based approach. The monitored area is randomly divided into several clusters, in which CHs collect data from their member nodes using Time Division Multiple Access (TDMA) scheduling. Redundant data is then removed, and the result is transmitted to the Base Station (sink) as a data packet. After a predetermined period of time, CHs are selected through a BS message.
Figure 1 shows a sample WSN. Each red circle represents a sensor node, and the green circle around it is the node's sensing range. There are several clusters, which transmit aggregated data to the BS only through their CHs; the CHs are marked by gray circles. In this paper, we optimize network lifetime and energy consumption in WSNs and propose a new clustering protocol based on a genetic algorithm.
The remainder of this paper is organized as follows. Section 2 reviews previous work on clustering approaches. Section 3 briefly describes the GA methodology, concentrating on a WSN-based fitness function. Our proposed GA-based clustering technique is presented in Section 4. Section 5 details the simulation and implementation and discusses the results, and Section 6 presents our conclusions and directions for future work.

Literature
Many studies present algorithms that reduce costs, including those of receiving and transmitting between CHs and the BS. Ghiasi et al. [8] presented theoretical work on the clustering problem in WSNs that optimizes energy consumption through optimal clustering of sensor nodes. Their algorithm creates clusters of uniform size so that the distance between member nodes and CHs is minimized; this minimization helps reduce transmission energy [1,9].
Heinzelman et al. present a model of energy consumption, given below. In this model, a node needs $E_T(i,j)$ energy to transmit $l$ bits of data over the distance $d$ from node $i$ to node $j$:
$$E_T(i,j) = \begin{cases} l E_e + l\,\varepsilon_s d^2, & d < d_{co} \\ l E_e + l\,\varepsilon_l d^4, & d \ge d_{co} \end{cases}$$
$E_e$ represents the energy needed to activate the electronic circuits for receiving and transmitting. $d_{co}$ is the threshold distance: $d^4$ is used for long-distance transmission and $d^2$ for short-range transmission, such as within a cluster [12]. Moreover, $\varepsilon_s = 10$ pJ/bit/m$^2$ and $\varepsilon_l = 0.0013$ pJ/bit/m$^4$ represent the energy consumed by the amplifier for short- and long-distance transmission, respectively. The energy required by the receiver to receive $l$ data bits is assumed to be $E_r = l E_e$.
In 2002, Lindsey et al. [10] proposed the PEGASIS protocol, an extension of the LEACH algorithm. The advantage of PEGASIS over LEACH is its robustness to node failure, while Pan et al. [11] presented a two-tiered structure in which hierarchical clusters at certain locations provide greater energy efficiency. Kalpakis et al. [12] proposed MLDA (Maximum Lifetime Data gathering Algorithm), which runs a linear program to find edge capacities that allow the maximum number of transmissions. This algorithm maximizes network lifetime when the locations of each node and the BS are known. Dasgupta et al. [13] extended MLDA with a cluster-based heuristic algorithm called CMLDA, in which nodes are grouped into several clusters of predefined size. The energy of a cluster is the sum of the energies of its member nodes, and the distance between two clusters is the maximum distance between any pair of nodes in those clusters. After cluster formation, MLDA is applied. Bandyopadhyay et al. [14] proposed a multi-level hierarchical clustering algorithm that uses stochastic geometry to minimize energy consumption. Cerpa et al. [15] described the Adaptive Self-Configuring sensor Networks Topology (ASCENT), in which sensor nodes manage their own connectivity, deciding whether to be active and participate in multi-hop networking or to remain passive until they receive a request from active nodes. Because ASCENT operates between the link layer and the routing layer, it can be used with any routing protocol to handle node redundancy. In 2010, Jabari Lotf et al. [16] proposed an efficient cluster-based algorithm named MLCH to maximize lifetime. MLCH improves the LEACH protocol by producing evenly distributed clusters and by reducing the unequal cluster topology, with clusters formed according to radio range. It adjusts the connection distance between head nodes and cluster heads using a hierarchical tree. An early example of a GA-based approach is the work of Turgut et al. [17], which applied the GA concept to improve clustering in mobile ad hoc networks. Like most GA-based protocols, it defines a fitness parameter that determines the fate of an individual. Jin et al. [18] used a GA to reduce energy consumption. Their algorithm starts with a predefined number of independently clustered chromosomes and then biases them toward an optimal solution with minimum communication distance. Their simulations showed that the number of CHs is reduced to approximately 10% of the total number of nodes, and that the cluster-based method reduces the communication distance by about 80% compared with direct transmission. In 2005, Ferentinos et al. [19] improved on the work of Jin et al. by using a fitness function that involves the status of the sensor nodes and network clustering with suitable cluster heads, as well as a choice between two signal ranges for normal sensor nodes. Ye et al. [20] studied S-MAC, a contention-based medium access protocol in which a virtual cluster agent reduces energy consumption. The researchers also applied common sleep schedules within clusters and in-channel signaling to avoid collisions.
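The first-order radio model of Heinzelman et al. described earlier in this section can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the electronics energy `E_ELEC = 50 nJ/bit` and the threshold `d_co = sqrt(eps_s / eps_l)` (about 87.7 m, the distance at which both amplifier terms are equal) are values commonly used with this model but are not given in the text, and the function names are hypothetical.

```python
import math

# Assumed value (not stated in the text): electronics energy E_e per bit.
E_ELEC = 50e-9          # J/bit
EPS_S = 10e-12          # J/bit/m^2, short-range amplifier (10 pJ/bit/m^2)
EPS_L = 0.0013e-12      # J/bit/m^4, long-range amplifier (0.0013 pJ/bit/m^4)
# Assumed threshold d_co: the distance where EPS_S*d^2 == EPS_L*d^4.
D_CO = math.sqrt(EPS_S / EPS_L)


def tx_energy(l_bits: int, d: float) -> float:
    """E_T: energy (J) to transmit l_bits over a distance d in metres."""
    if d < D_CO:
        return l_bits * E_ELEC + l_bits * EPS_S * d ** 2   # short range: d^2
    return l_bits * E_ELEC + l_bits * EPS_L * d ** 4       # long range: d^4


def rx_energy(l_bits: int) -> float:
    """E_r = l * E_e: energy (J) to receive l_bits."""
    return l_bits * E_ELEC


if __name__ == "__main__":
    # Energy to send a 4000-bit packet over short and long distances.
    for dist in (20.0, 87.0, 200.0):
        print(f"{dist:6.1f} m -> {tx_energy(4000, dist):.3e} J")
```

Choosing `d_co` as the crossover point keeps the energy curve continuous, so a node's cost does not jump when a link crosses the threshold.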
In another direction, Hussain et al. [21,22] improved the hierarchical cluster-based routing (HCR) protocol, in which nodes self-organize into clusters that are each managed by a head set. Among the head-set associates, one node at a time is selected, using a round-robin technique, to act as cluster head and transmit the monitored data. Later, Hussain et al. [23] extended their work using a genetic algorithm to obtain the optimum number of clusters, cluster heads and cluster members, as well as the transmission schedule. The proposed fitness function is based on parameters such as energy consumption, number of clusters, cluster size, direct distance to the sink and cluster distance. In [3], they also presented an improvement to HCR (HCR-1) called HCR-2 and concluded that once more than 25% of the nodes have died, protocols including LEACH and HCR-1 tend to become disconnected quite rapidly, whereas HCR-2 survives because it requires fewer elections. Because the GA uses cross-layer optimization, the energy consumed during reconfiguration is minimal.
In 2011, Norouzi et al. [24] proposed a new protocol called Fair Efficient Location-based Gossiping to address the problems of Gossiping. We showed how that approach increases the network energy and, as a result, maximizes the network lifetime using a GA.

Genetic Algorithm
A genetic algorithm is a global search heuristic in which an optimal solution is estimated by generating different individuals [24,25]. The algorithm comprises procedures such as a problem-focused fitness function, selection and crossover. The fundamental parts of a genetic algorithm are explained below.

Initialization
The genetic algorithm begins with an initial population of random chromosomes, each consisting of genes that form a sequence of 0s and 1s. In the next step, the algorithm biases individuals toward the optimum solution through repeated operations such as selection and crossover. A new population can be produced by two methods [26]: steady-state GA and generational GA. In the first, only one or two members of the population are replaced, while the generational GA replaces all of the individuals at each generation. This paper adopts the second method, in which the GA keeps specified qualified individuals from the current generation and copies them into the new generation as part of the solution. The other individuals of the new population are obtained by crossover and mutation.
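The initialization and generational replacement described above can be sketched in Python. This is a minimal sketch, not the authors' implementation: the encoding (gene k equal to 1 means node k is a CH) is an assumption, bit-flip mutation is a standard choice swapped in because the text does not specify the mutation operator, and `make_child` is a hypothetical callback standing in for selection plus crossover.

```python
import random
from typing import Callable, List

Chromosome = List[int]


def init_population(pop_size: int, n_genes: int, rng: random.Random) -> List[Chromosome]:
    """Random 0/1 chromosomes. Assumed encoding: gene k == 1 means node k is a CH."""
    return [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]


def bit_flip_mutation(chrom: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chrom]


def generational_step(
    population: List[Chromosome],
    fitness: Callable[[Chromosome], float],
    n_elite: int,
    make_child: Callable[[List[Chromosome], random.Random], Chromosome],
    rng: random.Random,
) -> List[Chromosome]:
    """Generational GA with elitism: copy the n_elite fittest chromosomes
    unchanged, then fill the rest of the new population with children."""
    ranked = sorted(population, key=fitness, reverse=True)
    new_pop = [list(c) for c in ranked[:n_elite]]
    while len(new_pop) < len(population):
        new_pop.append(make_child(population, rng))
    return new_pop


if __name__ == "__main__":
    rng = random.Random(42)
    pop = init_population(pop_size=10, n_genes=20, rng=rng)

    def toy_child(parents: List[Chromosome], r: random.Random) -> Chromosome:
        # Toy stand-in for selection + crossover: mutate a random parent.
        return bit_flip_mutation(r.choice(parents), 0.05, r)

    # Toy fitness: the number of CHs in the chromosome.
    nxt = generational_step(pop, fitness=sum, n_elite=2, make_child=toy_child, rng=rng)
    print(len(nxt), sum(nxt[0]))
```

Copying the elite chromosomes unchanged guarantees that the best fitness never decreases from one generation to the next.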

Fitness
In the genetic algorithm, the fitness function assigns a score to each chromosome according to its quality. This value determines the chromosome's chances of survival and further reproduction [26]. The fitness function is highly problem-dependent, so for some problems it is hard or even impossible to define. As in nature, individuals pass on to the new generation according to their fitness value, which determines their fate.

Selection
During each successive generation, a new population is generated by selecting members of the current generation to mate based on their fitness. Fitter individuals are more likely to be selected, so the best solutions are preferentially chosen. Most selection functions include a stochastic element so that a small number of less fit individuals are also chosen, which maintains the diversity of the population [24]. Of the several selection methods, Roulette-Wheel selection is chosen; it selects individual $i$ with the probability
$$P_i = \frac{F_i}{\sum_{j=1}^{n} F_j}$$
where $F_i$ is the fitness of chromosome $i$ and $n$ is the size of the population. Under Roulette-Wheel selection, each individual is thus assigned a value between 0 and 1.
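Roulette-Wheel selection, in which each individual is chosen with probability proportional to its fitness $F_i$, can be sketched as follows. This is a minimal Python sketch assuming non-negative fitness values with a positive total; the function names are hypothetical. The wheel is built from running totals of the fitness values, and one spin is a uniform draw located by binary search.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence, TypeVar

Individual = TypeVar("Individual")


def selection_probabilities(fitnesses: Sequence[float]) -> List[float]:
    """P_i = F_i / sum_j F_j (fitness values must be non-negative, total > 0)."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


def roulette_select(population: Sequence[Individual],
                    fitnesses: Sequence[float],
                    rng: random.Random) -> Individual:
    """Spin the wheel once: individual i owns a slice of width F_i."""
    cumulative = list(accumulate(fitnesses))   # F_1, F_1+F_2, ..., total
    spin = rng.random() * cumulative[-1]       # uniform point in [0, total)
    idx = bisect_right(cumulative, spin)       # first slice ending after the spin
    return population[min(idx, len(population) - 1)]  # guard against rounding


if __name__ == "__main__":
    rng = random.Random(1)
    print(selection_probabilities([1.0, 3.0]))
    print(roulette_select(["a", "b"], [1.0, 3.0], rng))
```

Because `bisect_right` skips past equal running totals, an individual with zero fitness owns an empty slice and is never chosen. In practice, the standard library call `random.choices(population, weights=fitnesses)` performs the same weighted draw.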

Crossover
The main step in producing a new generation is the crossover, or reproduction, process. It simulates sexual reproduction, in which inherited characteristics are passed on to the new population. To generate new children, the crossover process takes a pair of individuals as parents from the pool chosen by the selection process.

 
This process continues until the desired size of the new population is reached. In general, various crossover operators have been developed for different aims. The simplest is single-point crossover, in which a random point is chosen to divide the contributions of the two parents. Figure 2 shows an example of single-point crossover between two chromosomes.
Figure 2 shows the two children produced from a single pair of parents. Each child duplicates one parent's bit sequence up to the crossover point; after that point, the bit sequence of the other parent is copied as the second part of the child.
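Single-point crossover as described above can be sketched in Python. This is a minimal sketch with hypothetical function names; the random cut is drawn strictly inside the chromosome so that each child receives genes from both parents.

```python
import random
from typing import List, Tuple

Chromosome = List[int]


def single_point_crossover(p1: Chromosome, p2: Chromosome,
                           point: int) -> Tuple[Chromosome, Chromosome]:
    """Each child copies one parent up to `point`, then the other parent after it."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def random_single_point_crossover(p1: Chromosome, p2: Chromosome,
                                  rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Choose the cut uniformly in 1..len-1, so both parents contribute genes."""
    point = rng.randint(1, len(p1) - 1)
    return single_point_crossover(p1, p2, point)


if __name__ == "__main__":
    # A cut at position 6, as in the caption of Figure 2.
    print(single_point_crossover([0] * 8, [1] * 8, 6))
```

With all-zero and all-one parents and a cut at 6, the children are `[0, 0, 0, 0, 0, 0, 1, 1]` and `[1, 1, 1, 1, 1, 1, 0, 0]`, which makes the swap easy to see.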

Fitness Parameters
The fitness of a chromosome represents its quality on the basis of energy consumption minimization and coverage maximization. Some important fitness parameters are described below:
1) Direct Distance to Base Station (DDBS): the total direct distance between all sensor nodes and the BS, where $d_i$ is the distance from node $i$ to the BS, is calculated as
$$DDBS = \sum_{i=1}^{m} d_i$$
where $m$ is the number of nodes. As the formula shows, this total grows with the number of nodes, so the energy consumption of direct transmission becomes extreme for a large WSN. On the other hand, DDBS is acceptable for smaller networks with a few closely located nodes.
2) Cluster-based Distance (CD): This parameter is the sum of the distances between CHs and the BS, added to the sum of the distances between the member nodes and their cluster heads:
$$CD = \sum_{i=1}^{n}\left(\sum_{j=1}^{m} d_{ij} + D_{is}\right)$$
where $n$ and $m$ are the number of clusters and the number of members in the corresponding cluster, respectively, $d_{ij}$ is the distance between a node and its CH, and $D_{is}$ is the distance between the CH and the BS. This approach is appropriate for networks with a large number of widely spaced nodes. However, when member nodes are widely spaced, the cluster distance is higher, which results in higher energy consumption; to minimize energy consumption, CD should not be too large [3]. This measure therefore controls the density of the clusters, where density is the number of nodes per cluster.
3) Cluster-based Distance Standard Deviation (CDSD): The standard deviation measures the variation of the cluster distances, rather than a single average cluster distance. The acceptable CDSD depends on whether the sensor nodes are placed randomly or deterministically. With random placement, clusters have different sizes, so a standard deviation within a specified range of cluster-distance variation is acceptable. In this case, the differences in cluster distance can be non-zero, but the allowed variation should be adapted to the deployment information [6]. In deterministic placement, however, where node positions are uniformly distributed, the variation in cluster distances should be small. In general, a large variation in cluster distances indicates a poor network when nodes are uniformly placed, whereas a similar result may be acceptable when the nodes are randomly placed.
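The variation of the cluster distances is computed with the standard SD formula, where µ is the average of the cluster distances:
$$CDSD = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(d_{c_i} - \mu\right)^2}, \qquad d_{c_i} = \sum_{j=1}^{m} d_{ij} + D_{is}, \qquad \mu = \frac{1}{n}\sum_{i=1}^{n} d_{c_i}$$
where $d_{c_i}$ is the distance of cluster $i$, i.e., its term in the CD sum.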
4) Transfer Energy (E): This metric indicates the amount of energy consumed to transfer all the collected data to the BS. Considering $m$ member nodes in a cluster, E is computed as
$$E = \sum_{i=1}^{n}\left(\sum_{j=1}^{m} e_{jm} + m\,E_r + e_i\right)$$
where $e_{jm}$ is the energy needed to transmit data from a member node to its CH and $E_r = l E_e$ is the receive energy defined in Section 2. Thus the first term inside the $i$-summation is the total energy consumed in transferring data from the members to the CH, the second term is the energy the CH needs to receive data from its members, and finally $e_i$ is the energy needed to transmit from the cluster head to the BS.
5) Number of Transmissions (T): Generally, the BS determines the number of transmissions for each monitoring period. This measure is computed according to the conditions and the energy level of the network; consequently, a large T represents a long monitoring stage, for which only a superior solution can be accepted: a high value for objectives being maximized and a low value for objectives being minimized. The performance of previous GA-based solutions determines the quality of the best solution, or chromosome.
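The fitness parameters DDBS, CD, CDSD and E can be sketched in Python as follows. This is a minimal illustration under assumptions not stated in the text: nodes are (x, y) points, a clustering is a hypothetical dict mapping each CH index to the indices of its member nodes, and the per-link transmit-energy function and per-packet receive energy are supplied by the caller (for example, from the first-order radio model of Heinzelman et al.). CDSD uses the population standard deviation from the Python standard library.

```python
import math
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Clusters = Dict[int, List[int]]   # CH index -> indices of its member nodes (CH excluded)


def ddbs(nodes: Sequence[Point], bs: Point) -> float:
    """Direct Distance to Base Station: sum of d_i over all m nodes."""
    return sum(math.dist(p, bs) for p in nodes)


def cluster_distances(nodes: Sequence[Point], clusters: Clusters, bs: Point) -> List[float]:
    """Per-cluster distance: sum of member-to-CH distances d_ij plus CH-to-BS D_is."""
    return [
        sum(math.dist(nodes[j], nodes[ch]) for j in members) + math.dist(nodes[ch], bs)
        for ch, members in clusters.items()
    ]


def cd(nodes: Sequence[Point], clusters: Clusters, bs: Point) -> float:
    """Cluster-based Distance: sum of all cluster distances."""
    return sum(cluster_distances(nodes, clusters, bs))


def cdsd(nodes: Sequence[Point], clusters: Clusters, bs: Point) -> float:
    """Population standard deviation of the cluster distances around their mean."""
    return statistics.pstdev(cluster_distances(nodes, clusters, bs))


def transfer_energy(nodes: Sequence[Point], clusters: Clusters, bs: Point,
                    tx: Callable[[float], float], e_r: float) -> float:
    """E: member->CH transmissions, m receptions at each CH, and one CH->BS transmission."""
    total = 0.0
    for ch, members in clusters.items():
        total += sum(tx(math.dist(nodes[j], nodes[ch])) for j in members)
        total += len(members) * e_r
        total += tx(math.dist(nodes[ch], bs))
    return total


if __name__ == "__main__":
    nodes = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    bs = (6.0, 0.0)
    clusters = {0: [1], 2: []}   # node 0 heads node 1; node 2 is its own CH
    print(ddbs(nodes, bs), cd(nodes, clusters, bs), cdsd(nodes, clusters, bs))
```

Passing `tx=lambda d: d` and `e_r=1.0` turns E into a pure distance count, which is handy for checking the bookkeeping by hand.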

New Algorithm
Learning algorithms, including the genetic algorithm, are used by many researchers to study network attributes such as clustering [17], energy consumption [18,24], sensor node status and clustering with appropriate cluster heads [18], and hierarchical cluster-based routing [6,21,27]. We adapt the genetic algorithm parameters, based on software services, to determine energy consumption and thereby extend the lifetime of the network. There is a trade-off between energy consumption and the distance parameters: creating a large number of clusters shortens the distance between member nodes and their CHs, but every cluster has at least one CH, so many clusters mean many CHs, which consume a great deal of energy. In other words, creating many clusters decreases distance but increases energy consumption. For this reason, we use the ratio of total energy consumption to the total distance of the nodes to obtain the average amount of energy used per node, and we propose a fitness formula F(i) for optimal WSN energy consumption and coverage. In F(i), $(e_i T)(e_j T)$ is the total energy used and $(D_a \cdot \text{nodes})(D_b \cdot \text{CHs})$ is the total distance between the nodes of every cluster multiplied by the total distance between the cluster heads; F(i) seeks the maximum possible value of this ratio. Creating more (fewer) clusters increases (decreases) T, the number of nodes and CHs, and $e_i$ and $e_j$, while decreasing (increasing) $D_a$ and $D_b$; hence maximizing the ratio yields a trade-off between energy consumption and the number of clusters. Moreover, $D_a$ takes the width of the area into account because of the coverage problem. The best value of F(i) obtained by the GA is therefore benchmarked against both width (coverage) and $e_i$ and $e_j$ (energy).
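The energy-to-distance ratio described above can be sketched in Python. This is a hypothetical helper, not the authors' exact F(i): it implements only the ratio $(e_i T)(e_j T) / ((D_a \cdot \text{nodes})(D_b \cdot \text{CHs}))$ as stated in the text, and it does not reproduce the paper's Formula 8 and Formula 9 or the way the area width enters $D_a$. The direct-transmission variant applies the stated substitution $D_a = D_b = \#\text{CHs} = 1$.

```python
def energy_distance_ratio(e_i: float, e_j: float, T: float,
                          D_a: float, n_nodes: int,
                          D_b: float, n_chs: int) -> float:
    """Hypothetical F(i)-style score: total energy product / total distance product.

    e_i, e_j : energy for member->CH and CH->BS transmission
    T        : number of transmissions
    D_a, D_b : member->CH and CH->BS distances (width handling not modelled)
    """
    total_energy = (e_i * T) * (e_j * T)             # (e_i*T)*(e_j*T)
    total_distance = (D_a * n_nodes) * (D_b * n_chs)  # (D_a*nodes)*(D_b*CHs)
    if total_distance <= 0:
        raise ValueError("distance product must be positive")
    return total_energy / total_distance


def direct_transmission_ratio(e_i: float, e_j: float, T: float, n_nodes: int) -> float:
    """Direct method: D_a = D_b = #CHs = 1, as stated in the text."""
    return energy_distance_ratio(e_i, e_j, T, 1.0, n_nodes, 1.0, 1)


if __name__ == "__main__":
    # Score two hypothetical clusterings; the GA would keep the larger value.
    print(energy_distance_ratio(2.0, 3.0, 10, 5.0, 200, 40.0, 10))
    print(energy_distance_ratio(2.0, 3.0, 10, 8.0, 200, 30.0, 5))
```

Because the GA maximizes this score, any real implementation has to fix how `width` and the energy constants are normalised; otherwise the score is dominated by whichever term has the largest units.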
In F(i), "width" is the length of the target environment, and $D_a$ and $D_b$ denote the distance between the sensor member nodes and their CH and between the CHs and the BS, respectively. The constants $e_i$ and $e_j$ represent the energy needed to transmit data between the member nodes and the CH and from the CH to the BS. F(i) assigns a weight to every chromosome, in both a cluster-based method and a direct transmission method. Presuming $D_a = D_b = \#\text{CHs} = 1$, we offset the effects of these variables in F(i) by applying Formula 9. Formula 8, on the other hand, is the product of two terms in which the energies $e_i$ and $e_j$ are multiplied by the number of transmissions of the member nodes and the CHs, respectively. In general, F(i) is our intelligent fitness function, able to score any chromosome, whether it represents a cluster-based or a direct method. The best chromosomes are evaluated by a selection process to obtain the optimum solution over successive generations.
Figure 3 shows a flowchart of the phases and execution of the simulated protocols. The simulation starts with the network setup phase, which sets initial values for a network with a predefined number of nodes and the other constants considered in Formula 8. Each node is assigned an (x, y) location and initially has 2 J of energy. The decision step compares the attributes of the surviving nodes with the minimum-nodes condition. A living node must meet certain minimal conditions, such as having enough energy for T transmissions. Since a node that meets higher requirements would remain valid for another, longer round of monitoring, the algorithm mostly selects the lower-scored nodes. The minimum node value also acts as a kind of network administrator setting: the administrator, together with how hazardous or benign the environment is, determines the minimum node value. The next step is cluster formation, in which every cluster is managed by a CH. Our GA-based algorithm is used to create the clusters at the BS. During this step, each cluster operates on a TDMA schedule so that sensors turn on their radios only when they need to transmit a data packet and keep them off otherwise.
The next step of the election phase handles the reception and transfer of data by the sensor nodes. As mentioned earlier, sensor nodes transmit data packets of information gathered from the environment to their cluster head. The CHs process the received data and retransmit it to the BS. In the next phase, all intermediate activities are logged. This log contains the energy level of the nodes, the total number of transmissions and the number of nodes that are alive. This cycle continues until the number of live nodes is insufficient to transmit data.
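The round-based cycle above (setup, decision, cluster formation, data transfer, logging) can be sketched in Python. This is a minimal sketch under several assumptions: the radio constants `E_ELEC` and `PACKET_BITS` are not given in the text, a node counts as alive while its energy is positive, members send to the nearest CH, TDMA timing is not modelled, and `top_energy_chs` is a hypothetical stand-in for the GA that simply picks the highest-energy live nodes as CHs.

```python
import math
import random
from typing import Callable, List, Sequence, Set, Tuple

# Assumed first-order radio model constants (E_ELEC and PACKET_BITS are not in the text).
E_ELEC, EPS_S, EPS_L = 50e-9, 10e-12, 0.0013e-12
D_CO = math.sqrt(EPS_S / EPS_L)
PACKET_BITS = 4000

Point = Tuple[float, float]


def tx(d: float) -> float:
    """Energy (J) to send one packet over distance d."""
    amp = EPS_S * d ** 2 if d < D_CO else EPS_L * d ** 4
    return PACKET_BITS * (E_ELEC + amp)


def top_energy_chs(alive: List[int], energy: List[float], rng: random.Random) -> Set[int]:
    """Hypothetical stand-in for the GA: the ~5% highest-energy live nodes become CHs."""
    k = max(1, len(alive) // 20)
    return set(sorted(alive, key=lambda i: energy[i], reverse=True)[:k])


def simulate(nodes: Sequence[Point], bs: Point, min_alive: int,
             choose_chs: Callable[[List[int], List[float], random.Random], Set[int]],
             rng: random.Random, init_energy: float = 2.0,
             max_rounds: int = 100_000) -> List[Tuple[int, int, float]]:
    energy = [init_energy] * len(nodes)            # network setup: 2 J per node
    log = []                                       # (round, live nodes, residual energy)
    for rnd in range(max_rounds):
        alive = [i for i, e in enumerate(energy) if e > 0]
        if len(alive) < min_alive:                 # decision step
            break
        chs = choose_chs(alive, energy, rng)       # cluster formation at the BS
        for i in alive:                            # members -> nearest CH
            if i in chs:
                continue
            ch = min(chs, key=lambda c: math.dist(nodes[i], nodes[c]))
            energy[i] -= tx(math.dist(nodes[i], nodes[ch]))
            energy[ch] -= PACKET_BITS * E_ELEC     # CH receives one packet
        for c in chs:                              # aggregated packet -> BS
            energy[c] -= tx(math.dist(nodes[c], bs))
        log.append((rnd, len(alive), sum(max(e, 0.0) for e in energy)))
    return log


if __name__ == "__main__":
    rng = random.Random(1)
    nodes = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
    # Assumed BS position: 200 m beyond the edge of the 100 x 100 m field.
    log = simulate(nodes, bs=(50.0, 300.0), min_alive=20, choose_chs=top_energy_chs, rng=rng)
    print("rounds:", len(log), "last entry:", log[-1])
```

Keeping the CH choice behind the `choose_chs` callback lets a GA-based selector and a LEACH-style selector run on the same energy bookkeeping, which is the kind of side-by-side comparison the evaluation relies on.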

Simulation and Evaluation
The GA-based approach presented herein is compared with other cluster-based protocols such as LEACH [7]. The experiments use 200 nodes (N) in a 100 × 100 m² network area, denoted M, with the BS located 200 m away from the network. The length of the chromosome represents the number of clusters. For DDBS, the second term is set to 1. Table 1 shows the simulation parameters. The LEACH routing process() implements the LEACH protocol, including the election process(). In this comparison, every cluster has only one CH, and the number of CHs is obtained from the genetic algorithm process described above. The more generations are run, the better the solution. The BS controls cluster formation according to the GA-based algorithm. In the next period, the fitness function identifies qualified individuals on the basis of their currently reported energy levels. On an absolute scale, the results therefore differ from period to period, because the energy consumed in one period changes the energy levels reported in the next.
Table 2 shows the GA parameters used to simulate the environment.The candidate chromosomes can be chosen randomly because this selection does not affect the final results, i.e., any candidate individuals will tend toward the optimum solution.The number of iterations is constant at 100.
Figure 4 shows a sample result of our algorithm. Because the CHs are distributed more uniformly, a more balanced energy consumption is highly likely. Figures 5 and 6 compare the proposed algorithm with the LEACH protocol in terms of network energy and network lifetime over 200 periods of time (years). In Figure 5, the uneven energy consumption of the CHs in the LEACH protocol results in a short node lifetime. Figure 5 also reflects the removal of the first node because of its energy status; with the proposed algorithm, the death of the first node is postponed compared with the LEACH protocol. Moreover, the network can keep functioning as long as the minimum number of nodes is alive. Overall, because the fitness function considers the energy status of the nodes and the distance between the CHs and the BS, the final individuals provide a cluster formation that consumes energy uniformly, which significantly extends the lifetime of the network.
In 2010, Jabari Lotf et al. proposed MLCH, which improves considerably on the LEACH algorithm. They used different numbers of cluster members with 100 and 200 alive nodes and a simulation duration of 1000 s. We summarize the best results in Table 3. Overall, our algorithm performs better than MLCH and LEACH in terms of lifetime. The results shown are the average for 50, and the number of members is 15.

Conclusions
Our proposed intelligent, energy-efficient clustering algorithm performs better than some traditional cluster-based protocols. The simulation results indicate that using a GA-based cluster formation algorithm extends the lifetime of the network through equally distributed clustering. The algorithm makes a trade-off between energy consumption and the distance parameters. Sometimes, multiple cluster heads are needed to manage a cluster. Future work might include cross-layer optimization using query and routing strategies [3]. It might also add multiple communication channels between cluster heads to solve the problem of simultaneously sending and receiving data. Deploying thousands of nodes that send data simultaneously is difficult; one possible solution is to use CSMA/CA instead of TDMA [16].

Figure 1. A sample of a cluster-based WSN.

FirstFigure 2 .
Figure 2. Single point method at random point 6.

Figure 3. GA based flowchart of the presented algorithm.

Figure 4. Simulation result in a selected environment.

Figure 5. Energy Consumption rate over the lifetime of a network.

Figure 6. Comparison of live nodes in the two methods.