Modeling of Data Reduction in Wireless Sensor Networks



Introduction
In many applications, it is anticipated that wireless sensor networks (WSNs) will be composed of a large number of stationary sensor nodes randomly deployed on a large terrain to monitor an environmental phenomenon. The sensors are expected to be very basic devices having limited computational power, transmission range, data storage, and energy resources. Thus, it is expected that the nodes will need to work cooperatively in order to deliver the collected data to an information sink where it can be accessed by the end user.
In the literature, there are many examples of algorithms designed to collect the node data in such a way that the energy consumption in the network is reduced. Fundamentally, all of these algorithms seek to find a suitable way to limit the size of the data load (i.e. the total number of nodes which attempt to forward their data to the sink). We now discuss several of these algorithms.
In [1], the authors use the idea of representing a spatial phenomenon with contour lines to reduce the number of nodes which transmit sensor data. They describe a fully distributed method for forming the contour lines where the contour lines are constructed from the node data at the sink. In [2], the Tiny Aggregation (TAG) service is presented. This scheme uses SQL-based aggregation queries to reduce the size of the data set. The aggregation process is performed in-network with aggregates computed on the data as it flows between the sensor nodes towards the sink. Irrelevant data is discarded and relevant data is combined into more compact records whenever possible. In [3], the Clustered Aggregation Technique (CAG) is presented for collecting data in a WSN. This algorithm reduces the size of the data set by using queries which exploit the inherent spatial correlation of the data set. This protocol forms clusters of nodes with values within some threshold of one another and selects a cluster head. In each cluster, only the cluster head reports its value to the sink and therefore the algorithm is lossy. The authors use simulation to show that a modest tradeoff in accuracy of the data using the CAG algorithm provides large energy savings over the TAG algorithm discussed earlier. In [4], the Power-Efficient Gathering in Sensor Information Systems (PEGASIS) data aggregation algorithm is developed. The authors address not only the issue of energy savings, but also that of achieving low latency in data delivery to the sink. One of their goals is to find a convergecast structure which can achieve low energy consumption and low latency. Unlike TAG and CAG, which use a tree structure to collect data from nodes at the sink, PEGASIS uses a chain-based structure. One node in each chain is selected as a leader. Data collection occurs along the chain with the aggregate value returned to the leader node. The leader node then transmits the aggregated data to the sink.
To our knowledge, the existing literature is devoid of research that attempts to determine an analytical model for the size and spatial distribution of the data load in a sensor network. Thus, while the existing literature addresses questions relating to how to reduce the size of the data collected in a WSN, there are no models that estimate the potential size of the reduced data load. The existing literature relies solely on simulation results to estimate this size. Since simulation faces severe complexity problems in large-scale WSNs, there is clearly a need for analytical models which capture the size of the data set. Since some data in a WSN may be more critical than other data, we will be primarily interested in the size of the data set when only the most critical data is transmitted to the sink. In this respect, the present work will complement much of the existing literature devoted to developing various algorithms for data gathering in sensor networks.
In this work, we model the node data in a sensor network as samples taken from an underlying Gaussian random field. This random field is used to describe the space-time behavior of the phenomenon being observed. Random field models are popular in modeling large-scale environmental phenomena that exhibit correlated random variation in time and space [5,6]. We then use the concept of high level excursion sets from random field theory to study the size and spatial distribution of the set of nodes that experience statistically "extreme" data in the network. Statistically extreme data contains information on where the state of the phenomenon is most critical, and in many applications we can imagine that it is this data which is most important to the end user. It is shown that the nominal data load experienced by a sensor network can be significantly reduced if only statistically extreme data values are transmitted to the sink. Then, the notion of a high level excursion set is extended to define contour lines of the random field on the network deployment. It is shown that if we only transmit data from nodes that are close to these contour lines, then the data load in the network can be further reduced.
The remainder of this paper is organized as follows. In Section 2, we introduce random fields and present our model for the node data in a WSN. In Section 3, we present results related to the average data load in a sensor network and show how the theory of high level excursion regions of Gaussian random fields may be used to model the extreme data load in the network. In Section 4, we develop a performance model for a large-scale WSN. In Section 5, we present numerical and simulation results regarding the analysis of the total data load and the network performance. Finally, Section 6 contains the main conclusions of the paper.

Random Fields and the Underlying Phenomenon
In this work, we use random field theory to describe the behavior of the underlying phenomenon being monitored by the network. A random field is simply a multidimensional generalization of a one-dimensional random process. In the physical sciences, random fields have been used to model phenomena in such diverse areas as forestry, geomorphology, geology, turbulence, and seismology [6]. Consider the following phenomena that are amenable to modelling with random fields: depth of snowfall across a surface during a snowstorm; pollutant concentration in a lake; shear stress along a fault line in the earth; height of the ocean surface; amount of recoverable solar energy; areal density of the population of a species; agricultural crop yield; distribution of rainfall on a crop; inflow of water into a reservoir; density, porosity, and permeability of soil; intensity of an earthquake [5]. Many physically occurring phenomena exhibit Gaussian or nearly Gaussian behavior [5,6] and we will therefore assume that the space-time behavior of the underlying phenomenon is described by a Gaussian random field. We assume that the network has been deployed to monitor an environment on a subset of two-dimensional space, $S \subset \mathbb{R}^2$, with area denoted by $|S|$ (more generally, a random field may be defined on $\mathbb{R}^n$, where $n$ is some arbitrary integer). The covariance function between the data at two points $(\mathbf{s}_i, t_i)$ and $(\mathbf{s}_j, t_j)$ is $C(\mathbf{s}_i, t_i; \mathbf{s}_j, t_j) = E\left[\left(X(\mathbf{s}_i, t_i) - m(\mathbf{s}_i, t_i)\right)\left(X(\mathbf{s}_j, t_j) - m(\mathbf{s}_j, t_j)\right)\right]$, where $X(\mathbf{s}, t)$ is the value of the field and $m(\mathbf{s}, t)$ denotes the mean value of the random field at location $\mathbf{s} \in S$ at time $t \ge 0$.
Many phenomena are modeled as being separable in time and space [5,7], which allows the covariance function to be expressed as the product of two independent functions, $C(\mathbf{s}_i, t_i; \mathbf{s}_j, t_j) = C_S(\mathbf{s}_i, \mathbf{s}_j)\, C_T(t_i, t_j)$, where $C_S$ and $C_T$ are spatial and temporal covariance functions respectively. We note that a function is only admissible as a covariance function if it is positive definite. We will consider both separable and non-separable models in this work.

We will assume that the covariance function is isotropic in space. This means that the spatial covariance function may be represented as a function of the Euclidean distance between points $\mathbf{s}_i$ and $\mathbf{s}_j$, $\mathbf{s}_i, \mathbf{s}_j \in S$, which we denote by $\tau_{i,j}$. We then have that $C_S(\mathbf{s}_i, \mathbf{s}_j) = C_S(\tau_{i,j})$.
Finally, it will also be assumed that the random field is mean-square differentiable in space. A necessary condition for a Gaussian random field to be mean-square differentiable is that the second derivative of the spatial covariance function with respect to distance exists and is finite at $\tau_{i,j} = 0$ [5]. In practice, very few space-time phenomena exhibit this property. However, in [5] it is shown that if even a small amount of local averaging is performed on the signal from the underlying random field, then the resulting locally averaged field will be mean-square differentiable. We note that local averaging may be already "built-in" to the nodes in a WSN. It is expected that the nodes will have some sensing radius, r, and we posit that the value that the node stores in memory will often be obtained by averaging over the region of radius r centered about the node's position [8].
Among the common spatial covariance functions [9] that are mean-square differentiable is the rational quadratic function (4), with parameters $\theta_1 > 0$ and $\theta_2 > 0$. The parameter $\theta_1$ controls how fast the spatial covariance dies with distance and the parameter $\theta_2$ controls the smoothness of the sample paths of the random field. Higher values of $\theta_1$ correspond to stronger spatial correlation of the random field. This spatial covariance function has the desired property of mean-square differentiability discussed above.
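As a rough illustration of how the two parameters of a rational quadratic covariance act, the sketch below evaluates one common parameterization, C(tau) = (1 + tau^2 / theta1^2)^(-theta2). This exact form, and the names `rational_quadratic`, `tau`, `theta1` and `theta2`, are assumptions made for illustration; the paper's own expression (4) may be scaled differently.

```python
def rational_quadratic(tau, theta1, theta2):
    """Assumed form: C(tau) = (1 + tau^2 / theta1^2) ** (-theta2), unit variance."""
    if theta1 <= 0 or theta2 <= 0:
        raise ValueError("theta1 and theta2 must be positive")
    return (1.0 + (tau / theta1) ** 2) ** (-theta2)


# Covariance at a few separations (in metres) for a short and a long range.
for theta1 in (10.0, 50.0):
    row = [round(rational_quadratic(tau, theta1, 1.0), 3) for tau in (0.0, 10.0, 50.0)]
    print(theta1, row)
```

Under this assumed form, a larger range parameter keeps the covariance high over longer distances, which matches the qualitative role described in the text.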
In [9], a common stationary temporal covariance function is given. Let $|t_i - t_j|$ be the time difference between two samples. This covariance function depends only on that time difference.
In [10], the author argues that one setback of separable covariance models is that they suffer from a lack of smoothness away from the origin. The author proposes several non-separable covariance models, among which is an isotropic, stationary, space-time covariance model which makes the underlying phenomenon Markovian in time. Let two points in the field be separated by distance $\tau_{i,j}$ and be sampled a given number of seconds apart. This model is then given by (6), which is expressed in terms of the modified Bessel function of the second kind and a set of positive constants [10].

Model for Sensor Network Data
We first assume that N static sensor nodes have been deployed uniformly on $S$. The density of the node deployment is therefore $\rho = N/|S|$. We represent the position of each node with a two-dimensional vector $\mathbf{s}_i \in S$, $i = 1, \ldots, N$. We then assume that the sensor network takes samples of a continuous Gaussian random field at the locations of the sensor nodes at discrete times $t_1, \ldots, t_K$. We form the joint vector of node data at all sampling instants, $\mathbf{X}$, by stacking the vectors of node data $\mathbf{X}_k$ at the sampling instants $k = 1, \ldots, K$. $\mathbf{X}$ also has a joint Gaussian distribution, which will now be determined. We construct the vector of the means at each node at the sample instants in the same way, where $\mathbf{m}_k$ is the vector of node means at sampling instant $k$, $k = 1, \ldots, K$. The covariance between the data at two nodes at two sampling instants is determined from the particular covariance model of the underlying random field. We form the $N \times N$ matrix of covariances between the data at all nodes for each pair of sampling times, and we then form the super covariance matrix $\mathbf{W}$ from these blocks. The joint distribution of $\mathbf{X}$ is then the multivariate Gaussian distribution with this mean vector and covariance matrix $\mathbf{W}$. Unless it is otherwise stated, we will assume throughout this work that the mean value of the field is constant at a given time across $S$, so that $m(\mathbf{s}, t) = m(t)$, $t \ge 0$.
Clearly, simulating sensor network data generation according to the full joint distribution has dimensionality drawbacks. In cases where the covariance structure gives rise to node data that is Markovian in time as in (6), the dimensionality will be reduced. At a given time instant we have that $p(\mathbf{x}_k \mid \mathbf{x}_{k-1}, \ldots, \mathbf{x}_1) = p(\mathbf{x}_k \mid \mathbf{x}_{k-1})$. Therefore, the data at an arbitrary sampling instant can be generated based solely on the node data at the previous sampling instant and does not require knowledge of the node data at all previous sampling instants. Now consider the vectors of node data at two consecutive sampling times, $\mathbf{X}_{k-1}$ and $\mathbf{X}_k$. In [5], it is shown that the conditional distribution is also Gaussian with p.d.f. given by (10). This distribution is determined by the conditional mean vector and conditional covariance matrix given by (11) and (12), where the covariance matrices involved are given by (8). When the vector $\mathbf{X}_{k-1}$ is actually observed as $\mathbf{x}_{k-1}$, the observed vector can be substituted into (11) and (12) in place of $\mathbf{X}_{k-1}$, therefore permitting an updating of the prior p.d.f. by the posterior p.d.f.
This completes our representation of the data in a WSN.
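To make the data model concrete, the following sketch draws one snapshot of zero-mean, unit-variance node data at a single sampling instant by factoring the spatial covariance matrix with a Cholesky decomposition. It is only an illustration under stated assumptions: the covariance form, the square 400 m by 400 m deployment, the small diagonal `jitter` added for numerical stability, and all function names are hypothetical choices, not the authors' simulation procedure.

```python
import math
import random


def rational_quadratic(tau, theta1=50.0, theta2=1.0):
    # Assumed parameterization of a rational quadratic covariance.
    return (1.0 + (tau / theta1) ** 2) ** (-theta2)


def cholesky(a):
    """Lower-triangular factor L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_node_data(positions, rng, jitter=1e-6):
    """One correlated Gaussian sample, one value per node position."""
    n = len(positions)
    cov = [[rational_quadratic(math.dist(p, q)) for q in positions] for p in positions]
    for i in range(n):
        cov[i][i] += jitter  # numerical safeguard, not part of the model
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d. standard normals
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


rng = random.Random(1)
positions = [(rng.uniform(0, 400), rng.uniform(0, 400)) for _ in range(20)]
data = sample_node_data(positions, rng)
print(len(data))
```

For a Markovian model, the same factorization idea could be applied to the conditional covariance at each step instead of the full joint covariance, which keeps the matrices at size N by N.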

Applications of Excursion Regions of Random Fields
In the following analysis, we seek to determine the spatial distribution of the data load and the global average data load in a sensor network when nodes forward their packets to the sink only if their sensed value is greater than some threshold, b. The analysis in this section assumes, without loss of generality, that the underlying isotropic Gaussian random field has been standardized to have zero mean and unit variance.

Spatial Distribution of Data Load
As discussed earlier, it will be necessary to limit the number of nodes which attempt to transmit their data packets to the sink in a large-scale WSN. As well, we noted that the sink will often be most interested in where the state of the phenomenon is most severe or extreme.
The random variation of a phenomenon in space and time can lead to occurrences of values with significant deviation from the expected value of the phenomenon. These values are called "extreme values". A central topic in random field theory is the analysis of the size (i.e. area) of the regions in the plane where a random field exceeds an extreme level. We will use the theory of the sizes of these extreme regions in order to develop a spatial understanding of the data load in a sensor network. Since it is often the extreme data values which are most important to the end user, we will assume that a sensor node only attempts to transmit its data packet to the sink if the value it observes from the underlying field is above some high level, b.

Let us denote by $A_{i,b}$ the area of the $i$'th isolated region on the network deployment area where the value of the field exceeds the level $b$. We will call these regions "excursion regions". In [5], the average size (i.e. area) of an isolated excursion region for an isotropic mean-square differentiable Gaussian random field is given by (13). The parameters in (13) correspond to the directional derivatives of the field in the direction of each of the axes in 2-dimensional space and depend on the degree of spatial correlation. For a spatially isotropic random field, they are equal to one another. For the rational quadratic covariance function, these derivatives can be computed from (4). We note that as the degree of spatial correlation increases, the average area of the isolated excursion regions given by (13) also increases.
Our assumption that the nodes have been uniformly deployed implies that the number of nodes on an isolated excursion region is governed by the spatial Poisson process with parameter given by the density of the node deployment, $\rho = N/|S|$. We call a node whose position lies within an isolated excursion region for a given level b at a sampling time an "excursion node". We note that by virtue of their location, excursion nodes necessarily have a data value exceeding the level b. Let $K_{i,b}$ denote the number of nodes on the $i$'th excursion region for extreme value level b. We have that $\Pr(K_{i,b} = k) = \frac{(\rho\, E[A_{i,b}])^k}{k!}\, e^{-\rho\, E[A_{i,b}]}$, $k = 0, 1, \ldots$, where we have approximated the area of an excursion region with its expected value (13). The average number of nodes on the $i$'th excursion region is given by $E[K_{i,b}] = \rho\, E[A_{i,b}]$. In [5], it is further shown that as the level b increases, the number of isolated excursion regions approaches a spatial Poisson process with the parameter given by (17). From (17), the rate of isolated excursion regions in the plane is inversely proportional to the average area of an isolated excursion region. Therefore, as the degree of spatial correlation increases, the frequency of the excursion regions in the plane decreases proportionately. Expressions (13) and (17) imply that when the spatial correlation of the underlying phenomenon is high, the nodes that observe extreme values will tend to be located on a relatively small number of large excursion regions. On the other hand, for low values of spatial correlation, the nodes that observe extreme values will tend to be located on a large number of small excursion regions. The preceding comments offer insight into the spatial distribution of excursion nodes (equivalently, the data load) on the network deployment area.
A natural question arises: how high must the level b be taken so that (13) and (17) hold? In [5], it is recommended that b be at least twice the standard deviation of the random field. A study on the accuracy of these expressions for non-asymptotic levels is found in [11], where the authors look at a random field model describing the salt-induced delamination of a concrete slab.

Average Global Data Load
In this section, we derive an expression for the total average number of nodes which observe data above an arbitrary level, b, and therefore attempt to forward their data to the sink.
We define the global excursion area as the total area of the network deployment where the field exceeds the level $b$. In [5] it is shown that for a region with area $A_0$, the average area within $A_0$ that exceeds b is given by $A_0\,[1 - \Phi(b)]$ when the random field is spatially homogeneous. Since we have assumed that the underlying phenomenon is spatially isotropic (which implies homogeneity, see [5,7]), we have that the average global excursion area is $|S|\,[1 - \Phi(b)]$ (18), where $\Phi(\cdot)$ is the cumulative distribution function (CDF) of a standard normal random variable. We note that this expression only depends on the level of the threshold b and is independent of the spatial correlation of the random field.
Expression (18) will allow us to determine the total average data load experienced by a WSN in response to the sampling of the phenomenon for an arbitrary value b. Suppose that a sensor node only attempts to send its own data packet to the sink if the data value in this packet exceeds the level b. Denote the total number of excursion nodes in the network by $K_b$. Since we have assumed that the sensor nodes have been uniformly deployed on $S$ with density $\rho = N/|S|$, the average number of excursion nodes in the network (equivalently, the average number of data packets or data load) at a sampling time is given by $E[K_b] = \rho\,|S|\,[1 - \Phi(b)] = N\,[1 - \Phi(b)]$ (19). This last expression shows that the total average number of excursion nodes is the same as if the data at the nodes were i.i.d. This follows from the invariance of the average global excursion area to the degree of spatial correlation. Thus, the spatial distribution of the data values according to the covariance model of the random field has no bearing on the total average data load. In this respect, this expression describes the data load only on a global scale over the whole network and does not capture information on the location of the excursion nodes themselves. If the level b is extreme, then the results from Section 3.1 can be used to gain insight into the spatial distribution of this total data load; a description of the data load at various regions throughout $S$ will often be needed in large-scale WSNs.
An ability to describe the spatial distribution of the data load (as in Section 3.1) will provide insight into local contention for the channel, the spatial distribution of newly generated data packets, and the spatial distribution of energy consumption. Our expression for the average global data load in (19) is useful from the perspective that all node data must pass through the information bottleneck that occurs around the sink in a WSN [8]. Many characteristic performance measures will be strongly influenced by the behavior of the network in the region closest to the sink, and (19) describes the average data load that the nodes in this region must bear. Thus, (19) will occupy a fundamentally important role in assessing the performance of the network.
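The average global data load for a standardized field can be evaluated directly from the standard normal CDF. The short sketch below computes the expected fraction and number of excursion nodes for a few thresholds; the function names are illustrative, and the node count of 2000 is borrowed from the simulation setup described later in the paper.

```python
from statistics import NormalDist


def excursion_fraction(b):
    """Fraction of nodes expected to exceed level b for a standardized field."""
    return 1.0 - NormalDist().cdf(b)


def expected_excursion_nodes(n_nodes, b):
    """Average number of excursion nodes, N * (1 - Phi(b))."""
    return n_nodes * excursion_fraction(b)


for b in (1.0, 2.0, 2.5, 3.0):
    nodes = expected_excursion_nodes(2000, b)
    print(b, round(nodes, 1), f"{100 * excursion_fraction(b):.2f}%")
```

Because the spatial covariance does not enter this computation, the same numbers apply to any standardized homogeneous Gaussian field.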

Contour Nodes
In many WSN applications, it may be sufficient to represent the phenomenon with the data that belongs to a set of discrete contour levels $b_1, b_2, \ldots, b_J$ in the plane. We note here that it may be assumed that the nodes implement some form of quantization of the values they sense from the underlying random field. Since the sensor nodes have limited storage capacity, we assume that a quantization of the continuous values of the node data is performed when digitizing the analogue signal. For $j = 1, \ldots, J$, let us define a quantization region of half-width $\delta$ around each level $b_j$, where $\delta > 0$. We will then assume that if the value of the underlying random field at a node location is in the range $[b_j - \delta, b_j + \delta]$, then the value stored at the node is $b_j$. Thus, we can think of all nodes whose values get quantized to a level $b_j$ as belonging to a contour line associated with this level. We will call such nodes "contour nodes".
In [7] it is conjectured that for Gaussian random fields, if the mean and covariance functions are both continuous, then the sample paths of the random field are almost surely continuous. Consider a single contour quantization level b. Suppose we consider the areas $A_{i,b-\delta}$ and $A_{i,b+\delta}$ associated with the excursion regions for the levels $b - \delta$ and $b + \delta$ respectively. The continuity of the sample paths assures us that each isolated excursion region for level $b + \delta$ is contained within a corresponding excursion region for level $b - \delta$. Due to the quantization of the sensed data, the number of nodes "on" the contour line therefore corresponds to the number of nodes between the boundaries of the two excursion regions. The above comments imply that the number of contour nodes that belong to the $i$'th contour line is $K_{i,b-\delta} - K_{i,b+\delta}$. From the discussion above, the average number of nodes that form the contour line for level b over the entire network is $E[K_{b-\delta}] - E[K_{b+\delta}] = N\,[\Phi(b+\delta) - \Phi(b-\delta)]$ (21), where $E[K_{b-\delta}]$ and $E[K_{b+\delta}]$ are given by (19).
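The contour-node load can be evaluated in the same way, as the difference of two excursion loads at levels b - delta and b + delta. The sketch below assumes a standardized field; the function names are illustrative, and the half-widths are the ones quoted in the numerical section of the paper.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def expected_excursion_nodes(n_nodes, b):
    return n_nodes * (1.0 - PHI(b))


def expected_contour_nodes(n_nodes, b, delta):
    """Average number of nodes whose value lies in [b - delta, b + delta]."""
    return n_nodes * (PHI(b + delta) - PHI(b - delta))


for delta in (0.025, 0.05, 0.1):
    contour = expected_contour_nodes(2000, 2.0, delta)
    excursion = expected_excursion_nodes(2000, 2.0)
    print(delta, round(contour, 2), round(excursion, 2))
```

For narrow half-widths the contour load is well below the excursion load at the same level, which is the additional reduction the section describes.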

Extension to Non-Stationary Phenomenon
In this section, we give an extension of the preceding results to scenarios where the Gaussian phenomenon exhibits a form of spatial non-stationarity. Many naturally occurring phenomena can be characterized as occurring at an epicenter and affecting points in the surrounding environment in inverse proportion to their distance to the epicenter. We will call such phenomena "point-source" type. We model phenomena of the point-source type by introducing a location-dependent mean at each point in the environment surrounding the epicenter.
Recall our earlier observation that when the mean of a spatially correlated phenomenon is constant across $S$, the average number of excursion nodes in a WSN can be computed as if the data were i.i.d. An analogous result holds when the mean is allowed to vary across $S$ for fixed time. In this case, the average number of excursion nodes can be computed as if the data were independent but not identically distributed. Denote the data value at the $i$'th node by $X_i$. We assume that each $X_i$ has a mean value given by $m_i$ and variance $\sigma_i^2$. Using the previous comments, we have the following for the number of excursion nodes at any time: $E[K_b] = \sum_{i=1}^{N} \Pr(X_i > b) = \sum_{i=1}^{N} \Pr\left(Z_i > \frac{b - m_i}{\sigma_i}\right)$ (22), where $Z_i$, $i = 1, \ldots, N$, are independent standardized normal random variables. The accuracy of (22) will be demonstrated later through simulation results.
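When the mean varies across the deployment, the expected number of excursion nodes becomes a sum of per-node exceedance probabilities. The sketch below evaluates that sum; the function name, the inverse-distance mean profile, and the impulse magnitude `c` are hypothetical values chosen only to illustrate a point-source scenario.

```python
from statistics import NormalDist


def expected_excursion_nodes(means, sigmas, b):
    """Sum over nodes of Pr(X_i > b) with X_i ~ Normal(mean_i, sigma_i^2)."""
    z = NormalDist()
    return sum(1.0 - z.cdf((b - m) / s) for m, s in zip(means, sigmas))


# Hypothetical point source: mean decays with distance d (metres) from the epicentre.
c = 4.0
distances = [5.0, 20.0, 50.0, 100.0, 200.0, 400.0]
means = [c / (1.0 + d / 10.0) for d in distances]
sigmas = [1.0] * len(distances)
print(round(expected_excursion_nodes(means, sigmas, b=2.0), 3))
```

Setting all means to zero and all standard deviations to one recovers the constant-mean result N(1 - Phi(b)).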

WSN Performance Model
In this section, we construct a performance model of a WSN using a contention-based MAC protocol in order to study the performance of the network when only excursion nodes transmit their data to the sink. We seek to construct a first-cut model whose usefulness lies in its simplicity and lack of dependency on the specifics of the MAC and routing protocols. Next, we state our basic assumptions:
• Time is divided into slots of a fixed duration (in seconds).
• The nodes collect data at discrete time instants separated by a fixed sampling interval, which will be assumed to be an integer multiple of the slot duration. After each sampling time, only excursion nodes attempt to transmit their data to the sink.
• Each excursion node will encapsulate its sensor reading into a single packet of fixed length. Non-excursion nodes only participate in relaying the data packets from the excursion nodes to the sink. Downstream excursion nodes may also assist in routing packets from upstream excursion nodes.
• Sensor nodes access the channel through a modified CSMA type of MAC protocol. Since we have a distributed system, there may be many transmissions going on in the system simultaneously which are not synchronized with each other. Thus, the traffic load will be transported to the sink by a number of asynchronous servers working in parallel.

Routing
In our performance model, we require a rough description of how data flows in a WSN. We assume without loss of generality that the WSN has been deployed on a semi-circle with radius $R$ with the sink located at the origin. It will be assumed that the transmission range of a node is $r_{Tx}$, which will also be referred to as the one-hop distance. We will assume that a packet advances on a straight line path towards the sink by the amount of node transmission range, $r_{Tx}$, following each successful transmission. We now determine the minimum distance between two nodes that experience simultaneous successful packet transmissions. Each node with a packet to transmit contends for the channel with its neighbors within its transmission range. The successful transmission of a data packet in a contention neighbourhood implies that there were no hidden nodes that transmitted during this time slot. We assume that a node's listening range is the same as its transmission radius. Then, each contention neighbourhood can be visualized as two concentric circular regions of radii $r_{Tx}$ and $2r_{Tx}$ respectively, which is depicted in Figure 1. In Figure 1, node A is the node currently attempting transmission to the next hop node B. The nodes outside of the inner circle of radius $r_{Tx}$ centred at node A and within a distance $r_{Tx}$ of node B are hidden from node A. We note that not all nodes in the outer circle will corrupt the packet node A sends to node B (i.e. nodes in the outer circle that are further than $r_{Tx}$ from node B). However, we assume that all nodes in the outer circle are hidden. If node A's transmission is to be successful, then no hidden nodes transmit during node A's transmission to node B and there are no other collisions with A's data packet. From this explanation, we conclude that two nodes having successful packet transmissions should be separated by at least $3r_{Tx}$ from each other. We may therefore assume that each contention neighbourhood will have a radius of $3r_{Tx}/2$. As a result, the maximum number of simultaneous transmissions in the network will be given by the ratio of the deployment area to the area of a contention neighbourhood. Due to channel contention, the number of simultaneous transmissions in the network will vary in time.

Contention Model and Packet Service Time
Next, we describe the access of the sensor nodes to the channel. Recall that nodes with packets to forward access the channel using a slotted CSMA type of protocol. We note that each successful transmission will be preceded by a contention interval. In a contention neighbourhood, let us assume that each node with a packet contends independently for the channel during a time slot with probability $p$, which will be called the "attempt" probability. Further, it will be assumed that $p = 1/n$, where $n$ is the number of nodes contending for the channel within a node's transmission range. This value of $p$ maximizes the probability of successful transmission in a slot, $P_s^n$, which quickly approaches a constant value of $1/e$. The mean packet service time in seconds is composed of the sum of this contention interval and the packet transmission time, and is given by (23), where $T_p$ is defined as the average time to transmit a packet, which is determined by the data rate of the channel and the average packet length. As is customary, we assume that packet lengths are exponentially distributed. Further, we will assume that packet service times are exponentially distributed with the mean service time given by (23).
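The limiting success probability can be checked numerically. The sketch below uses the standard slotted-contention expression, in which a slot is successful when exactly one of n independent contenders transmits; treating this as the paper's success probability is an assumption, and the function name is illustrative.

```python
import math


def success_probability(n, p=None):
    """Probability that exactly one of n nodes transmits in a slot.

    Each node transmits independently with attempt probability p
    (defaults to 1/n, the value that maximizes this probability).
    """
    if p is None:
        p = 1.0 / n
    return n * p * (1.0 - p) ** (n - 1)


for n in (2, 5, 10, 50, 200):
    print(n, round(success_probability(n), 4))
print("1/e =", round(1.0 / math.e, 4))
```

Under this model the success probability settles near 1/e already for a few tens of contenders, which is why it can be treated as a constant in the service-time expression.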

Packet Arrival Process
According to (19), the average number of excursion nodes in a WSN observing a correlated stationary Gaussian phenomenon is given by $N\,[1 - \Phi(b)]$. In this performance model, we will make the simplifying assumption that the packet generation is uniformly distributed over the network area and that the level b is not necessarily extreme. This assumption will allow us to compare the performance of the network when non-extreme and extreme data is transmitted. Thus, even though the spatial distribution of data in a WSN when the level b is extreme occurs in clusters of isolated Poisson distributed excursion regions, we will assume that the location of each excursion node is random on the deployment area. We then define the packet generation rate per unit area as the average number of excursion nodes per unit area divided by the sampling interval. Since the data traffic flows to the sink, the traffic load per unit area will increase as the distance to the sink decreases. It will be assumed that the traffic density will be the same at points equidistant from the sink. Let us define the traffic density (i.e. total packet rate per unit area) at a distance r from the sink. The traffic that will pass through the boundary of the semicircle with radius r will be the total data load generated by all the excursion nodes located outside of this perimeter, which determines the traffic density at distance r. It will be assumed that the traffic density throughout a contention neighbourhood will have the same value as at its center. Then the total packet arrival rate at a contention neighbourhood whose center is at a distance r from the sink will be given by (26).

Packet Delay
We will model each contention neighbourhood as a distributed server. Each server on the routing path from an excursion node to the sink can be modelled as a "virtual" M/M/1 queue. Let us consider a contention neighbourhood located at $r_j = j\,r_{Tx}$, $1 \le j \le J$. We will model each server as an M/M/1 queue with mean service time given by (23) and mean arrival rate of (26) with $r = r_j$. Thus, the mean packet delay of a packet in a contention neighbourhood at j hops from the sink will be given by the M/M/1 formula, that is, the reciprocal of the difference between the service rate and the arrival rate. The contention neighbourhoods that a packet passes through on its routing path correspond to a series of M/M/1 queues in tandem. The queue for the last hop, which interfaces with the sink, will have the highest traffic load. Clearly, if this queue is stable, then the queues of the preceding hops will also be stable. Thus, for the stability of the system, we require that the arrival rate at the last hop be less than the service rate (28). The average total delay of a packet which has originated j hops away from the sink will be given by the sum of the mean delays at hops $1, \ldots, j$. Next, we will determine the average total delay of a packet in the network. Let us define the probability that a node will be located j hops away from the sink, $\Pr(\text{a node is } j \text{ hops from sink})$. Since all the excursion nodes generate packets at the same rate, the average total packet delay in the network is given by the average of the per-origin total delays weighted by these probabilities.
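The tandem-queue delay computation can be sketched as follows. The per-hop formula is the standard M/M/1 mean sojourn time; the hop probability (2j - 1) / J^2 is an assumption that follows from nodes being uniform on a semicircle whose radius is exactly J hops, and the arrival and service rates in the example are hypothetical numbers, not values from the paper.

```python
def mm1_delay(arrival_rate, service_rate):
    """Mean time in an M/M/1 queue: 1 / (mu - lambda), requires lambda < mu."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def hop_probability(j, n_hops):
    """Assumed Pr(node is j hops from sink) for a uniform semicircular deployment."""
    return (2 * j - 1) / n_hops ** 2


def mean_network_delay(arrival_rates, service_rate):
    """arrival_rates[j - 1] is the arrival rate at the neighbourhood j hops from the sink."""
    n_hops = len(arrival_rates)
    per_hop = [mm1_delay(lam, service_rate) for lam in arrival_rates]
    total, path_delay = 0.0, 0.0
    for j in range(1, n_hops + 1):
        path_delay += per_hop[j - 1]  # a packet born at hop j crosses hops j, ..., 1
        total += hop_probability(j, n_hops) * path_delay
    return total


# Hypothetical rates in packets per second; load grows towards the sink.
print(round(mean_network_delay([8.0, 5.0, 2.0], service_rate=10.0), 4))
```

The stability check in `mm1_delay` only needs to pass for the hop nearest the sink when the load increases towards the sink, since that queue carries the highest arrival rate.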

Numerical and Simulation Results
In this section, we plot curves of analytical expressions derived in the paper and verify their accuracy through simulation results. In the following, it is assumed that the underlying phenomenon is described by a spatially isotropic Gaussian random field. All simulation results have been obtained by averaging over 200 simulation runs. In each run, the nodes have been randomly placed over the network area according to a uniform distribution.

Average Data Load Results for a Random Field with a Constant Mean
In Figure 2, we plot the average number of excursion nodes in the network normalized to the total node population using (19) and the normalized average number of contour nodes for different half-widths $\delta = \{0.025, 0.05, 0.1\}$ using (21). Here we have assumed a rational quadratic covariance structure (4) with $\theta_1 = 50$, $\theta_2 = 1$. Since the nodes are homogeneous, these results correspond to the normalized data load experienced by the network due to excursion/contour nodes attempting to forward their packets to the sink. It may be seen that for an extreme value level at least twice the standard deviation of the data, b > 2, the excursion node population is well below 5 percent of the total node population. These results also illustrate that a greater amount of data reduction is possible when only the contour node population is allowed to transmit data to the sink.
In Figure 3, we demonstrate the effect of the correlation range parameter, $\theta_1$, when the phenomenon has rational quadratic spatial covariance (4), on the average area of an isolated excursion region for a standardized Gaussian random field for levels b = {2, 2.5, 3} and fixed $\theta_2 = 1$, using (13). It may be seen that the average size of an excursion region increases with the level of correlation. On the other hand, for a given level of correlation, the average size decreases with increasing level b.
In Figure 4 we plot the average number of isolated excursion regions that occur on the network deployment area (17) as a function of $\theta_1$ for standardized Gaussian data with rational quadratic spatial covariance (4) for levels b = {2, 2.5, 3}, assuming that the nodes have been deployed on a circular region with radius 400 meters. It is seen that the average number of excursion regions decreases as the level of correlation increases. Next, we present simulation results that verify the accuracy of expressions for the average number of excursion and contour nodes in the network, using (19) and (21) respectively. We consider a WSN with N = 2000 sensor nodes deployed on a circular area of radius R = 400 meters. In Tables 1 and 2, we assume that time is fixed and all network data has been standardized to have zero mean and unit variance (note that the entries of the covariance matrix are computed through (4)). Table 1 presents both the analytical and simulation results for the average number of excursion nodes in the network for different levels b. As may be seen, the analytical and simulation results are in close agreement with each other over a wide range of levels b. The last column in the table shows the percentage data load based on simulation results, which illustrates the reduction in the nominal data load that can be achieved by only transmitting data generated by the excursion nodes for different levels b. It may be seen that the load drops below one percent for values of b > 2.5. Table 2 presents the corresponding results for the average number of contour nodes.

Average Data Load Results for a Random Field with a Space and Time Varying Mean
Next, we present simulation results which show that (22) for mean non-stationary random fields is accurate under scenarios where the phenomenon occurs as an impulse at an epicentre and induces an isotropic Gaussian random field in the surrounding region.
In the general scenario, we assume that at time t = 0 an impulse of magnitude c occurs at the epicentre and consider a WSN with N nodes which has been deployed within some proximity of the epicentre. For t > 0, we assume that a Gaussian random process occurs at each node and that these processes are Markovian in time and correlated according to the non-separable, Markovian model given in (6). To model the effect of the impulse on the node data, we assume that at t = 0 each node senses a random value with a Gaussian distribution where the data values are correlated according to (6). We wil 2 b . The node data at each sampling time is generated based on the observed node data at the preceding sampling instant according to the conditional Gaussian distribution given by (10). In our simulation, we generate samples of the node data every 5 seconds until the phenomenon reaches steady-state (i.e. when the mean value is 0 at all points across the deployment). Analytical and simulation results are presented in Table 3. Table 3 shows that the average number of excursion nodes in the network is accurately predicted by (22) when the phenomenon exhibits spatial non-stationarity in the mean sense and also a Markovian temporal evolution. We note that when computing (22), the variance of each node's data at a sample time,

Mean Delay Results
this section, we present plots of the average packet tance to the sink using ( 27) nd (28) in a hypothetical large-scale WSN of 5000 In delay as a function of hop-dis a nodes deployed on a semi-circle of radius 500 meters.The transmission range (equivalently, the hop-distance) is set to 25 Tx r  meters.We have assumed the underlying phenomenon is a standarized isotropic Gaussian random field with rational quadratic covariance with parameter In Figure 6, we show the mean total delay of packets originating at each hop distance from the sink in the network usi the mean delay achieved by increasing the level associated with the excursion nodes.Note how there is only a small reduction in the delay when the level is increased from b = 2.0 to b = 2.5 whereas there is a significant delay readuction when the level is increased from b = 1.0 to b = 2.0.There is clearly, a practical limit to the minimal achievable total average delay.Obviously, we can-

Conclusions
In this pape have presented an r we analytical model for e data values observed by the nodes in a WSN.We nsor network data as samples from a aussian random field.We have presented results for the average total area of the network that experiences data above an arbitrary level b, and the average area of an isolated excursion region containing extreme data when the level b is chosen sufficiently high.It has been shown that the number of excursion regions on the network deployment approaches a Poisson distribution for extreme levels, b.The average size of the isolated excursion regions increases while their frequency decreases when the level of spatial correlation is increased.It has also been observed that the sum of the average size of excursion regions and the average excursion node population for a level b are a function of b alone and not on the degree of spatial correlation.We have quantified the data reduction that may be achieved by only transmitting data from excursion and contour nodes for various levels b.Finally, a performance model has been presented for large-scale sensor networks that can be used to derive the mean delay experienced by packets at each hop on their journey to the sink.The results of the paper will be useful in the design of large sensor networks in ensuring that the network meets the requirements of the end user.
distribution function (CDF) and probability density function (pdf) of a standard normal random variable respectively.The parameters

Figure 1 .
Figure 1.Illustration of a contention neighbourhood.Nodes outside the inner-circle do not transmit during a time slot.The x's represent sensor nodes in the contention neighbourhood.
In both tables, the node data is

Figure 3 .
Figure 3.The average area of an isolated excursion region (m 2 ) as a function of the correlation range parameter, α 1 for levels b = {2.0,2.5, 3.0}, and fixed roughness parameter ,

Figure 4 .
Figure 4. Average number of excursion regions on a circular deployment area with radius 400 metres as a function of the correlation range parameter, for levels b = {2, 2.5, 3}

Figure 2 .
Figure 2. The average data load normalized to the total network data load using the excursion and contour node populations.

Figure 5 ,
usual, we assume that only excurison nodes attempt to transmit their packets to the sink.We assume all excursion node packets have mean length nd that the data rate of the channel is 30 kbps.This gives a mean packet transwe show the mean delay experience by a packet at each hop for excursion distance in the network 2.0, 2.5} u g (27).For a given le ng (28).This figure illustrates the reduction in not just take b arbitrarily high or there will be no trans-node levels b = {1.5, sin vel b, this figure shows the increase in mean delay as the packet approaches the sink.It also shows how reducing the data load by using extreme levels (b = 2.0, and b = 2.5 can be considered extreme) reduces the mean delay of packets, especially as the packets move closer to the sink.

Figure 5 .
Figure 5.The mean delay experienced by a packet at each hop distance from the sink.

Figure 6 .
Figure 6.The total average delay in the WSN for packets originating at different hop distances from the sink.
th have modeled the se G Copyright © 2011 SciRes.WSN tour Maps: Monitoring and Diagnosis in Sensor Networks," Coms: the International Journal of Computer unication Networking (ACM), Vol.50, No.

Table 1 . Analytical and simulation results for the average number of excursion nodes in the network.
presents the results for the average number of contour nodes in the network for different contour levels b with the contour line half-width fixed as 0.05 N = 2000 nodes, R = 400 m, α 1 = 50, α 2 = 1

Table 2 . Analytical and simula r average nu- mbe ontour n in network. tion results fo r of c odes
, but with an initial mean value inversely proportional to the node's distance to the epicentre, 2