Robust Reconstruction of Sensor Swarms Floating through Enclosed Environments

A novel type of application for the exploration of enclosed or otherwise difficult to access environments requires large quantities of miniaturized sensor nodes to perform measurements while they traverse the environment in a “go with the flow” approach. Examples of these are the exploration of underground cavities and the inspection of industrial pipelines or mixing tanks, all of which have in common that the environments are difficult to access and do not allow position determination using e.g. GPS or similar techniques. The sensor nodes need to be scaled down towards the millimetre range in order to physically fit through the narrowest of parts in the environments and should measure distances between each other in order to enable the reconstruction of their positions relative to each other in offline analysis. Reaching those levels of miniaturization and enabling reconstruction functionality requires: 1) novel reconstruction algorithms that can deal with the specific measurement limitations and imperfections of millimetre-sized nodes, and 2) improved understanding of the relation between the highly constraint hardware design space of the sensor nodes and the reconstruction algorithms. To this end, this work provides a novel and highly robust sensor swarm reconstruction algorithm and studies the effect of hardware design trade-offs on its performance. Our findings based on extensive simulations, which push the reconstruction algorithm to its breaking point, provide important guidelines for the future development of millimetre-sized sensor nodes.

Figure 1.A swarm of sensor nodes is inserted into the environment of interest; the sensor nodes traverse the environment using the flow of the medium.Once the sensor nodes are distributed in the environment, distance measurements between neighbouring sensor nodes are performed and stored locally within the nodes.Finally, the sensor nodes and their data are retrieved.All data is analysed offline to obtain the 3-D reconstruction of the sensor swarm from the inter-node distance measurements.
communication radius or better signal-to-noise ratios, will improve reconstruction performance, but make cost-effective miniaturization of nodes more challenging.Therefore, in Section 5, we use extensive simulations with increasingly more limitations on the hardware capabilities to study their effect on the reconstruction performance.These limitations include increased noise in the measurements and not being able to uniquely identify other nodes.The assessment of the performance of the reconstruction depends on which information is favoured from the swarm and different performance metrics are Wireless Sensor Network The simulations in this work do not guarantee that robust reconstruction will work in reality, but as also discussed in Section 6, they do show that advances in offline reconstruction algorithms, can, to a large extent, compensate for severely limited hardware capabilities.Understanding the trade-offs between hardware design and the reconstruction performance can guide the hardware design of yet-to-be-developed highly miniaturized sensor nodes of future applications [2].
Several applications and research studies already exist that use a sensor swarm approaches where sensor nodes move with a flow through specific environments.
In [8], 13 cm-sized sensor nodes were developed and deployed in the ocean to flow with the ocean currents and allows to study e.g.plankton behaviour.These nodes can actively change their density to adjust their depth while collecting a variety of sensor data.The nodes receive ultrasound ping signals from nearby buoys (at kilometre distance) such that afterwards their positions can be reconstructed using general lateration.The researchers in [9] developed millimetre-sized sensor nodes that can be injected into living fish to study how e.g.dams and ocean energy devices affect their behaviour.These sensor nodes measure temperature and emit a unique identifier using ultrasound such that they can be remotely tracked by an array of external hydrophones on the shoreline (within hundreds of metres distance).Both of the mentioned studies are characterized by the fact that external localization hardware (buoys or hydrophones with a known location) are used to reconstruct the node positions.
In our earlier work, targeting enclosed and difficult to access environments, Wireless Sensor Network we developed 4 cm-sized sensor nodes that measure the node's inertial data (i.e.linear acceleration and rotation) and measure the local magnetic field.These nodes with variable density have been used e.g. to explore and inspect an underground pipeline of 15 centimetre in diameter and 260 metre in length [10] and to study the dynamics of multi-phase fluids in a mixing tank of several cubic metres [11].Another of our attempts to work towards the mentioned future applications is described in [1].It describes a field test that uses dummy (non-functional) sensor nodes to explore a deep underground sandstone oil-reservoir.This field test revealed that in order for the sensor nodes to pass through the 300 metre sandstone oil-reservoir, the size of sensor nodes cannot exceed 9 mm as the local dimensions are not larger than that.
The mentioned studies show that a go with the flow sensor approach can yield valuable insights into yet undiscovered environments and dynamics.They also show that some applications have clear constraints on e.g. the allowed maximum size of the used sensor nodes in order to pass through the environment.The objective of this paper is to see how far we can limit the capabilities of sensor nodes and how far they can be scaled down while still getting good reconstruction performance.In the following section (Section 2) we discuss the implications of scaling down sensor nodes to the millimetre size and operating them in enclosed and difficult to access environments.
Even though the used constraints in this paper represent a very challenging application scenario and might seem excessive in light of current applications, we believe they are valid constraints for future miniaturized nodes [2].We think that providing a solution to the swarm reconstruction problem under these adverse sensing conditions can also be beneficial for current applications and problems.To summarize our contributions: 1) a novel and robust reconstruction algorithms is developed that deals with severe sensing conditions and imperfections, even including non-unique identification of distance measurements; 2) sensor node system design principles are derived from extensive simulations by sweeping input parameters until the reconstruction algorithm fails to reconstruct the sensor swarm; 3) different methods on how to assess the performance of the swarm reconstruction are provided, depending on which information is favoured from the swarm.

The Implications of Application Constraints
This section discusses existing work and shows that the applications which we target pose specific challenges that have not been addressed earlier in a holistic manner.Reconstructing the position of nodes in (wireless) sensor and robot networks has been subject of study for an extensive period of time [3] [4].
Obtaining simultaneously the location of nodes as well as obtaining the map of the yet-unknown environment, is often referred to as Simultaneous Localization and Mapping (SLAM) [12].Our research also falls under this denominator but has to deal with several distinctive constraints relative to other work.These constraints will be discussed in this section and can be summarized as: 1) no data is communicated between nodes (other than needed for distance measurements), measurements are stored in memory for offline analysis; 2) sparse connectivity in a large swarm; 3) not depending on external communication to e.g.fixed beacons; 4) using solely distance measurements, i.e. no direction information or additional sensor information, like bearing, odometry or inertia; 5) nodes are not identified uniquely.
Constraints 1, 3, and 4 are hard constraints and depending on the choice of applications, these either exist or not and make the reconstruction problem harder.Constraint 2 and 5 are soft constraints and can vary in severity, depending on e.g.parameter choices that can be tweaked.For example, considering constraint 2: the connectivity between nodes depends on the transmission power, the distance between nodes and signal-to-noise ratio at the receiving side.
Considering constraint 5: the identifiability of nodes depends on the number of identifiers chosen relative to the number of nodes used.Increasing the severity of these soft constraints makes the reconstruction problem gradually harder.
Therefore, in our study, the parameters influencing connectivity and identifiability are swept over a large enough range in order to study the consequences of this on the reconstruction performance and therewith also to find the limits of current reconstruction algorithms.
Although each separate constraint has been considered earlier; this work is, to the best of our knowledge, the first to research reconstruction of sensor swarms when considering all mentioned constraints together and under realistic conditions.

Power Consumption
Probably the most significant and overarching constraint of highly miniaturized sensor nodes is their power limitation.Due to volume and weight constraints in the small sensor nodes, battery capacity will be extremely limited and this influences all other design choices related to sensing, computation, and communication.In fact, we envision levels of miniaturization in which nodes have just enough energy to perform only a single inter-node distance measurement.
Furthermore, communication requires relatively large amounts of energy and is therefore only used to realize the inter-node distance measurements, but for example not for online distributed processing of the measurements (hence constraint 1).Instead of distributing the measured data or sending it to a central sink node, we use an approach in which the measured data is stored locally in the sensor nodes, and afterwards retrieved and processed centrally when power consumption is no longer a limitation.
Communication transmission power and the signal to noise ratio of receiving electronics are limited, reducing the communication range between nodes.
Therefore, the inter-node connectivity between sensor nodes, by means of inter-node distance measurements, will be sparse (hence constraint 2) and dependent on the energy budget.
To the best of our knowledge, there is no other research that aims to perform swarm reconstruction under such extreme low-power conditions.

Beacons and Landmarks
A distinction that can be made in SLAM related research is whether or not beacons are being used, that is, fixed points in space that serve as reference points where individual nodes can relate to.In our applications, we can not rely on the presence of beacons, as opposed to e.g.[13] [14] and the earlier mentioned work in [8] [9].In our applications the environments are generally difficult-to-access and block long-range communication signals (hence constraint 3).
In applications with a static environment, the environment could potentially be used as a continuum of beacons, or in this case better described as landmarks.
Opposed to the research mentioned in e.g.[15] where high-end cameras and/or laser range-finders are used, measuring clear fixed landmarks is infeasible as such detection devices have a high power consumption and need continuous measuring; it is something that doesn't scale well to miniaturized nodes.
Furthermore, detection of the environment using such sensors as e.g. in [15] [16] would require clear line-of-sight in a transparent medium, which cannot be guaranteed in our applications.
Efforts to map node positions and the environments based on acoustic reflections can be found e.g. in [17] [18].This, however, requires constraints on e.g. the node positions or the environment shape (e.g.straight walls), both of which cannot be guaranteed in our applications.
Given the aforementioned, the nodes are forced to cooperate with each other in order to obtain information on their positions, this also called cooperative localization [3].

Distance Measurements
A common method for cooperative localization is to reconstruct node positions based on distance measurements between the nodes.Often such methods are amended with e.g.inertial, bearing, or odometric information from the nodes, like in e.g.[19] [20].In our research however, we explicitly rely only on inter-node distance measurements, as adding such devices or sensors would add significantly to the size and energy budget (hence constraint 4).For example, an inertial sensor would need to be active continuously to track the absolute position and orientation.Furthermore, current miniaturized inertial sensors are not accurate enough for absolute position determination for longer than e.g. one second.Alternatively, the absolute orientation of the node could be estimated using e.g.small and low-budget sensors that measure the Earth magnetic field and the Earth gravitational field.But unfortunately, in our applications, no guarantees can be given on the external fields as the unknown environment can block or distort these.
Although one can argue that, in the future, advances in sensors will lift some of the mentioned limitations, we show that with only inter-node distance measurements, the reconstruction of the sensor swarm is possible.

Communication and Identification
Nodes with (sub-)centimetre dimensions only fit antennas that effectively produce EM radiation at frequencies that have a large absorption in the liquid media.The (liquid) media can have a high salinity and are also not guaranteed to be clear enough for the use of (visible) light communication.This effectively prevents EM communication for distances larger than e.g. a centimetre.
Ultrasound transducers at these scales do provide larger communication ranges within the boundaries of the environments, but yield other challenges for stable and fast communication between dynamic nodes in enclosed environments [21].The information that can be communicated in a unit of time is therefore very limited, typically in the order of kilobytes per second or less.
Furthermore, the relatively slow speed of sound, in combination with the relatively small bandwidth, reflective environments and the fact that nodes are non-static, leads to highly time-varying communication channels [5] [22].
Distance measurements based on round-trip time of flight need to be completed within just a tiny fraction of a second as otherwise the nodes may have moved significantly relative to each other, invalidating the measured distance.
The low data rate and the need to perform distance measurements in a short period of time creates the incentive to reduce the content of the messages used for distance measurements (i.e.reducing the number of bits in the message).For example, in the ranging protocol as studied in [5] [22], at a data rate of 40 kbit/sec, the transmission of every additional bit for the identifiers to indicate the sender and receiver of the message adds 3 percent to the measurement noise caused by the movement of the nodes.This effect becomes larger when considering lower data rates.Furthermore, increasing the message length increases the probability of overlapping messages due to (time-varying) multi-path, thereby making it harder to correctly decode the messages.
In applications where many sensor nodes are used, ideally each node has a unique identifier by which it can unambiguously communicate with other nodes.However, for the envisioned application cases mentioned in this paper, it is worthwhile to explore the possibility of reducing the number of identifying bits as they make up the majority of the ultrasound messages (hence constraint 5).Reducing the number of identifying bits, and thus only having non-unique identification makes communication ambiguous.The inter-node distance measurements then also become ambiguous, but can be resolved using the robust reconstruction algorithm presented in this paper.Non-unique identification [24] [25].Earlier efforts to first resolve these ambiguities before the reconstruction of node positions can be found in [26], but does not include actual reconstruction results.
Our reconstruction algorithm is, to the best our knowledge, the first that can successfully cope with identification ambiguities and with all the other mentioned hardware constraints.

Problem Description
In this section we discuss the specific problem formulation and introduce our notation.All symbols used in this paper are summarized in Table 1.In this paper, whenever we use the bar-indicator, e.g.d , it is to stress the true value of the parameter, to differentiate it from an estimated or measured value, e.g.d .
Table 1.Legend of symbols used throughout the paper.As mentioned in the introduction, the reconstruction of the node positions can be seen as a graph-optimization problem where ( ) , =    , in which the vertices  are the sensor node positions and the edges  are the distances between them.This is also visualized in Figure 2.However, the edges of this graph can only be based on the measured distances that are noisy representations of the actual distances.In order to estimate an accurate graph of the nodes' positions we attempt to reduce the least squares error: Solving these kind of problems is often performed using non-linear optimization methods like e.g. a Levenberg-Marquardt algorithm or a Gauss-Newton algorithm.As with most non-linear optimization problems the key is to provide a proper initial estimate, in this case an initial estimate of the node positions.This initial estimate is then further refined using non-linear optimization.Obtaining the initial estimate for the node positions is performed by our novel robust reconstruction algorithm that is discussed in Section 4.
This section will first discuss the method on how to measure distances between the nodes (i.e. the edges  ) and the difficulties herein.In Section 3.1 we discuss how the distance measurements are performed.Section 3.2 discusses the different types of measurement noise to be expected in these measurements and the issues related to identification are discussed in Section 3.3.The consequences of these on the reconstruction algorithm are discussed in Section 3.4.

Distance Measurement Protocol
Following the operational procedure as depicted in Figure 1, once the nodes are well distributed within the environment of interest, the nodes perform a single distance measurement to neighbouring nodes.This distribution can be achieved by e.g.gradually inserting new nodes into the environment until a steady outflow of nodes is reached.How the actual measurements are performed is defined by the ranging protocol that is used.Many of these protocols already exist and the one that is designed specifically for our applications is described in Omnidirectional emission and reception is considered (as opposed to directional), because there is no a priori or online knowledge of the positions of neighbouring nodes.Whether or not messages are received by neighbouring nodes is dependent on the transmitted power and, among other factors, the received signal-to-noise ratio.
For convenience, but without loss of generality, in this paper we assume that distances can and are measured between all neighbouring nodes that are in line-of-sight and are within a fixed and known communication radius, comm r .
Details of these protocols, as e.g.discussed in [22], are outside the scope of this paper, however, it is important to know that the messages that are transmitted and received between the nodes for measuring the distance between them need to contain identifiers.The identifiers are used to identify the sender of the message, and depending on the protocol also to verify whether the message is actually addressed to the receiving nodes.These identifiers make up the largest part (in bits) of the message.
The result of the ranging protocol is a set of measurements where every node i measures the distance , i j d to every neighbouring node j, and node j having measured the distance , j i d to node i.This is visualized in Figure 3(a).The measurements are individually stored in the nodes' memory.

Distance Uncertainty
The measured distance between nodes will be affected by different imperfections in the system.As the dynamics of the nodes and the environment they are in are unknown, a range of possible imperfections should be accounted for as we do not know the effect on the reconstruction result.To simulate the measured distances, in this paper we consider a single snapshot of the actual distances and introduce all types of measurement noise to account for the existing imperfections.
The types of noise in the distance measurements that we are modelling can be divided in inlier-type noise and outlier-type noise.The inlier-type measurement noise being: • additive Gaussian noise a  ; to account for e.g.offset in ranging timer or variable delay in electronics, and also to account for the errors due to movement of the nodes while performing the ranging measurements, • multiplicative Gaussian noise m  ; to account for e.g.different clock frequencies or inhomogeneous medium, And the outlier-type measurement noise being: • identification noise; to account for erroneous detection of identifiers, • outlier noise; to account for burst-like/spiky/intermittent measurement noise.
The environment also gives rise to additional outlier-types of noise that can influence the reconstruction: • obstruction of signal paths, • reflection of signals, • loss of nodes or inability to retrieve them.
How exactly these are simulated is described later, in Section 5.2.

Identification Uncertainty
Due to the constraints described in Section 2, the identifiers used in the ranging messages are reduced to non-unique identifiers, similar to the research in [5].
Each of the N nodes has a unique (hardware) identification number, UID, and is assigned a non-unique communication identifier, cid, that is used in the ranging message to identify the sender and addressee of the message.
. This is illustrated in Figure 3(b).The mapping from UID to CID is known and well defined, however, the mapping from CID to UID is ambiguous as more nodes share the same cid.

Mutual Connections
When the nodes are retrieved after the experiment and their data read out, offline analysis can be performed.Normally, with unique identification, mutual connections can be established by associating measurements , i j d and , j i d with each other (Figure 3(a)).These mutual connections can be used as consistency check of the individual measurements and quantify the uncertainty of the measured distance.When the two individual measurements differ less than a specific threshold, the measurements can be considered inliers and a mutual agreed-upon distance can be established, otherwise they are considered outliers and cannot be used to establish a mutual connection.
However, in the case of non-unique identification, measurement

Swarm Reconstruction Algorithm
In this section we will describe the novel reconstruction algorithm.It reconstructs node positions based on inter-node distances with significant measurement imperfections and the ambiguities introduced by using non-unique identifiers.
The reconstruction algorithm can be summarized as follows: 1) Hypothesise mutual connections based on the consistency (distance and identity) between all distance measurements, obtained from all retrieved nodes (Section 4.1); 2) Find initial four nodes to fix coordinate reference system (Section 4.2); 3) Robustly add additional nodes to the graph using a guided-RANSAC algorithm and general lateration, exploiting the geometric consistency between true neighbouring nodes (Section 4.3); 4) Perform non-linear optimization to reduce build-up of errors from previous step; go back to step 3 (Section 4.4); 5) Loop closing, if needed (Section 4.5).
An overview of the algorithmic steps of the reconstruction method is provided in Appendix A.

Identity and Consistency Check
Before the reconstruction process, mutual connections should be established between nodes.The decoded non-unique communication identifier, cid, from every received ranging message should be associated with a unique (hardware) identification number, UID.Initially, this can be approached using a combinatoric method in which all possible nodes are considered that use the respective cid.
The range of options can be narrowed down, as the distance measurement For all pairs that are considered consistent with each other, i.e. , where ∅ indicates that no mutual connection is considered, i.e. an empty entry.
Threshold value r  is the maximum allowed difference in distance measurement for which two measurements are considered inliers.This threshold should be related to the noise.It can e.g.be obtained using trial and error if the error model is unknown.Note that due to the measurement noise, the threshold, and the non-unique identification not all pairs that are considered consistent are correct, they are only hypothesised as being correct connections.
The obtained mutual connections between node i and node j and their hypothesised distance is not unambiguous.As illustrated in Figure 5, the hypothesised mutual connections , i j h for measurement ( ) , (c) 1) the measurement between the real nodes i and j ; 2) identity ambiguity: a mutual connection to a node p which is believed to be at similar distance as node j but not necessarily within communication radius of i (see Figure 5); 3) distance ambiguity: a mutual connection with the correct node j but with a distance belonging to the measured distance to another node k within its communication radius (see Figure 5).Wireless Sensor Network Identity ambiguity arises due to similar distance between pairs with similar CID in the swarm (Equation ( 2)).Distance ambiguities arise due to a plurality of nodes with similar cids within communication radius.
The identity ambiguities are based on the statistical likelihood that somewhere in the swarm, a node pair with similar cids and distance is present such that the measurement pair obeys Equation ( 2).Assuming a uniform spatial distribution of the nodes, the average number of identity ambiguities in , i j h scales with

Initial Seed Selection
In this work we solve the reconstruction problem by incrementally growing the graph using a robust RANSAC-based method based on general lateration principles.Given an initial graph  , new nodes can be added as illustrated in Figure 2 and detailed in [6].In an ideal case (without noise), every candidate node, c, of which the distances to four non-coplanar nodes with known positions , , , s s s s are known, can be added to the graph with as position the intersection of the spheres with radii Since the graph growing is performed in 3  and the distance measurements are performed in  and without external beacons acting as fixed reference Wireless Sensor Network points in space, first a coordinate reference system has to be defined.The initial four node positions are chosen such that they define the coordinate system, therewith resolving the general reflection and rotation ambiguity.Their positions, 1 2 3 4 , , , s s s s , define the coordinate system as follows, and also illustrated in Figure 2: , , s + ∈    .These nodes are selected based on their connectivity and the stability of their geometric configuration which is obtained by general lateration techniques.
Since , i j h contains ambiguities, the initial seed selection is conditional until the graph growing has successfully added several nodes to the graph  .If this is not possible, a new initial seed is selected and the process is repeated.

RANSAC Graph Growing
After the initial seed is chosen, additional nodes can be added to the graph when the nodes have at least four connections to already reconstructed nodes.The set of candidate nodes that have at least four connections to already reconstructed nodes is denoted as  .Ideally,  only consists of nodes that are true neighbouring nodes, but due to the ambiguities in the hypothesised connections , i j h this is not the case.Figure 6 illustrates a simplified 2-D situation in which a candidate node c ∈  has connections to its true neighbouring nodes in  , but also false connections to nodes somewhere else in the graph due to identity ambiguities.Whether the connections are true or false is not known at this point.Figure 6.RANSAC graph growing algorithm uses an inlier-outlier voting system to filter out outlier distance measurements and proposes a position for candidate nodes to add to the graph.Using cliques of nodes in the graph that are within twice the communication radius helps in reducing the probability that nodes are positioned based on false connections.
A Random Sampling Consensus (RANSAC) method is used to attempt to correctly position the candidate nodes.RANSAC is a general robust estimation technique, for more details we refer to [27].RANSAC attempts to fit a model on data points that contain both inliers (points that satisfy the model) and outliers (points that do not satisfy the model).It does this by randomly selecting a small set of data points, estimating a model based on this small set, and counting the number of other data points that agree with this model.This process is repeated until a model is found that has maximum or sufficient support in the whole set of data points.We developed a specific RANSAC algorithm to solve the task of graph-growing under severe outlier noise and identification ambiguity.An overview of the RANSAC algorithm is provided in Appendix B.
When a candidate node c is considered, all nodes that are already reconstructed in the graph and that are connected with this c are selected.This set of nodes is denoted with , to propose-using general lateration-a position for c up to a reflection ambiguity (three nodes are always coplanar).
However, no guarantees can be given whether the entries in This step in the RANSAC algorithm is repeated with each time a different set of three proposers p a , until all possibilities are exhausted or until a proposal received a specific threshold in RANSAC-score.In the latter case it is then considered sufficiently supported and is added to the graph.When no "sufficiently supported" condition is reached, the RANSAC procedure is repeated for a next candidate node until all candidates are considered or a sufficiently supported condition is found for a specific candidate.The candidate with the highest RANSAC-score is added to the graph in  with edges (  ) only to the supporting nodes and corresponding , a c h entry.The "sufficiently supported" condition for the RANSAC can be based on the expected percentage of outlier measurements or can be chosen heuristically by trial-and-error, as is done in this work.

Guiding RANSAC
The number of false connections from the candidate nodes to the already reconstructed nodes, grows on average linearly with the number of nodes already in the graph, due to I  , as explained in Section 4.  This sorting has two effects on the reconstruction, it guides the RANSAC procedure and has as a result that: 1) it increases the probability that a "sufficiently supported" candidate is found quickly, and 2) it increases the probabilitys that from all "sufficiently supported" candidates, the best one is chosen for addition to the graph.
Guiding the RANSAC procedure with this sorting requires additional computational power (calculating the cliques), relative to unguided RANSAC.
This increase in required computational power is compensated for by the quicker finding of a "sufficiently supported" candidate.The reconstruction can roughly take between 0.1 -10 seconds per node, depending mainly on the connectivity and ambiguities.

Robust Non-Linear Refinement
The stepwise addition of new nodes to the graph introduces build-up of errors.
These errors in positions can prevent other nodes from being added.In order to reduce this error build-up, a global non-linear optimizer algorithm is executed as described in [6] to minimize the cost function Equation ( 1).It can easily solve graph problems consisting of thousands of nodes and edges.
In our algorithm, after every m newly added nodes, or failure to add a new node this non-linear optimization is performed.There is a clear trade-off between m and the processing time required for the reconstruction algorithm.In our work, m is chosen arbitrarily to 10.In future work, this can e.g.be adjusted dynamically based on uncertainty of previously added nodes.

Loop Closing
The algorithm described above only considers edges between nodes in  when the specific connections supported the winning proposal in the RANSAC voting.
As can later be seen in the results section, Section 5, this is effective against any type of outlier noise.But when considering loops in the environment, these loops might not be closed due to the previously mentioned error build-up, as schematically illustrated in Figure 7.
Nodes that are supposed to connect both ends of the loop, will only be placed at one of both ends, depending on which has the largest number of supporters.Placement on the other end of the loop could also have yielded successful reconstruction, however, the node will not automatically be added to both ends to close the loop.An additional step should be performed in order to detect such loops and optimize the whole graph such that the loop is closed.This section describes the method how these loops are detected and closed.
Conceptually, the detection of loops is performed by searching for a set of nodes that could have been reconstructed in two different, distant, positions in the graph.This starts by looking at all hypothesised connections between nodes that are not used in the reconstruction: i.e. all connections , i j h that did not support a winning RANSAC proposal, we call these excluded connections.The connections that did get included are called included connections.
Let node i have excluded connections to nodes ex i k and included connections to reconstructed neighbouring nodes in i j , as illustrated in Figure 7.In order to successfully detect and close a loop, sufficient connectivity should exist between nodes in i j and ex i k .
After the reconstruction has stopped or halted, for every node i it is checked how many connections are excluded from the reconstruction, and how many of the neighbouring nodes in i j also have excluded connections to nodes ex i k .
When several neighbouring nodes agree that nodes are excluded, the node is said to be commonly excluded.Nodes ex i k that are commonly excluded by nodes in i j are likely to be on an other end of a loop than nodes in i j .
Final determination of the loop closing can be performed by redoing the RANSAC graph-growing on this set of connections by forcing the nodes to be added to the graph on the other side of the loop.When this yields a successful reconstruction, and the local relative positions of the considered nodes is equivalent to the local relative positions on the original side of the loop; then the loop can be closed.The loop closing itself can be performed by including the specific loop-closing edges that were initially excluded to the original graph and running the non-linear graph optimizer on the newly obtained graph.

Numerical Simulations
In this section we will use numerical simulations to study the performance of the reconstruction algorithm, as well as how the different soft constraints

Environment Model
The envisioned node swarms can be deployed in a variety of environments.In this paper we will work with two types of environments as schematically illustrated in Figure 8.One resembles in abstract terms a mixing tank as the environmental dimensions are approximately the same in all directions.For convenience, the tank-like environment is chosen to be a bounded spherical environment.The second environment is a long pipeline, where the diameter of the pipeline is still large enough such that nodes can be positioned all around one another (as opposed to in one line along the pipe axis).The smooth pipeline has a fixed diameter of 8 cm and 400 nodes are placed in a section of 4 m pipe-length.The pipeline environment will be considered in both a loop and a loop-less fashion.
In the spherical environment, 400 nodes are given a random position within

Distance Measurement Generation
Distance measurements are generated based on the generated node positions.
The distances between all nodes i and j as measured by nodes i are indicated by Furthermore, a variety of operations are executed on the measurements to account for different outlier noise types, each with their own statistical likelihood, ν , for this type of noise to happen to the measurement.
( Identification noise results in an erroneously decoded identifier, i.e. node j is measured as being node k.Outlier noise  is chosen to be randomly drawn from a uniform distribution in the range ( ] comm 0, r .
Additionally, signal multipath can result in additional measurements of specific node pairs: besides the distance based on the direct signal path, also distances based on reflected, longer, signal paths.In order to account for this we add an additional distance measurement , i j d′ to the measurement dataset: with  a perturbation drawn from a uniform distribution in the range

Effective Connectivity
The discussed noise not only affects the accuracy of the distance measurements but also determines whether correct mutual connections can be estimated (Section 3.4).Due to noise, the effective number of neighbouring nodes to which correct distance measurements are made is therefore lower than the number of neighbouring nodes comm n that are within communication radius.
The effective connectivity, EC, between nodes is defined by the number of neighbouring nodes to which inlier measurements (and thus correct mutual connections) are available for the reconstruction algorithm.An estimation of the effective number of neighbouring nodes can be made based on the theoretical likelihood that inlier measurements can be established in the presence of the mentioned outlier noise types: with ν indicating the probability that the specific outlier-type of noise are present in the individual measurements as defined in Section 5.2.The terms for outlier and identification noise are squared as these are effects happening to ,

Reconstruction Performance Analysis
In our applications, or more in general for swarm operations like these, it is not established what is the best method to assess the performance of the reconstruction algorithm, i.e. to assess the quality of the reconstructed swarm.It is highly dependent on what information is favoured from such a swarm.Conventional metrics might not be suitable as the goal of the applications might be different.When exploring a yet unknown and difficult-to-access environment, initially one might want to know the overall structure of the environment: e.g. a rough estimation of the local geometry and the overall shape of the total environment.
Later, e.g. when also adding additional sensor information, one might be more interested in fine-grained local geometry where the absolute error over the entire swarm is less relevant.We therefore present different performance metrics that serve different goals.

Absolute Error
The absolute error is the mean squared absolute error of all reconstructed node positions relative to their ground truth positions.It is calculated as , the sum over all N nodes where ŝ is the reconstructed position of the node and s its ground truth position.In order to compare the ground truth positions with the reconstructed positions, a linear fit between the 3-D positions of the initial four seed nodes and their ground truth positions is performed to resolve the general rotation and reflection ambiguity.The mean squared absolute error is a commonly used performance metric in reconstruction studies.A disadvantage of this metric is that when a reconstructed swarm exhibits error build-up, the absolute error between one side of the swarm and the other can be large, while the local errors can actually be small.In swarm reconstruction studies like this, this metric should therefore not be used as the only metric to assess reconstruction performance.

Relative Error
The relative error is the mean squared relative error of the reconstructed distances between nodes, relative to the ground truth distances.It is calculated as , the sum over all M reconstructed distances, where d is the reconstructed distance and d the ground truth distance.
Unlike abs  , the metric rel  is not affected by build-up of errors.It assesses the reconstruction performance only on a local and relative scale.

Global Error
The global error glob  is defined similar as abs  , but includes the performing of a rigid transform (transformation and rotation) of the entire reconstructed swarm such that abs  is minimized.
This metric illustrates the reconstruction performance of the swarm better Wireless Sensor Network than abs  as it is not based on a specific chosen seed from which everything is built on.However, it is still sensitive to error build-up.

Local Error
The local error loc  is defined as mean squared absolute error, but only after performing a rigid transform of all subsections of 20 connected nodes with their ground truth positions.
This heuristically chosen performance metric considers only the local errors, but other than rel  , it does focus on the (absolute) reconstructed positions of the nodes rather than the (relative) distances between them.

Recall
The recall,  , is the percentage of nodes reconstructed by the reconstruction algorithm.As this does not take into account whether or not these nodes are reconstructed correctly, we introduce the adjusted recall  and  .These are the percentages of nodes that are reconstructed within a specified error condition that is heuristically chosen.For  , this condition is when at least 80% of the node's reconstructed distances (or edges in  ) have a relative error <10%; or when more than 50% of the nodes reconstructed distances have a relative error of <1%.The adjusted recall  is similar to  , but includes the condition that the neighbouring nodes to which the relative error suffices this condition, should also fall in the category of  .Note that this categorization of recall into  ,  and  is based on a heuristic, subjective interpretation of the reconstruction result.
Examples of reconstructions are shown in Figure 10  In Figure 10(h) and Figure 10(g) the reconstructions also seems correct on the local scale (e.g.seen in the relative low loc  ) but exhibits some local errors that cause the reconstructed graph to deviate from the original axis of the pipeline.
These differences illustrate the importance of having different error metrics as the assessment should happen based on which information is favoured from the swarm.As an extreme, even in Figure 10(i) the diameter of the pipe can still be estimated from the reconstruction while the shape of the swarm is very inaccurate.2.
Deciding whether a reconstruction gives satisfactory results or not is therefore not and depends on the application goals.The adjusted recall parameters are an attempt to quantify these goals.In this paper, on the basis of subjectively interpreting how well the shape of the swarms in Figure 10 are reconstructed, recall parameter  can be seen as a reasonable metric to assess the performance of the reconstructions.We can define a satisfactory reconstruction to be one with e.g.80% >  .
Table 2. Performance metrics of reconstructions shown in Figure 10.

Parameter Sweeps and Breaking Points
In this section we study the performance of the reconstruction algorithm while sweeping the parameters related to the soft constraints.These parameters are the ones involved with measurement noise as described in Section 3.For each set of parameters, 50 different experiments are performed, each with randomly generated node positions (but following the environment geometry constraints as defined in Section 5.1).All experiments are performed with a total number of 400 N = nodes.For similar work with a smaller and larger number of nodes, the reader is referred to our earlier work [6] and [7].
The Gaussian noise components in

Figure 2 .
Figure 2. Reconstructing nodes within a swarm can be seen as a graph problem, ( ) , =    , with the node positions as vertices  (circles) and the distances between nodes the edges  (lines).The initial four node positions { } 1 2 3 4 , , , s s s s are chosen to define the coordinate system to solve the global reflection and rotation ambiguity.Using general lateration techniques, additional nodes, like the candidate node in the figure, can be added when distances to at least four already reconstructed nodes are known.For clarity of the figure, not all (required) edges are drawn.

ds
Graph with reconstructed nodes indicating positions as vertex  and distance between nodes as edges  , i j Unique (hardware) identification number (UID) of nodes i and j, other letters are also used to indicate other nodes.An additional bar in the indication stresses these are the true values (opposed to measured or hypothesised) ( ) CID i Communication identification number (CID) used by node i , i j Distance between node i and node j.Additional bars in the indication stress that these are true values (opposed to measured of hypothesised).i Position of node i  Consistency function used for associating measured distances from two nodes into a mutual connection r Threshold value for maximum allowed distance deviation in consistency condition  Individual nodes (small letter) in a set of nodes (capital) selected based on some criteria in reconstruction algorithm Recall percentages of nodes that have been reconstructed; Y and Z have an additional condition on the accuracy of the reconstruction abs glob loc , ,    Error metrics of a reconstruction indicating the mean absolute squared error of the reconstructed positions rel  Error metric of a reconstruction indicating the relative error of the reconstructed distances ν Different noise parameters of distance measurements, expressed in percentages.Index indicates type of noise.E.H. A. Duisterwinkel et al.DOI: 10.4236/wsn.2018.101001 10 Wireless Sensor Network

Figure 3 .
Figure 3. Mutual connections from distance measurements (indicated with arrows) with unique and non-unique identification.Distance measurements , i j d as measured by node i and distance measurement , j i d as measured by node j using unique identification can unambiguously be associated with each other.Distance measurements

(
Figure 3(b)) can only be used correctly when the measured non-unique identifier ( ) CID j is associated with node j.Obtaining the mutual connection where are associated with the correct nodes j and i, respectively (similar to Figure 3(a)), is not trivial as the mapping from CID to UID is ambiguous.

Figure 4 .
Figure 4.An example of (a) fully connected graph with five nodes; (b) outlier noise for two measurements; (c) obstruction between two nodes; (d) loss of one nodes; and (e) reduction of communication radius.Identification noise can be seen as outlier noise but on identification instead of distance.Explanation: Nodes are indicated as dark circles; a line indicates a distance measurement between two nodes.A too long/short line indicates a measurement with outlier noise.A dashed line indicates the distance measurement has never existed (obstruction) or this data is not retrieved (loss).

Figure 4 (
Figure 4(a) shows a fully connected reference graph in which all mutual connections are present and correct.Outlier noise in the distance measurements, as shown in Figure 4(b), leads to a set of connection is hypothesised and the average distance is taken as mutually agreed upon distance between them.All other pairs are not considered a mutual connection.The final consistency function  is expressed as:

Figure 5
Figure 5. Nodes with and the proposed positions do not need to be viable or close to the true position.Therefore, each of the other connected nodes vote for the proposed position.Consensus is reached when the majority of the voting nodes, including the three proposing nodes, have at least a 50% majority.The reflection ambiguity is resolved by choosing which of the two positions received more supporting votes.When consensus is reached, the number of supporting votes is called the RANSAC-score for the specific proposal.
indicate the groups of nodes that are potentially within the communication radius of a proposed candidate node position.This means that on geometrical reasons, all of these nodes can potentially agree on candidate node position.In other words, nodes from outside this clique could never agree with all nodes within the clique on a candidate node position.The example in Figure6shows these cliques in coloured oval areas.The number of nodes in these cliques are registered and are used as sorting order in which RANSAC is performed.Nodes with larger sized cliques in ( ) b  are more likely to have more connections with true neighbouring nodes in the graph.

E
. H. A. Duisterwinkel et al.DOI: 10.4236/wsn.2018.10100120 Wireless Sensor Network For this we use the efficient general graph optimization software package g 2 o [28].It uses sparse methods to solve the normal equations at the core of the non-linear optimization techniques like Levenberg-Marquardt or a Gauss-Newton.

Figure 7 .
Figure 7.A 2-D example of an environment with a loop (ground truth).Due to error build-up in the RANSAC graph growing, loops are not closed (reconstruction withouth loop closing).Loop closing candidates can be detected by comparing local geometric consistency of excluded connections with reconstructed positions.After detection, the loop can be closed (reconstruction with loop closing).

(
connectivity and identification ambiguity) influence this performance.The simulations are performed using two vastly different environment geometries, nodes are given a position in a pipeline environment and in a spherical environment (Section 5.1).The distance measurements are generated based on the distances between these positions and a variety of noise types is added to account for different measurement imperfections (Section 5.2).These different noise types can influence the distance measurements such that the number of neighbouring nodes to which inlier measurements are available is reduced (Section 5.3).Assessing the performance of the reconstruction algorithm can be performed in different ways, depending on which qualities are favoured of the reconstruction.Therefore, several performance metrics are introduced to address these different qualities (Section 5.4).All parameters related to the soft constraints are swept over a broad range until the reconstruction algorithm will fail to reconstruct the node positions (Section 5.5).Knowing the failure point of our reconstruction algorithm and understanding the trade-offs between the underlying constraints helps in guiding the hardware design for the development of future miniaturized nodes.The findings are summed up by plotting the performance against the calculated effective number of neighbouring nodes to see the most important trade-off (Section 5.6).In the last part, examples are given to show the loop closing detection and actual closing in environments that require this (Section 5.7).
the boundary based on a uniform probability density function.The positions in the pipeline environment are chosen such that the nodes are spread out uniformly over the axis of the pipeline and the off-axis positions are chosen randomly based on a uniform probability density function over the cross-section.A total of 50 different spherical and 50 different pipeline environments are generated for the simulations.Examples can be seen in Figure 9.

Figure 8 .
Figure 8.A 2-D interpretation of the 3-D spherical environment and 3-D pipeline environment.The lines between the nodes indicate connectivity for two different nodes.

Figure 9 .
Figure 9.The ground truth positions of 400 nodes in several different experiments of randomly generated environments that are used for the simulations.Five pipeline environments, and one tank-like (spherical) environment.
types of noise do not occur.Identification and outlier noise occur randomly to each measurement, independently of each other.Obstruction and loss results in non-existing measurements, indicated with ∅ .Obstruction happens to both , . A. Duisterwinkel et al.DOI: 10.4236/wsn.2018.10100124 Wireless Sensor Network simultaneously; and loss happens to individual nodes, independently of each other.
other but both influence the ability to establish a correct bidirectional measurement.The reconstruction algorithm is developed to be able to robustly filter out different noise types by relying on the geometric consistency between true neighbouring nodes and their measured distances to each other.The more inlier measurements that are available, the more robust the algorithm is.The effective connectivity EC is a predictor of the of inlier measurements and E. H. A. Duisterwinkel et al.DOI: 10.4236/wsn.2018.101001 25 Wireless Sensor Network therefore can be used as predictor of reconstruction performance.The usage of EC as such a predictor is verified in Section 5.6.
and the corresponding performance metrics are listed in Table 2.The distribution of the relative errors of the reconstructed distances is indicated with a boxplot on the right of each reconstruction.Only Figure 10(d) and Figure 10(a) show reconstructions with like a perfect reconstruction.All others seem to exhibit their own erroneous characteristics, but non of them are completely wrong.Figure 10(e) has a fairly high abs  but the same order of magnitude as abs  in Figure 10(d).The reconstruction only seems to suffer slight error build-up that leads to large absolute errors at the pipe end but for the rest has an accurate reconstruction.For the majority of the swarm in Figure 10(f), the positions are accurately reconstructed.Only at the ends of the pipe the reconstruction algorithm seems to have created large errors.

Figure 10 .
Figure 10.Reconstructions of node swarms in a spherical environment (a)-(c) and in a pipeline environment (d)-(i) with different parameter sets.Blue circles are the ground truth positions of the nodes and red crosses the reconstructed positions.The recall metrics , ,    and the error metrics abs rel glob loc , , ,     are assess the performance of the reconstruction are listed in Table2.

Figure 11 (
Figure 11(a) shows for a spherical environment the recall ( , ,    ) and the different error metrics abs rel glob loc , , ,    

Figure 11 (
Figure 11(b)  shows the performance metrics using similar input parameters, but in a pipeline environment.It can be seen that the recall metrics in this case are much higher than in the spherical environment.This is due to a different 2The average communication radius when comm 20 n = .
, since it is unlikely that another set of nodes with similar CID pair have the same matching distance.However, the distance measurements are imperfect and noisy.Consequently, the distance measurements and the "reverse" distance measurement will not be equal.Therefore, potential mutual connections are hypothesised when the measured distances lie within a specific threshold value r  of each other and the measured identifiers are consistent with each others cid.This condition, d should give most of the times an unambiguous mutual E. H. A. Duisterwinkel et al.DOI: 10.4236/wsn.2018.10100115 Wireless Sensor Network correct connection nodes already in the graph, this candidate node might receive a large RANSAC-score but based on only false connections.The true neighbouring nodes have in that case not enough voting power over the number of false 1. Consequently, when a candidate node is considered that does not have enough true neighbouring nodes in the graph yet but does have a large number of false E. H. A. Duisterwinkel et al.DOI: 10.4236/wsn.2018.10100119 Wireless Sensor Network connections to r .The probability of this reflection happening is defined as reflection 2, the number CID n of non-unique identifiers cid, and the number of neighbouring nodes within communication radius comm n .In order to change the number of neighbouring nodes we actually change the sensing radius comm r such that on average, nodes have comm n neighbouring nodes.