
A novel type of application for the exploration of enclosed or otherwise difficult-to-access environments requires large quantities of miniaturized sensor nodes to perform measurements while they traverse the environment in a “go with the flow” approach. Examples are the exploration of underground cavities and the inspection of industrial pipelines or mixing tanks, all of which have in common that the environments are difficult to access and do not allow position determination using e.g. GPS or similar techniques. The sensor nodes need to be scaled down towards the millimetre range in order to physically fit through the narrowest parts of the environments, and should measure distances between each other to enable the reconstruction of their relative positions in offline analysis. Reaching those levels of miniaturization while enabling reconstruction functionality requires: 1) novel reconstruction algorithms that can deal with the specific measurement limitations and imperfections of millimetre-sized nodes, and 2) improved understanding of the relation between the highly constrained hardware design space of the sensor nodes and the reconstruction algorithms. To this end, this work provides a novel and highly robust sensor swarm reconstruction algorithm and studies the effect of hardware design trade-offs on its performance. Our findings, based on extensive simulations that push the reconstruction algorithm to its breaking point, provide important guidelines for the future development of millimetre-sized sensor nodes.

Automatic and detailed reconstruction of environments and objects from sensory data is one of the success stories of modern signal processing. Still, there are environments for which detailed reconstruction is not possible with current remote or in situ sensing technologies. In this work, we focus on such environments; examples are the interior of deep underground formations like (oil) reservoirs, mines and geothermal sources, and industrial infrastructure like pipelines, mixing tanks and reactors. These environments are filled with a (semi-)liquid medium and are typically globally large but locally small (e.g. in diameter). Furthermore, they are difficult to access for in situ measurements and unsuitable for remote sensing methods. In this paper we consider a “go with the flow” approach that follows a sensor swarm paradigm, in which large quantities (e.g. hundreds or thousands) of highly miniaturized and redundant sensor nodes are gradually inserted into the environment [

The individual sensor node positions relative to each other can be reconstructed from sparse inter-node distance measurements [

To robustly obtain such an initial estimate, our offline reconstruction algorithm uses a Random Sampling Consensus (RANSAC) method applied to general lateration techniques [

We use this reconstruction algorithm to study trade-offs in the hardware design space, as many hardware challenges still need to be addressed before large swarms of millimetre-sized functional sensor nodes can be developed and deployed effectively. Trade-offs must be made in the hardware design with respect to sensing capabilities and size. Improved capabilities, e.g. a larger communication radius or better signal-to-noise ratios, will improve reconstruction performance, but make cost-effective miniaturization of the nodes more challenging. Therefore, in Section 5, we use extensive simulations with increasingly severe limitations on the hardware capabilities to study their effect on the reconstruction performance. These limitations include increased noise in the measurements and the inability to uniquely identify other nodes. The assessment of the reconstruction performance depends on which information from the swarm is favoured, and different performance metrics are introduced for this purpose.

The simulations in this work do not guarantee that robust reconstruction will work in reality, but, as also discussed in Section 6, they do show that advances in offline reconstruction algorithms can, to a large extent, compensate for severely limited hardware capabilities. Understanding the trade-offs between hardware design and reconstruction performance can guide the hardware design of yet-to-be-developed, highly miniaturized sensor nodes for future applications [

Several applications and research studies already exist that use a sensor swarm approach in which sensor nodes move with a flow through specific environments. In [

In our earlier work, targeting enclosed and difficult-to-access environments, we developed 4 cm-sized sensor nodes that measure the node’s inertial data (i.e. linear acceleration and rotation) and the local magnetic field. These nodes, with variable density, have been used e.g. to explore and inspect an underground pipeline of 15 centimetres in diameter and 260 metres in length [

The mentioned studies show that a “go with the flow” sensor approach can yield valuable insights into as yet undiscovered environments and dynamics. They also show that some applications place clear constraints on e.g. the maximum allowed size of the sensor nodes in order for them to pass through the environment. The objective of this paper is to see how far the capabilities of sensor nodes can be limited, and how far the nodes can be scaled down, while still obtaining good reconstruction performance. In the following section (Section 2) we discuss the implications of scaling down sensor nodes to the millimetre size and operating them in enclosed and difficult-to-access environments.

Even though the constraints used in this paper represent a very challenging application scenario and might seem excessive in light of current applications, we believe they are valid constraints for future miniaturized nodes [

This section discusses existing work and shows that the applications we target pose specific challenges that have not been addressed earlier in a holistic manner. Reconstructing the positions of nodes in (wireless) sensor and robot networks has been a subject of study for an extensive period of time [

1) no data is communicated between nodes (other than needed for distance measurements), measurements are stored in memory for offline analysis;

2) sparse connectivity in a large swarm;

3) not depending on external communication to e.g. fixed beacons;

4) using solely distance measurements, i.e. no direction information or additional sensor information, like bearing, odometry or inertia;

5) nodes are not identified uniquely.

Constraints 1, 3, and 4 are hard constraints: depending on the choice of application, they either exist or not, and they make the reconstruction problem harder. Constraints 2 and 5 are soft constraints and can vary in severity, depending on e.g. parameter choices that can be tweaked. For example, considering constraint 2: the connectivity between nodes depends on the transmission power, the distance between nodes, and the signal-to-noise ratio at the receiving side. Considering constraint 5: the identifiability of nodes depends on the number of identifiers chosen relative to the number of nodes used. Increasing the severity of these soft constraints makes the reconstruction problem gradually harder. Therefore, in our study, the parameters influencing connectivity and identifiability are swept over a sufficiently large range in order to study their consequences for the reconstruction performance and thereby also to find the limits of current reconstruction algorithms.

Although each separate constraint has been considered earlier, this work is, to the best of our knowledge, the first to research the reconstruction of sensor swarms considering all mentioned constraints together and under realistic conditions.

Probably the most significant and overarching constraint of highly miniaturized sensor nodes is their power limitation. Due to volume and weight constraints in the small sensor nodes, battery capacity will be extremely limited and this influences all other design choices related to sensing, computation, and communication. In fact, we envision levels of miniaturization in which nodes have just enough energy to perform only a single inter-node distance measurement.

Furthermore, communication requires relatively large amounts of energy and is therefore only used to realize the inter-node distance measurements, but for example not for online distributed processing of the measurements (hence constraint 1). Instead of distributing the measured data or sending it to a central sink node, we use an approach in which the measured data is stored locally in the sensor nodes, and afterwards retrieved and processed centrally when power consumption is no longer a limitation.

The communication transmission power and the signal-to-noise ratio of the receiving electronics are limited, reducing the communication range between nodes. Therefore, the inter-node connectivity between sensor nodes, by means of inter-node distance measurements, will be sparse (hence constraint 2) and dependent on the energy budget.

To the best of our knowledge, there is no other research that aims to perform swarm reconstruction under such extreme low-power conditions.

A distinction that can be made in SLAM-related research is whether or not beacons are used, that is, fixed points in space that serve as reference points to which individual nodes can relate. In our applications, we cannot rely on the presence of beacons, as opposed to e.g. [

In applications with a static environment, the environment could potentially be used as a continuum of beacons, or, in this case, better described as landmarks. As opposed to the research mentioned in e.g. [

Efforts to map node positions and the environments based on acoustic reflections can be found e.g. in [

Given the aforementioned, the nodes are forced to cooperate with each other in order to obtain information on their positions; this is also called cooperative localization [

A common method for cooperative localization is to reconstruct node positions based on distance measurements between the nodes. Such methods are often augmented with inertial, bearing, or odometric information from the nodes, as in e.g. [

Although one can argue that, in the future, advances in sensors will lift some of the mentioned limitations, we show that with only inter-node distance measurements, the reconstruction of the sensor swarm is possible.

Nodes with (sub-)centimetre dimensions only fit antennas that effectively produce EM radiation at frequencies that are strongly absorbed by the liquid media. The (liquid) media can have a high salinity and are also not guaranteed to be clear enough for the use of (visible) light communication. This effectively prevents EM communication over distances larger than e.g. a centimetre.

Ultrasound transducers at these scales do provide larger communication ranges within the boundaries of the environments, but yield other challenges for stable and fast communication between dynamic nodes in enclosed environments [

Furthermore, the relatively slow speed of sound, in combination with the relatively small bandwidth, reflective environments and the fact that nodes are non-static, leads to highly time-varying communication channels [

The low data rate and the need to perform distance measurements in a short period of time creates the incentive to reduce the content of the messages used for distance measurements (i.e. reducing the number of bits in the message). For example, in the ranging protocol as studied in [

In applications where many sensor nodes are used, ideally each node has a unique identifier by which it can unambiguously communicate with other nodes. However, for the envisioned application cases mentioned in this paper, it is worthwhile to explore the possibility of reducing the number of identifying bits, as they make up the majority of the ultrasound messages (hence constraint 5). Reducing the number of identifying bits, and thus having only non-unique identification, makes communication ambiguous. The inter-node distance measurements then also become ambiguous, but this can be resolved using the robust reconstruction algorithm presented in this paper. Non-unique identification of distance measurements makes most existing reconstruction efforts inapplicable, as they depend on well-defined, non-ambiguous identifications [
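As an illustration of the effect of reducing identifying bits, the sketch below maps unique hardware IDs to a small pool of communication IDs; the modulo mapping and the parameter values are assumptions for illustration, not the scheme used in this paper.

```python
# Hypothetical illustration: mapping unique hardware IDs (UIDs) to a small
# pool of non-unique communication IDs (CIDs). The modulo mapping and the
# parameter values below are assumptions, not the paper's prescribed scheme.

N = 1000      # nodes in the swarm
n_cid = 16    # 4 identifying bits -> 16 distinct CIDs

def cid(uid: int) -> int:
    """Non-unique communication identifier broadcast by node `uid`."""
    return uid % n_cid

# Many nodes share each CID, so a received ranging message is ambiguous:
# on average about N / n_cid nodes could have sent it.
sharers = [uid for uid in range(N) if cid(uid) == cid(42)]
```

With hundreds of nodes and only a handful of CIDs, every measured distance is compatible with dozens of possible senders, which is exactly the ambiguity the consistency check of Section 4.1 has to resolve.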

Our reconstruction algorithm is, to the best of our knowledge, the first that can successfully cope with identification ambiguities together with all the other mentioned hardware constraints.

In this section we discuss the specific problem formulation and introduce our notation. All symbols used in this paper are summarized in

Symbol | Description |
---|---|
N | Number of nodes in the swarm |
n_CID | Number of communication identifiers in use |
n_comm | Number of nodes within communication radius |
r_comm | Communication radius of the nodes |
G(V, E) | Graph with the reconstructed node positions as vertices V and the distances between nodes as edges E |
i, j | Unique (hardware) identification numbers (UIDs) of nodes i and j; other letters are also used to indicate other nodes. A bar in the notation stresses that these are the true values (as opposed to measured or hypothesised) |
CID(i) | Communication identification number (CID) used by node i |
d_i,j | Distance between node i and node j. Bars in the notation stress that these are true values (as opposed to measured or hypothesised) |
s_i | Position of node i |
C | Consistency function used for associating measured distances from two nodes into a mutual connection |
ϵ_r | Threshold value for the maximum allowed distance deviation in consistency condition C |
h_i,j | Set of hypothesised distances between nodes i and j |
A_I | Number of additional identity ambiguities |
A_D | Number of additional distance ambiguities |
a ∈ A, b ∈ B, c ∈ C | Individual nodes (lower case) in a set of nodes (capital) selected based on some criterion in the reconstruction algorithm |
Q, Q(b) | Cliques in the already reconstructed graph |
X, Y, Z | Recall percentages of nodes that have been reconstructed; Y and Z have an additional condition on the accuracy of the reconstruction |
E_abs, E_glob, E_loc | Error metrics of a reconstruction indicating the mean absolute squared error of the reconstructed positions |
E_rel | Error metric of a reconstruction indicating the relative error of the reconstructed distances |
ν | Noise parameters of the distance measurements, expressed in percentages; the index indicates the type of noise |

As mentioned in the introduction, the reconstruction of the node positions can be seen as a graph-optimization problem where G = ( V , E ) , in which the vertices V are the sensor node positions and the edges E are the distances between them. This is also visualized in

\[
\arg\min_{V \in \mathbb{R}^{3}} \sum_{E} \left( \lVert V_i - V_j \rVert - E_{i,j} \right)^{2} \tag{1}
\]

Solving these kinds of problems is often performed using non-linear optimization methods such as the Levenberg-Marquardt or Gauss-Newton algorithms. As with most non-linear optimization problems, the key is to provide a proper initial estimate, in this case an initial estimate of the node positions. This initial estimate is then further refined using non-linear optimization. Obtaining the initial estimate of the node positions is performed by our novel robust reconstruction algorithm, discussed in Section 4.
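As a minimal sketch of this refinement step, the following applies a plain Gauss-Newton iteration to the objective of Equation (1) on a small toy swarm; the five-node configuration, the full connectivity, and the perturbation level are illustrative assumptions, not values from this paper.

```python
import numpy as np

# Toy swarm: five nodes, all pairwise distances measured without noise.
true_pos = np.array([[0.0, 0.0, 0.0], [1, 0, 0], [0, 1, 0],
                     [0, 0, 1], [1, 1, 1]])
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
meas = {e: np.linalg.norm(true_pos[e[0]] - true_pos[e[1]]) for e in edges}

def residuals(x):
    """Per-edge residuals ||V_i - V_j|| - E_ij of Eq. (1)."""
    pos = x.reshape(-1, 3)
    return np.array([np.linalg.norm(pos[i] - pos[j]) - meas[(i, j)]
                     for i, j in edges])

def refine(x0, iters=25):
    """Gauss-Newton refinement of the stacked position vector x0."""
    x = x0.copy()
    for _ in range(iters):
        pos = x.reshape(-1, 3)
        J = np.zeros((len(edges), x.size))
        for k, (i, j) in enumerate(edges):
            g = (pos[i] - pos[j]) / np.linalg.norm(pos[i] - pos[j])
            J[k, 3 * i:3 * i + 3] = g
            J[k, 3 * j:3 * j + 3] = -g
        # minimum-norm step; J is rank-deficient due to the free global frame
        x -= np.linalg.lstsq(J, residuals(x), rcond=None)[0]
    return x

rng = np.random.default_rng(0)
x0 = (true_pos + 0.05 * rng.standard_normal(true_pos.shape)).ravel()
refined = refine(x0)
```

Because the objective only involves inter-node distances, the solution is determined up to a global rotation, translation, and reflection; the minimum-norm least-squares step handles the resulting rank deficiency of the Jacobian.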

This section first discusses how distances between the nodes (i.e. the edges E) are measured, and the difficulties herein. In Section 3.1 we discuss how the distance measurements are performed. Section 3.2 discusses the different types of measurement noise to be expected in these measurements, and the issues related to identification are discussed in Section 3.3. The consequences of these for the reconstruction algorithm are discussed in Section 3.4.

Following the operational procedure as depicted in

Omnidirectional emission and reception is considered (as opposed to directional), because there is no a priori or online knowledge of the positions of neighbouring nodes. Whether or not messages are received by neighbouring nodes is dependent on the transmitted power and, among other factors, the received signal-to-noise ratio.

For convenience, but without loss of generality, in this paper we assume that distances can be, and are, measured between all neighbouring nodes that are in line-of-sight and within a fixed and known communication radius, r_comm.

Details of these protocols, as e.g. discussed in [

The result of the ranging protocol is a set of measurements in which every node i has measured the distance d_i,j to every neighbouring node j, and node j has measured the distance d_j,i to node i. This is visualized in

The measured distance between nodes will be affected by different imperfections in the system. As the dynamics of the nodes and of the environment they are in are unknown, a range of possible imperfections should be accounted for, since we do not know their effect on the reconstruction result. To simulate the measured distances, in this paper we consider a single snapshot of the actual distances and introduce all types of measurement noise to account for the existing imperfections. The types of noise in the distance measurements that we model can be divided into inlier-type noise and outlier-type noise. The inlier-type measurement noise is:

・ additive Gaussian noise N_a; to account for e.g. an offset in the ranging timer or variable delays in the electronics, and also for errors due to movement of the nodes while performing the ranging measurements,

・ multiplicative Gaussian noise N_m; to account for e.g. differing clock frequencies or an inhomogeneous medium.

The outlier-type measurement noise is:

・ identification noise; to account for erroneous detection of identifiers,

・ outlier noise; to account for burst-like/spiky/intermittent measurement noise.

The environment also gives rise to additional outlier-type noise that can influence the reconstruction:

・ obstruction of signal paths,

・ reflection of signals,

・ loss of nodes or inability to retrieve them.

How exactly these are simulated is described later, in Section 5.2.
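For concreteness, the inlier- and outlier-type measurement noise above can be sketched as follows; the noise levels, the outlier model (a uniformly random distance), and the outlier probability are illustrative assumptions, not the parameter values of Section 5.2.

```python
import numpy as np

rng = np.random.default_rng(1)

def measure(d_true, nu_a=0.01, nu_m=0.01, p_outlier=0.05, r_comm=1.0):
    """Simulate one noisy measurement of the true distance `d_true`."""
    if rng.random() < p_outlier:
        # outlier-type noise: a burst/spiky measurement, modelled here as a
        # uniformly random distance within the communication radius
        return rng.uniform(0.0, r_comm)
    d = d_true * (1.0 + nu_m * rng.standard_normal())   # multiplicative N_m
    d += nu_a * r_comm * rng.standard_normal()          # additive N_a
    return max(d, 0.0)                                  # distances are non-negative

samples = [measure(0.5) for _ in range(1000)]
```

Identification noise (erroneously decoded identifiers) acts on the identifiers rather than on the distance value, and is therefore not part of this sketch.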

Due to the constraints described in Section 2, the identifiers used in the ranging messages are reduced to non-unique identifiers, similar to the research in [

Now, instead of every node i measuring distance d i , j to node j and vice versa, node i is measuring distance d i , CID ( j ) to a node that has a communication identifier CID ( j ) . And vice versa, node j measures distance d j , CID ( i ) . This is illustrated in

When the nodes are retrieved after the experiment and their data read out, offline analysis can be performed. Normally, with unique identification, mutual connections can be established by associating measurements d i , j and d j , i with each other (

However, in the case of non-unique identification, measurement d i , CID ( j ) (

Mutual connections, in which d_i,CID(j) is associated with d_j,CID(i), are obtained by applying a consistency check to each of the possible node pairs whose cids are consistent with the measurements. A mutual connection is hypothesised when two distance measurements agree on two grounds: 1) the measured cids agree with the CID of the other node, and 2) the corresponding measured distances are approximately equal (i.e. inlier measurements). The exact implementation is described in Section 4.1.

This consistency check is affected by the different types of measurement noise in the individual measurements, specifically the outlier-type noise that prevents finding the correct mutual connections.

In this section we will describe the novel reconstruction algorithm. It reconstructs node positions based on inter-node distances with significant measurement imperfections and the ambiguities introduced by using non-unique identifiers.

The reconstruction algorithm can be summarized as follows:

1) Hypothesise mutual connections based on the consistency (distance and identity) between all distance measurements, obtained from all retrieved nodes (Section 4.1);

2) Find initial four nodes to fix coordinate reference system (Section 4.2);

3) Robustly add additional nodes to the graph using a guided-RANSAC algorithm and general lateration, exploiting the geometric consistency between true neighbouring nodes (Section 4.3);

4) Perform non-linear optimization to reduce build-up of errors from previous step; go back to step 3 (Section 4.4);

5) Loop closing, if needed (Section 4.5).

An overview of the algorithmic steps of the reconstruction method is provided in Appendix A.

Before the reconstruction process, mutual connections should be established between nodes. The decoded non-unique communication identifier, cid, from every received ranging message should be associated with a unique (hardware) identification number, UID. Initially, this can be approached using a combinatoric method in which all possible nodes that use the respective cid are considered. The range of options can be narrowed down, as the distance measurement d_i,CID(j) of node i to node j, when performed correctly and without noise, should be similar to the reverse distance measurement d_j,CID(i) of node j to node i. This means that searching the measured data for the consistency where d_i,CID(j) is similar to d_j,CID(i) should, most of the time, yield an unambiguous and correct mutual connection, since it is unlikely that another pair of nodes with a similar CID pair has the same matching distance.

However, the distance measurements are imperfect and noisy. Consequently, a distance measurement and the “reverse” distance measurement will not be equal. Therefore, potential mutual connections are hypothesised when the measured distances lie within a specific threshold value ϵ_r of each other and the measured identifiers are consistent with each other’s cid. This condition, denoted by d_a,b ≃ d_c,d, then becomes:

\[
d_{a,b} \simeq d_{c,d} \quad \text{when} \quad
\begin{cases}
\lvert d_{a,b} - d_{c,d} \rvert \leq \epsilon_r \\
\mathrm{CID}(a) = \mathrm{CID}(d) \\
\mathrm{CID}(b) = \mathrm{CID}(c)
\end{cases} \tag{2}
\]

For all pairs that are considered consistent with each other, i.e. d_a,b ≃ d_c,d, a mutual connection is hypothesised and the average distance is taken as the mutually agreed upon distance between them. All other pairs are not considered a mutual connection. The final consistency function C is expressed as:

\[
C(d_{a,b}, d_{c,d}) =
\begin{cases}
\dfrac{d_{a,b} + d_{c,d}}{2} & \text{when } d_{a,b} \simeq d_{c,d} \\
\varnothing & \text{otherwise}
\end{cases} \tag{3}
\]

where ∅ indicates that no mutual connection is considered, i.e. an empty entry.

The threshold value ϵ_r is the maximum allowed difference in distance for which two measurements are considered inliers. This threshold should be related to the noise; it can e.g. be obtained by trial and error if the error model is unknown. Note that, due to the measurement noise, the threshold, and the non-unique identification, not all pairs that are considered consistent are correct; they are only hypothesised to be correct connections.
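A direct transcription of Equations (2) and (3) into code could look as follows; the Measurement record is an assumed representation of one stored ranging result, not a structure defined in this paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Measurement:
    cid_self: int    # CID used by the measuring node itself
    cid_other: int   # CID decoded from the received ranging message
    dist: float      # measured distance

def consistency(m1: Measurement, m2: Measurement, eps_r: float) -> Optional[float]:
    """Consistency function C of Eq. (3): the averaged distance when the two
    measurements cross-match in CID and agree within eps_r, otherwise None
    (the empty entry, i.e. no mutual connection)."""
    if (m1.cid_other == m2.cid_self and m2.cid_other == m1.cid_self
            and abs(m1.dist - m2.dist) <= eps_r):
        return 0.5 * (m1.dist + m2.dist)
    return None
```

Applying this check to all candidate measurement pairs with matching cids yields the set of hypothesised mutual connections used by the reconstruction.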

The obtained mutual connections between node ī and node j, and their hypothesised distances, are not unambiguous. As illustrated in

\[
h_{i,j} = \begin{cases}
C(d_{\bar{i},\bar{j}}, d_{\bar{j},\bar{i}}) & (a) \\
C(d_{\bar{i},j}, d_{p,q}) & (b) \\
C(d_{\bar{i},k}, d_{j,m}) & (c)
\end{cases} \tag{4}
\]

1) the measurement between the real nodes ī and j̄;

2) identity ambiguity: a mutual connection to a node p which is believed to be at a similar distance as node j̄, but which is not necessarily within communication radius of ī (see

3) distance ambiguity: a mutual connection with the correct node j̄, but with a distance belonging to the measured distance to another node k within its communication radius (see

The identity ambiguities are based on the statistical likelihood that, somewhere in the swarm, a node pair with similar cids and a similar distance is present such that the measurement pair obeys Equation (2). Assuming a uniform spatial distribution of the nodes, the average number of identity ambiguities in h_i,j scales as A_I ∝ ϵ_r n_comm N / n_CID². Equally, the distance ambiguities are based on the statistical likelihood that neighbouring nodes give rise to confusion. The average number of distance ambiguities per d_ī,CID(j) then scales as A_D ∝ ϵ_r n_comm² N / n_CID².

Both types of ambiguities, A_I and A_D, increase with an increasing number of neighbouring nodes n_comm and with a lower n_CID, making it more challenging to decide which mutual connections are correct and which are false. These ambiguities are resolved in our RANSAC graph-growing algorithm, as detailed in Section 4.3.

In this work we solve the reconstruction problem by incrementally growing the graph using a robust RANSAC method based on general lateration principles. Given an initial graph G, new nodes can be added as illustrated in

Since the graph growing is performed in ℝ³ while the distance measurements are performed in ℝ, and since there are no external beacons acting as fixed reference points in space, a coordinate reference system first has to be defined. The initial four node positions are chosen such that they define the coordinate system, thereby resolving the general reflection and rotation ambiguity. Their positions, s_1, s_2, s_3, s_4, define the coordinate system as follows, as also illustrated in
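One common convention for fixing such a reference frame from the six pairwise distances among the four seed nodes is sketched below: s1 at the origin, s2 on the positive x-axis, s3 in the z = 0 plane, and s4 on the positive-z side. This particular axis choice is an assumption for illustration, not necessarily the paper's exact convention.

```python
import numpy as np

def seed_frame(d12, d13, d14, d23, d24, d34):
    """Place four seed nodes from their six pairwise distances.
    s1 at the origin, s2 on +x, s3 in the z=0 plane, s4 with z >= 0."""
    s1 = np.zeros(3)
    s2 = np.array([d12, 0.0, 0.0])
    # s3 from |s3| = d13 and |s3 - s2| = d23
    x3 = (d12**2 + d13**2 - d23**2) / (2 * d12)
    y3 = np.sqrt(d13**2 - x3**2)
    s3 = np.array([x3, y3, 0.0])
    # s4 from |s4| = d14, |s4 - s2| = d24, |s4 - s3| = d34
    x4 = (d12**2 + d14**2 - d24**2) / (2 * d12)
    y4 = (d13**2 + d14**2 - d34**2 - 2 * x3 * x4) / (2 * y3)
    z4 = np.sqrt(max(d14**2 - x4**2 - y4**2, 0.0))  # choose the +z branch
    s4 = np.array([x4, y4, z4])
    return s1, s2, s3, s4
```

Fixing the sign of the z-coordinate of s4 is what removes the remaining reflection ambiguity of the frame.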

Since h_i,j contains ambiguities, the initial seed selection is provisional until the graph growing has successfully added several nodes to the graph G. If this is not possible, a new initial seed is selected and the process is repeated.

After the initial seed is chosen, additional nodes can be added to the graph when they have at least four connections to already reconstructed nodes. The set of candidate nodes with at least four connections to already reconstructed nodes is denoted as C. Ideally, C consists only of nodes that are true neighbouring nodes, but due to the ambiguities in the hypothesised connections h_i,j this is not the case.

A Random Sampling Consensus (RANSAC) method is used to attempt to correctly position the candidate nodes. RANSAC is a general robust estimation technique, for more details we refer to [

When a candidate node c is considered, all nodes that are already reconstructed in the graph and that are connected to c are selected. This set of nodes is denoted by A = {a : a ∈ V, h_a,c ≠ ∅}. A subset of three of these nodes, A_p = {a_p ∈ A}, is selected to propose―using general lateration―a position for c up to a reflection ambiguity (three nodes are always coplanar). However, no guarantees can be given that the entries in h_{a_p,c} are correct connections, and the proposed positions need not be viable or close to the true position. Therefore, each of the other connected nodes, A_v = {a_v : (A_v ∪ A_p) = A, (a_v ∩ a_p) = ∅}, is used to vote on the proposed position. Consensus is reached when the supporting nodes, including the three proposers, form at least a 50% majority of the voters. The reflection ambiguity is resolved by choosing the one of the two positions that received more supporting votes. When consensus is reached, the number of supporting votes is called the RANSAC-score of the specific proposal.

This step of the RANSAC algorithm is repeated, each time with a different set of three proposers a_p, until all possibilities are exhausted or until a proposal receives a specific threshold in RANSAC-score. In the latter case the proposal is considered sufficiently supported and is added to the graph. When no “sufficiently supported” condition is reached, the RANSAC procedure is repeated for the next candidate node, until all candidates have been considered or a sufficiently supported condition is found for a specific candidate. The candidate with the highest RANSAC-score is added to the graph in V, with edges (E) only to the supporting nodes and the corresponding h_a,c entry.

The “sufficiently supported” condition for the RANSAC can be based on the expected percentage of outlier measurements or can be chosen heuristically by trial-and-error, as is done in this work.
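One proposal-and-vote round of this procedure can be sketched as follows: three anchors laterate the two mirror positions for a candidate, and the remaining connected nodes vote for the position that matches their own measured distance within a threshold. The geometry helper and the threshold value are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def laterate(p1, p2, p3, d1, d2, d3):
    """General lateration: the two mirror positions at distances d1, d2, d3
    from the non-collinear anchors p1, p2, p3 (three-sphere intersection)."""
    ex = (p2 - p1) / np.linalg.norm(p2 - p1)
    i = ex @ (p3 - p1)
    ey = p3 - p1 - i * ex
    ey = ey / np.linalg.norm(ey)
    ez = np.cross(ex, ey)
    d = np.linalg.norm(p2 - p1)
    j = ey @ (p3 - p1)
    x = (d1**2 - d2**2 + d**2) / (2 * d)
    y = (d1**2 - d3**2 + i**2 + j**2 - 2 * i * x) / (2 * j)
    z = np.sqrt(max(d1**2 - x**2 - y**2, 0.0))  # 0 when the spheres just touch
    base = p1 + x * ex + y * ey
    return base + z * ez, base - z * ez

def vote(candidates, voters, eps_r):
    """RANSAC-style voting: for each proposed position, count the voters
    (position, measured distance) whose measurement matches within eps_r."""
    return [sum(1 for p, d in voters
                if abs(np.linalg.norm(c - p) - d) <= eps_r)
            for c in candidates]
```

The mirror position with the higher vote count resolves the reflection ambiguity, and the winning count corresponds to the RANSAC-score described above.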

Guiding RANSAC

The number of false connections from the candidate nodes to the already reconstructed nodes grows, on average, linearly with the number of nodes already in the graph, due to A_I, as explained in Section 4.1. Consequently, when a candidate node is considered that does not yet have enough true neighbouring nodes in the graph, but does have a large number of false connections to nodes already in the graph, this candidate node might receive a large RANSAC-score based on false connections alone. In that case, the true neighbouring nodes do not have enough voting power relative to the false ones.

To reduce the probability of this happening, the sampling procedure over the candidate nodes is guided by sorting the candidate nodes before the RANSAC algorithm is performed on them. The sorting is based on the likelihood that candidate nodes already have enough true neighbouring nodes in the graph. As such, candidate nodes that are more prone to being reconstructed from false connections are considered later. This increases the probability that more of their true neighbouring nodes will be added to the graph first. The algorithm is provided in Appendix C and described next.

For each candidate node {b : b ∈ C}, a list is made of connected nodes that have already been reconstructed in the graph, {a : a ∈ V, h_a,b ≠ ∅}. A new graph G′ = (V′, E′) is made with the nodes a as vertices. These vertices are connected with edges E′ when their positions are less than twice the communication radius, 2 r_comm, apart. In this new graph, the maximal cliques are listed as Q(b); they indicate the groups of nodes that are potentially within the communication radius of a proposed candidate node position. This means that, for geometrical reasons, all of the nodes in a clique can potentially agree on a candidate node position. In other words, nodes from outside a clique can never agree with all nodes within the clique on a candidate node position. The example in

This sorting guides the RANSAC procedure and has two effects on the reconstruction: 1) it increases the probability that a “sufficiently supported” candidate is found quickly, and 2) it increases the probability that, from all “sufficiently supported” candidates, the best one is chosen for addition to the graph.

Guiding the RANSAC procedure with this sorting requires additional computational power (calculating the cliques), relative to unguided RANSAC. This increase is compensated for by finding a “sufficiently supported” candidate more quickly. The reconstruction roughly takes between 0.1 and 10 seconds per node, depending mainly on the connectivity and the ambiguities.
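A sketch of the clique computation used for this guiding is given below: the reconstructed neighbours of a candidate are linked when they lie within 2 r_comm of each other, and the maximal cliques of that auxiliary graph bound how many nodes can jointly support one proposed position. The plain Bron-Kerbosch enumeration and the `guide_score` helper name are assumptions for illustration.

```python
import numpy as np

def maximal_cliques(adj):
    """Enumerate the maximal cliques of adjacency dict `adj` (Bron-Kerbosch)."""
    cliques = []
    def bk(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            bk(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    bk(set(), set(adj), set())
    return cliques

def guide_score(positions, r_comm):
    """Hypothetical sorting key: size of the largest group of reconstructed
    neighbours that are mutually within 2*r_comm, i.e. the candidate's best
    possible number of geometrically consistent supporters."""
    n = len(positions)
    adj = {i: {j for j in range(n) if j != i
               and np.linalg.norm(positions[i] - positions[j]) < 2 * r_comm}
           for i in range(n)}
    return max(len(c) for c in maximal_cliques(adj))
```

Candidates would then be processed in decreasing order of this score, so that nodes whose supporters cannot be geometrically consistent are deferred.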

The stepwise addition of new nodes to the graph introduces build-up of errors. These errors in positions can prevent other nodes from being added. In order to reduce this error build-up, a global non-linear optimizer algorithm is executed as described in [

For this we use the efficient general graph optimization software package g^{2}o [

In our algorithm, this non-linear optimization is performed after every m newly added nodes, or upon failure to add a new node. There is a clear trade-off between m and the processing time required for the reconstruction algorithm. In our work, m is arbitrarily chosen as 10. In future work, it could e.g. be adjusted dynamically based on the uncertainty of previously added nodes.

The algorithm described above only considers edges between nodes in G when the specific connections supported the winning proposal in the RANSAC voting. As can later be seen in the results section, Section 5, this is effective against any type of outlier noise. But when considering loops in the environment, these loops might not be closed due to the previously mentioned error build-up, as schematically illustrated in

Nodes that are supposed to connect both ends of the loop will only be placed at one of the two ends, depending on which end has the largest number of supporters.

Placement at the other end of the loop could also have yielded a successful reconstruction; however, the node will not automatically be added to both ends to close the loop. An additional step is required to detect such loops and to optimize the whole graph such that the loop is closed. This section describes how these loops are detected and closed.

Conceptually, loops are detected by searching for a set of nodes that could have been reconstructed at two different, distant positions in the graph. This starts by looking at all hypothesised connections between nodes that are not used in the reconstruction, i.e. all connections h i , j that did not support a winning RANSAC proposal; we call these excluded connections. Connections that did support a winning proposal are called included connections.

Let node i have excluded connections to nodes k i ex and included connections to reconstructed neighbouring nodes j i in , as illustrated in

After the reconstruction has stopped or halted, it is checked for every node i how many of its connections are excluded from the reconstruction, and how many of the neighbouring nodes j i in also have excluded connections to the nodes k i ex . When several neighbouring nodes agree that a node is excluded, that node is said to be commonly excluded. Nodes k i ex that are commonly excluded by the nodes j i in are likely to be at the other end of a loop than the nodes j i in .

The final determination of the loop closure can be performed by redoing the RANSAC graph-growing on this set of connections, forcing the nodes to be added to the graph on the other side of the loop. When this yields a successful reconstruction, and the local relative positions of the considered nodes are equivalent to those on the original side of the loop, the loop can be closed. The loop closing itself is performed by adding the specific loop-closing edges that were initially excluded to the original graph and running the non-linear graph optimizer on the newly obtained graph.
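The commonly-excluded vote described above amounts to a simple counting step. The sketch below is our own illustration: the dict layout and the `min_support` threshold are hypothetical choices, not values from the paper.

```python
from collections import Counter

def commonly_excluded(i, included, excluded, min_support=3):
    """Nodes that node i and several of its included neighbours all had
    to exclude; these are likely candidates for the far end of a loop.

    included, excluded: dicts mapping a node id to the set of node ids
    it has included/excluded connections to (our own layout)."""
    votes = Counter()
    for k in excluded[i]:
        # Each included neighbour j of i that also excludes k adds a vote.
        votes[k] = sum(1 for j in included[i] if k in excluded[j])
    return {k for k, v in votes.items() if v >= min_support}
```

Nodes returned by this check would then be re-grown on the far side of the loop, as described in the text.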

In this section we use numerical simulations to study the performance of the reconstruction algorithm, as well as how the different soft constraints (connectivity and identification ambiguity) influence this performance. The simulations are performed using two vastly different environment geometries: nodes are given positions in a pipeline environment and in a spherical environment (Section 5.1). The distance measurements are generated based on the distances between these positions, and a variety of noise types is added to account for different measurement imperfections (Section 5.2). These noise types can influence the distance measurements such that the number of neighbouring nodes to which inlier measurements are available is reduced (Section 5.3). Assessing the performance of the reconstruction algorithm can be done in different ways, depending on which qualities of the reconstruction are favoured. Therefore, several performance metrics are introduced to address these different qualities (Section 5.4). All parameters related to the soft constraints are swept over a broad range, until the reconstruction algorithm fails to reconstruct the node positions (Section 5.5). Knowing the failure point of our reconstruction algorithm and understanding the trade-offs between the underlying constraints helps in guiding the hardware design for the development of future miniaturized nodes. The findings are summed up by plotting the performance against the calculated effective number of neighbouring nodes, revealing the most important trade-off (Section 5.6). In the last part, examples are given of the loop-closing detection and the actual closing in environments that require this (Section 5.7).

The envisioned node swarms can be deployed in a variety of environments. In this paper we will work with two types of environments as schematically illustrated in

In the spherical environment, 400 nodes are given a random position within the boundary based on a uniform probability density function. The positions in the pipeline environment are chosen such that the nodes are spread out uniformly over the axis of the pipeline and the off-axis positions are chosen randomly based on a uniform probability density function over the cross-section. A total of 50 different spherical and 50 different pipeline environments are generated for the simulations. Examples can be seen in

In both environments, the communication radius r comm of the nodes is chosen such that each node has on average n comm neighbouring nodes. The parameter n comm is one of the parameters that will be swept. The nodes’ communication radius r comm , within which ranging measurements are possible, is the same for all nodes within the swarm and ranges throughout the experiments from 10 cm ≤ r comm ≤ 15 cm (corresponding to 16 ≤ n comm ≤ 28 ).

Distance measurements are generated based on the generated node positions. The distance between nodes $\bar{i}$ and $\bar{j}$ as measured by node $\bar{i}$ is indicated by $d_{\bar{i},j}$ and is modelled from the true distance between the nodes, $\bar{d}_{\bar{i},\bar{j}}$ (the bar indicating true values).

Additive and multiplicative Gaussian noise are added to $\bar{d}_{\bar{i},\bar{j}}$ according to:

$$d_{\bar{i},\bar{j}} = \bar{d}_{\bar{i},\bar{j}} + \bar{d}_{\bar{i},\bar{j}} N_m + N_a \quad (5)$$

where $N_m$ and $N_a$ are perturbations drawn from the zero-mean Gaussian distributions $\mathcal{N}(0, \sigma_m^2)$ and $\mathcal{N}(0, \sigma_a^2)$, respectively.
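Equation (5) translates into a one-line noise model. A minimal sketch (the function name and the use of NumPy's generator are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(d_true, sigma_m, sigma_a):
    """Eq. (5): perturb a true distance d_true with a multiplicative
    term drawn from N(0, sigma_m^2) and an additive term from
    N(0, sigma_a^2)."""
    n_m = rng.normal(0.0, sigma_m)
    n_a = rng.normal(0.0, sigma_a)
    return d_true + d_true * n_m + n_a
```

With both standard deviations set to zero the true distance is returned unchanged, which is a convenient sanity check when wiring up a simulation.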

Furthermore, a variety of operations is executed on the measurements to account for different outlier noise types, each with its own statistical likelihood, $\nu$, of that type of noise affecting the measurement.

$$d_{\bar{i},j} = F_{\text{identification}}\left(F_{\text{outlier}}\left(F_{\text{obstruction}}\left(F_{\text{loss}}\left(d_{\bar{i},\bar{j}}\right)\right)\right)\right) \quad (6)$$

with

$$F_{\text{identification}}(d_{a,b}) = d_{a,k} \quad \text{when identification noise occurs (probability } \nu_{\text{identification}}\text{)}$$

$$F_{\text{outlier}}(d_{a,b}) = U \quad \text{when outlier noise occurs (probability } \nu_{\text{outlier}}\text{)}$$

$$F_{\text{obstruction}}(d_{a,b}) = \emptyset \quad \text{when obstruction occurs (probability } \nu_{\text{obstruction}}\text{)}$$

$$F_{\text{loss}}(d_{a,b}) = \emptyset \quad \text{when node } a \text{ is lost (probability } \nu_{\text{loss}}\text{)}$$

and all operations return $F(d_{a,b}) = d_{a,b}$ when the specific type of noise does not occur. Identification and outlier noise occur randomly for each measurement, independently of each other. Obstruction and loss result in non-existing measurements, indicated with $\emptyset$. Obstruction affects both $d_{\bar{i},j}$ and $d_{\bar{j},i}$ simultaneously; loss affects individual nodes, independently of each other.

Identification noise results in an erroneously decoded identifier, i.e. node j is measured as being node k. The outlier noise U is drawn randomly from a uniform distribution on the range $(0, r_{\text{comm}}]$.
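The chain of operations in Eq. (6) can be sketched per measurement as below. This is our own simplified illustration: loss is modelled here per measurement for brevity (in the paper it affects whole nodes), and the function name, dict of probabilities, and return convention are all hypothetical.

```python
import random

def apply_outlier_ops(d, node_a, node_b, all_nodes, r_comm, nu,
                      rng=random.Random(0)):
    """Eq. (6): apply the outlier operations, innermost first.
    nu: dict with keys 'loss', 'obstruction', 'outlier',
    'identification'. Returns (measured_id, distance) or None when the
    measurement does not exist."""
    if rng.random() < nu['loss']:          # F_loss: node lost
        return None
    if rng.random() < nu['obstruction']:   # F_obstruction: path blocked
        return None
    if rng.random() < nu['outlier']:       # F_outlier: random distance
        d = rng.uniform(0.0, r_comm)
    b = node_b
    if rng.random() < nu['identification']:  # F_identification: wrong id
        b = rng.choice([n for n in all_nodes if n != node_b])
    return (b, d)
```

Note the order: loss and obstruction are applied first, matching the innermost functions of Eq. (6), so an obstructed measurement can never acquire a wrong identifier.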

Additionally, signal multipath can result in additional measurements for specific node pairs: besides the distance based on the direct signal path, distances based on reflected, and therefore longer, signal paths can also be measured. To account for this, we add an additional distance measurement $d'_{\bar{i},j}$ to the measurement dataset:

$$d'_{\bar{i},j} = \bar{d}_{\bar{i},\bar{j}} + W \quad (7)$$

with W a perturbation drawn from a uniform distribution on the range $(0, r_{\text{comm}})$. The probability of such a reflection occurring is defined as $\nu_{\text{reflection}}$. For the reflection to be measured by node i, the resulting distance must fall within the communication radius $r_{\text{comm}}$. A similar entry $d'_{\bar{j},i}$ is added to account for the inverse measurement. Any reflection $d'_{\bar{i},j}$ is also subject to the other discussed noise types.
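The reflection model of Eq. (7) amounts to optionally appending one extra, longer distance per pair, discarded when it falls outside the communication radius. A sketch under our own assumptions (data layout and names are hypothetical):

```python
import random

def add_reflections(pairs, true_dist, r_comm, nu_reflection, rng):
    """Eq. (7): with probability nu_reflection, append a multipath
    distance d' = d_true + W, W ~ U(0, r_comm), keeping it only when it
    still falls within the communication radius."""
    extra = []
    for (i, j) in pairs:
        if rng.random() < nu_reflection:
            d_prime = true_dist[(i, j)] + rng.uniform(0.0, r_comm)
            if d_prime <= r_comm:  # otherwise the node cannot receive it
                extra.append((i, j, d_prime))
    return extra
```

The returned entries would then pass through the same outlier operations of Eq. (6) as the direct-path measurements.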

This elaborate process generates realistic distance measurements and is significantly more extensive than that of previous research [

The discussed noise not only affects the accuracy of the distance measurements but also determines whether correct mutual connections can be estimated (Section 3.4). Due to noise, the effective number of neighbouring nodes to which correct distance measurements are made is therefore lower than the number of neighbouring nodes n comm that are within communication radius.

The effective connectivity, EC, between nodes is defined by the number of neighbouring nodes to which inlier measurements (and thus correct mutual connections) are available for the reconstruction algorithm. An estimation of the effective number of neighbouring nodes can be made based on the theoretical likelihood that inlier measurements can be established in the presence of the mentioned outlier noise types:

$$EC = n_{\text{comm}} \, (1 - \nu_{\text{outlier}})^2 \, (1 - \nu_{\text{id}})^2 \, (1 - \nu_{\text{loss}}) \, (1 - \nu_{\text{obstruction}}) \quad (8)$$

with $\nu$ indicating the probability that the specific outlier type of noise is present in an individual measurement, as defined in Section 5.2. The terms for outlier and identification noise are squared as these effects happen to $d_{i,j}$ and $d_{j,i}$ independently of each other, while both influence the ability to establish a correct bidirectional measurement.
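Equation (8) is a direct product of survival probabilities; a minimal sketch (function name and dict keys are our own):

```python
def effective_connectivity(n_comm, nu):
    """Eq. (8): expected number of neighbours with usable inlier
    measurements. Outlier and identification terms are squared because
    both directions of a mutual connection must survive independently;
    loss and obstruction each remove the whole connection at once."""
    return (n_comm
            * (1 - nu['outlier']) ** 2
            * (1 - nu['identification']) ** 2
            * (1 - nu['loss'])
            * (1 - nu['obstruction']))
```

For example, with 20 neighbours and every probability at 10%, the effective connectivity drops to roughly 10.6 usable neighbours, about half the nominal value.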

The reconstruction algorithm is developed to be able to robustly filter out different noise types by relying on the geometric consistency between true neighbouring nodes and their measured distances to each other. The more inlier measurements that are available, the more robust the algorithm is. The effective connectivity EC is a predictor of the number of inlier measurements and therefore can be used as predictor of reconstruction performance. The usage of EC as such a predictor is verified in Section 5.6.

For our applications, or more generally for swarm operations like these, there is no established best method to assess the performance of the reconstruction algorithm, i.e. to assess the quality of the reconstructed swarm. It depends strongly on which information is favoured from such a swarm, and conventional metrics might not be suitable as the goals of the applications might differ. When exploring an as yet unknown and difficult-to-access environment, one might initially want to know the overall structure of the environment, e.g. a rough estimation of the local geometry and the overall shape of the total environment. Later, e.g. when additional sensor information is added, one might be more interested in fine-grained local geometry, where the absolute error over the entire swarm is less relevant. We therefore present different performance metrics that serve different goals.

The absolute error is the mean squared absolute error of all reconstructed node positions relative to their ground truth positions. It is calculated as $E_{\text{abs}} = \sum_i^N \|\hat{s}_i - \bar{s}_i\|^2 / N$, the sum over all N nodes, where $\hat{s}_i$ is the reconstructed position of node i and $\bar{s}_i$ its ground truth position. In order to compare the ground truth positions with the reconstructed positions, a linear fit between the 3-D positions of the initial four seed nodes and their ground truth positions is performed to resolve the global rotation and reflection ambiguity.

The mean squared absolute error is a commonly used performance metric in reconstruction studies. A disadvantage of this metric is that when a reconstructed swarm exhibits error build-up, the absolute error between one side of the swarm and the other can be large, while the local errors can actually be small. In swarm reconstruction studies such as this one, the metric should therefore not be used as the only means to assess reconstruction performance.

The relative error is the mean squared relative error of the reconstructed distances between nodes, relative to the ground truth distances. It is calculated as $E_{\text{rel}} = \sum_{i,j}^M \|\hat{d}_{i,j} - \bar{d}_{i,j}\|^2 / M$, the sum over all M reconstructed distances, where $\hat{d}_{i,j}$ is the reconstructed distance and $\bar{d}_{i,j}$ the ground truth distance.

Unlike E abs , the metric E rel is not affected by build-up of errors. It assesses the reconstruction performance only on a local and relative scale.
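Both metrics defined so far are straightforward averages; a minimal NumPy sketch (helper names are our own, and the seed-based alignment for E_abs is assumed to have been applied already):

```python
import numpy as np

def mean_sq_abs_error(recon, truth):
    """E_abs: mean squared position error over all N nodes, given
    already-aligned reconstructed and ground truth positions (N x 3)."""
    recon, truth = np.asarray(recon, float), np.asarray(truth, float)
    return np.mean(np.sum((recon - truth) ** 2, axis=1))

def mean_sq_rel_error(d_recon, d_truth):
    """E_rel: mean squared error over all M reconstructed distances."""
    d_recon, d_truth = np.asarray(d_recon, float), np.asarray(d_truth, float)
    return np.mean((d_recon - d_truth) ** 2)
```

Because E_rel compares distances rather than positions, no alignment step is needed for it; this is exactly why it is insensitive to error build-up.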

The global error $E_{\text{glob}}$ is defined similarly to $E_{\text{abs}}$, but includes performing a rigid transform (translation and rotation) of the entire reconstructed swarm such that $E_{\text{abs}}$ is minimized.

This metric illustrates the reconstruction performance of the swarm better than $E_{\text{abs}}$, as it is not based on a specifically chosen seed from which everything is built. However, it is still sensitive to error build-up.

The local error $E_{\text{loc}}$ is defined as the mean squared absolute error, but only after performing a rigid transform of every subsection of 20 connected nodes onto their ground truth positions.

This heuristically chosen performance metric considers only the local errors but, unlike $E_{\text{rel}}$, it focuses on the (absolute) reconstructed positions of the nodes rather than the (relative) distances between them.
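The error-minimizing rigid transform used for E_glob and E_loc can be computed in closed form with the Kabsch algorithm. The paper does not specify which method it uses, so the following is our own sketch of one standard choice:

```python
import numpy as np

def rigid_align(recon, truth):
    """Least-squares rigid alignment (Kabsch): rotate and translate the
    reconstructed point set onto the ground truth before computing the
    mean squared error. Inputs are N x 3 arrays."""
    recon, truth = np.asarray(recon, float), np.asarray(truth, float)
    mu_r, mu_t = recon.mean(axis=0), truth.mean(axis=0)
    A, B = recon - mu_r, truth - mu_t          # centre both point sets
    U, _, Vt = np.linalg.svd(A.T @ B)          # cross-covariance SVD
    # Guard against an accidental reflection in the optimal rotation.
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return A @ (U @ S @ Vt) + mu_t             # rotated, re-translated
```

Feeding the aligned positions into the E_abs computation then yields E_glob; applying the same routine per 20-node subsection yields E_loc.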

The recall, X , is the percentage of nodes reconstructed by the reconstruction algorithm. As this does not take into account whether these nodes are reconstructed correctly, we introduce the adjusted recalls Y and ℤ . These are the percentages of nodes that are reconstructed within a heuristically chosen error condition. For Y , this condition holds when at least 80% of the node’s reconstructed distances (or edges in G ) have a relative error of <10%, or when more than 50% of the node’s reconstructed distances have a relative error of <1%. The adjusted recall ℤ is similar to Y , but adds the condition that the neighbouring nodes for which the relative error satisfies the condition must themselves also fall in the category of Y . Note that this categorization of recall into X , Y and ℤ is based on a heuristic, subjective interpretation of the reconstruction result.
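The condition behind the adjusted recall Y can be made concrete as a small classifier. This sketch is our own reading of the heuristic stated above; the input layout (one list of relative edge errors per reconstructed node) is hypothetical:

```python
def recall_Y(rel_errors_per_node):
    """Adjusted recall Y: percentage of reconstructed nodes whose edge
    errors satisfy the heuristic condition from the text."""
    def ok(errs):
        if not errs:
            return False
        frac_10 = sum(e < 0.10 for e in errs) / len(errs)  # error < 10%
        frac_1 = sum(e < 0.01 for e in errs) / len(errs)   # error < 1%
        return frac_10 >= 0.8 or frac_1 > 0.5
    good = sum(ok(e) for e in rel_errors_per_node)
    return 100.0 * good / len(rel_errors_per_node)
```

The stricter ℤ would additionally require that the supporting neighbours themselves pass the same test, which turns the per-node check into a second pass over the graph.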

Examples of reconstructions are shown in

Deciding whether a reconstruction gives satisfactory results is therefore not trivial and depends on the application goals. The adjusted recall parameters are an attempt to quantify these goals. In this paper, on the basis of subjectively interpreting how well the shape of the swarms in

| X (%) | Y (%) | ℤ (%) | E_abs (m²) | E_rel | E_glob (m²) | E_loc (m²) |
|---|---|---|---|---|---|---|
| 97 | 97 | 97 | 5×10⁻⁵ | 3×10⁻⁵ | 3×10⁻⁵ | 5×10⁻⁵ |
| 95 | 93 | 93 | 1×10⁻² | 2×10⁻² | 1×10⁻² | 2×10⁻² |
| 88 | 64 | 0 | 9×10⁻¹ | 6×10⁻² | 5×10⁻¹ | 7×10⁻² |
| 100 | 100 | 100 | 1×10⁻⁴ | 2×10⁻⁵ | 5×10⁻⁵ | 4×10⁻⁷ |
| 99 | 99 | 99 | 2×10⁻² | 1×10⁻⁴ | 5×10⁻⁴ | 4×10⁻⁷ |
| 99 | 94 | 94 | 6×10⁻³ | 5×10⁻³ | 4×10⁻³ | 2×10⁻⁴ |
| 99 | 59 | 0 | 2×10⁻² | 1×10⁻² | 2×10⁻² | 6×10⁻⁴ |
| 100 | 20 | 0 | 3×10⁻¹ | 5×10⁻² | 2×10⁻¹ | 1×10⁻³ |
| 99 | 0 | 0 | 1×10⁰ | 3×10⁻¹ | 2×10⁻¹ | 5×10⁻³ |

In this section we study the performance of the reconstruction algorithm while sweeping the parameters related to the soft constraints. These parameters are those involved with the measurement noise as described in Section 3.2, the number n CID of non-unique identifiers cid, and the number n comm of neighbouring nodes within communication radius. To change the number of neighbouring nodes, we actually change the sensing radius r comm such that, on average, nodes have n comm neighbouring nodes.

For each set of parameters, 50 different experiments are performed, each with randomly generated node positions (but following the environment geometry constraints as defined in Section 5.1). All experiments are performed with a total number of N = 400 nodes. For similar work with a smaller and larger number of nodes, the reader is referred to our earlier work [

The Gaussian noise components in $d_{\bar{i},\bar{j}}$ are summarized together as $\nu_{\text{gaussian}}$, the standard deviation of $(d_{\bar{i},\bar{j}} - \bar{d}_{\bar{i},\bar{j}})$. This value is expressed as a percentage of a fixed communication radius:

$$\nu_{\text{gaussian}} = \sqrt{\frac{\sum_{i,j}^{M} \left(d_{\bar{i},\bar{j}} - \bar{d}_{\bar{i},\bar{j}}\right)^2}{M-1}} \; \frac{1}{r_{\text{comm}}\left[n_{\text{comm}}=20\right]} \quad (9)$$
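Equation (9) is the sample standard deviation of the inlier measurement error, normalized by a fixed reference radius. A minimal sketch (the function name is our own):

```python
import numpy as np

def nu_gaussian(d_meas, d_true, r_comm_ref):
    """Eq. (9): sample standard deviation (ddof=1, i.e. divided by M-1)
    of the measurement error, expressed as a fraction of the fixed
    reference communication radius (the one giving n_comm = 20)."""
    d_meas = np.asarray(d_meas, float)
    d_true = np.asarray(d_true, float)
    return np.std(d_meas - d_true, ddof=1) / r_comm_ref
```

Using the fixed radius in the denominator keeps the noise level comparable across experiments in which r_comm itself is swept.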

distribution of connectivity among the nodes, as explained next in Section 5.6.

Furthermore, the error metrics differ significantly more among each other in the pipeline environment than in the spherical case. The local error is orders of magnitude lower than, for example, the absolute and global errors. The pipeline environment stretches across larger distances and has smaller local dimensions compared to the spherical case; errors build up more easily and cause a larger absolute error, even in cases where the local geometry is reconstructed correctly. This can clearly be seen in e.g.

The observed dip in recall in the spherical environment in

The lines in

The general trend of these lines clearly shows that the reconstruction performance depends on the input parameters. The reconstruction exhibits graceful degradation as the soft constraints increase in severity, up to the point where the recall drops and the error increases.

The recall result Y of our experiments is plotted against the meta-parameter EC in

Even though the average number of neighbouring nodes is chosen to be similar in both the spherical and the pipeline environment by setting r comm for all nodes, the variability in the per-node connectivity is much larger in the spherical environment than in the pipeline environment.

It should be noted that the results in

Many environments will contain one or multiple loops; therefore it is important that the reconstruction algorithm can deal with closing these loops. Our loop closing method as described in Section 4.5 is evaluated in this section.

Examples of reconstructed swarms that require loop closing, and the iterative loop closing in the non-linear optimization process are shown in

excluded connections of which the reconstructed distance is larger than 2 r comm are depicted with red squares and clearly indicate that the two ends of the reconstructed swarm are supposed to be connected. The nodes reconstructed at the loop ends do not receive support from nodes on the other side of the loop in the RANSAC voting algorithm. To close the loop, the detected connections that were originally excluded from the graph are now included. Performing the non-linear optimization using these new connections then closes the loop, as seen in the consecutive steps in

As seen in

These experiments show that our method can perform loop closing. Multiple loops can be dealt with simultaneously or individually, as seen in

The results show that the reconstruction algorithm can robustly deal with a large set of different types of measurement noise. Each measurement imperfection has a different influence on the established mutual connections and on the reconstruction. The inlier types of measurement noise are present in all measurements and only affect the accuracy of the reconstructed positions. The outlier types of noise prevent establishing correct mutual connections and, on top of that, also create false and ambiguous connections due to the non-unique identification. Using the geometric consistency of the positions of true neighbouring nodes, the reconstruction algorithm can robustly filter out the false connections and prevent them from negatively influencing the reconstruction. Reducing the quantity of available cids significantly increases the ambiguities in establishing mutual connections. However, due to the robust nature of the reconstruction algorithm, these ambiguities can be resolved when enough inlier measurements are available.

The robustness of our novel algorithm allows us to see that the key parameter for successful reconstruction is the number of inlier node connections available for reconstruction. Initially, the total number of distance measurements is determined by the communication range and the density of the nodes in the swarm. Due to the outlier types of measurement noise, not all node connections will yield inlier measurements, and hence not all can be used for reconstruction. The effective connectivity can be predicted by our Equation (8) and is an important predictor of the reconstruction performance. We can therefore conclude that in the sensor node hardware design for these applications, resources should be focussed on increasing the communication radius to increase connectivity and on preventing outlier types of noise from reducing the effective connectivity. Achieving this can, for example, come at the cost of (unique) identification.

Depending on which information is favoured from the swarm, the error metrics can be used to study the performance of the reconstruction. There is no single error metric that can summarize the reconstruction performance for all application goals. Using different error metrics and the heuristically chosen adjusted recall metrics Y and ℤ we showed that the subjective interpretation of the quality of the reconstruction can be partially quantified. When the sensor nodes need to be designed for specific applications, a performance condition can be set up, e.g. Y > 80 % , to assess under which input conditions the reconstruction is most likely to achieve the requested performance.

This work evaluated the feasibility of using sensor swarms, consisting of many highly miniaturized and severely resource limited sensor nodes, to reconstruct difficult-to-access environments. For this, a novel Guided Random Sample Consensus algorithm together with non-linear graph optimization is proposed. Its main contribution is that the algorithm can handle severe and different types of measurement errors as well as the use of ambiguous non-unique communication identifiers.

Extensive and realistic simulations show that sensor node connectivity is to be favoured over unique identification of nodes. When the number of used communication identifiers is only 2.5% of the total number of nodes ( n CID = 10 , N = 400 ), the entire sensor swarm can still be reconstructed successfully. Thereby, less identification information, i.e. shorter communication bursts, can be traded off for larger emission power per burst. This, in turn, allows for improved sensor node connectivity, as more nodes can be reached with a single burst. Conversely, simulations show that when the swarm connectivity drops below 20 inter-node connections per node, reliable sensor swarm reconstruction is hampered for this algorithm. Favouring connectivity over identification is therefore a pivotal finding of our work, as it can and will guide future developments of highly miniaturized sensor nodes.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 665347.

Duisterwinkel, E.H.A., Dubbelman, G., Demi, L., Talnishnikh, E., Wörtche, H.J. and Bergmans, J.W.M. (2018) Robust Reconstruction of Sensor Swarms Floating through Enclosed Environments. Wireless Sensor Network, 10, 1-39. https://doi.org/10.4236/wsn.2018.101001

Algorithm 1. Reconstruction.

Algorithm 2. RANSAC.

Algorithm 3. Guiding RANSAC by sorting the candidate nodes.