^{1}

^{*}

^{1}

In this paper, SLAM systems are introduced using monocular and stereo visual sensors. The SLAM solutions are implemented in both indoor and out door. The SLAM samples have been taken in different modes, such as a straight line that enables us to measure the drift, in addition to the loop sample that is used to test the loop closure and its corresponding trajectory deformation. In order to verify the trajectory scale, a baseline method has been used. In addition, a ground truth has been captured for both indoor and outdoor samples to measure the biases and drifts caused by the SLAM solution. Both monocular and stereo SLAM data have been captured with the same visual sensors which in the stereo situation had a baseline of 20.00 cm. It has been shown that, the stereo SLAM localization results are 75% higher precision than the monocular SLAM solution. In addition, the indoor results of the monocular SLAM are more precise than the outdoor. However, the outdoor results of the stereo SLAM are more precise than the indoor results by 30%, which is a result of the small stereo baseline cameras. In the vertical SLAM localization component, the stereo SLAM generally shows 60% higher precision than the monocular SLAM results.

Simultaneous localization and mapping (SLAM) is the procedure of building the map of the surrounding environment of a vehicle/rover and uses the computed map to determine the vehicle/rover location. In the past decade there was active research to solve the SLAM problem using several methods of computations. The great majority of work has focused on improving computational efficiency while ensuring consistent and accurate estimates for the map and vehicle pose. However, there has also been much research on issues such as nonlinearity, data association, and landmark characterization, all of which are vital in achieving a practical and robust SLAM implementation. SLAM process challenges are centered on methods enabling large-scale implementations in increasingly unstructured environments and especially in the GPS denied environment. Indoor 3D mapping helps to view the three-dimensional objects and spatial structure on the computer efficiently. A precise description about the scene through the sensors is required. In the latest research contribution, SLAM has become more popular, there are three major categories: visual-based, laser-based and depth-visual based. There are the visual-based SLAM solutions such as ORB-SLAM [

Kalman filters have two main parts: prediction and update. In order to solve the nonlinear problem, the Extended Kalman Filter (EKF) was put forward. The EKF-SLAM can only deal with a single mode. It is successful in medium-scale scenes but when it comes to a large map, it becomes computationally intractable. Certainly, there is better method solving the nonlinear problem like Unscented Kalman Filter (UKF). The Kalman filter and its variants can only model Gaussian distributions, so an approach is needed to deal with the arbitrary distributions. However, particle filters can deal with the arbitrary distributions by using multiple samples. This method deems that the more particles fall into a region, the higher the probability of the region is [

In this paper, SLAM system is introduced using monocular, and Stereo visual sensors. SLAM solutions are implemented in both indoor and outdoor using extended Kalman filter. The SLAM samples have been taken in different modes such as a straight line that enable us to measure the drift in addition to the loop sample that is used to test the closure and its corresponding trajectory deformation. In order to verify the trajectory scale, a baseline method has been used. In addition, a ground truth has been captured for all the samples indoor and outdoor to measure the biases and drifts caused by the SLAM solution. Both monocular and stereo SLAM data have been captured with the same visual sensors which in the stereo situation had a baseline of 20.00 cm. It has been shown that, the stereo SLAM localization results are 75% higher precision than the monocular SLAM solution. In addition, the indoor results of the monocular SLAM are more precise than the outdoor. However, the outdoor results of the stereo SLAM are more precise than the indoor results by 30%, which a result of the small stereo baseline cameras. In the vertical SLAM localization component, the stereo SLAM generally shows 60% higher precision than the monocular slam results.

The state-based formulation of the SLAM includes the computation of a joint state made up of a vehicle/rover pose and the locations of captured landmarks. This problem formulation has a unique structure; the process model only affects vehicle/rover pose states while the observation model only makes reference to a single vehicle-landmark pair. A large range of strategies have been developed to take advantage of this special structure in limiting the computational complexity of the SLAM algorithm. There are two categorize of the SLAM techniques that aims for improving the computational efficiency namely; optimal or conservative solutions. The optimal solutions target to reduce required computation and resulting in computations and covariances for the full-form SLAM algorithm. While the conservative algorithms result in estimates that have larger uncertainty or covariance than the optimal result. Usually, conservative algorithms are less accurate but more computational efficiency, therefore, of value in real implementations [

The accurate reconstruction of the captured scene from sets of ordered images has a long history in aerial [

In Monocular SLAM [

Paz et al. [

At the time the vehicle/rover moving the SLAM build a map of the surrounding environment and at the same time use this map to estimate its location with respect the landmarks. In SLAM both the trajectory of the vehicle/rover and the location of all landmarks are estimated without the need for any a priori knowledge of location.

^{th} landmark which true location is presumed time invariant. z i k is an observation measured from the vehicle/rover of the location of the i^{th} landmark at time k. When there are several landmark observations at the same time, the observation will be expressed as z k . The history of vehicle locations can be described as X 0 : k = { x 0 , x 1 , ⋯ , x k } = { X 0 : k − 1 , x k } and the history of control inputs is U 0 : k = { u 1 , u 2 , ⋯ , u k } = { U 0 : k − 1 , u k } . In addition, the landmark location is m = { m 1 , m 2 , ⋯ , m n } and the set of landmark observations can described as Z 0 : k = { z 1 , z 2 , ⋯ , z k } = { Z 0 : k − 1 , z k } [

The Simultaneous Localisation and Mapping (SLAM) problem in probabilistic form demands that the probability distribution be calculated for all times k as P ( x k , m | Z 0 : k , U 0 : k , x 0 ) . This probability distribution defines the joint posterior density of the landmark locations and vehicle/roverstate (at time k) given that the measured observations and control inputs including time k along with the initial state of the vehicle/rover. Basically, a recursive solution to the SLAM problem is required. Beginning with an estimate for the distribution P ( x k − 1 , m | Z 0 : k − 1 , U 0 : k − 1 ) at time k − 1, the joint posterior, following an observation z k , and control u k is calculated. This calculation needs that a state transition model and an observation model are defined to describe the effect of the control input and observation respectively. The observation model defines the probability of taking an observation z k when the vehicle/roverlocation and landmark locations are known. The motion model for the vehicle/rover can be formulated as a probability distribution on state transitions in the form P ( x k | x k − 1 , u k ) . Therefore, the state transition is assumed to be a Markov process in which the next state x k is independent of both the observations and the map, and depends only on the immediately proceeding state x k − 1 and the applied control u k [

The SLAM algorithm is currently implemented in a standard two-step recursive (sequential) prediction (time-update) as shown in Equation (1) and correction (measurement-update) form as shown in Equation (2)

P ( x k , m | Z 0 : k − 1 , U 0 : k , x 0 ) = ∫ P ( x k | x k − 1 , u k ) × P ( x k − 1 , m | Z 0 : k − 1 , U 0 : k − 1 , x 0 ) d x k − 1 (1)

P ( x k , m | Z 0 : k , U 0 : k , x 0 ) = P ( z k | x k , m ) P ( x k , m | Z 0 : k − 1 , U 0 : k , x 0 ) P ( z k | Z 0 : k − 1 , U 0 : k ) ) (2)

Equations (1) and (2) describe a recursive procedure for calculating the joint posterior P ( x k , m | Z 0 : k , U 0 : k , x 0 ) for the vehicle/roverstate x k and map m at a time k depend on all observations Z 0 : k and all control inputs U 0 : k including time k. The recursion is a function of a vehicle/rovermodel P ( x k | x k − 1 , u k ) and an observation model P ( z k | x k , m ) . In addition, the mapping problem could be formulated as calculating the conditional density P ( m | X 0 : k , Z 0 : k , U 0 : k ) . This assumes that the location of the vehicle/rover x k is known at all different times, subject to known of initial location. Then, a map m is constructed by merging observations from different locations. On the other hand, the localisation problem may be defined as calculating the probability distribution P ( x k | Z 0 : k , U 0 : k , m ) . This assumes that the landmark locations are known with certainty and the objective is to calculate an estimate of vehicle/roverlocation relative to these landmarks [

In estimation theory, the extended Kalman filter (EKF) is the nonlinear version of the Kalman filter which linearizes about an estimate of the current mean and covariance. The EKF-SLAM describe the vehicle/rover motion and observation model as shown in Equations (3) and (4) [

P ( x k | x k − 1 , u k ) ⇔ x k = f ( x k − 1 , u k ) + w k (3)

where f models vehicle kinematics and where w k are additive, zero mean uncorrelated Gaussian motion disturbances with covariance Q k .

P ( x k | x k − 1 , u k ) ⇔ z ( k ) = h ( x k , m ) + v k (4)

where h describes the geometry of the observation and where v k are additive, zero mean uncorrelated Gaussian observation errors with covariance R k . With these definitions the standard EKF method can be applied to compute the mean and covariance of the joint posterior distribution P ( x k , m | Z 0 : k , U 0 : k , x 0 ) as shown in Equations (5) and (6) with time-update in Equations (7) and (8) [

[ x ^ k | k m ^ k ] = E [ x k m | Z 0 : k ] (5)

P k | k = [ P x x P x m P x m T P m m ] k | k = E [ ( x k − x ^ k m − m ^ k ) ( x k − x ^ k m − m ^ k ) T | Z 0 : k ] (6)

x ^ k | k − 1 = f ( x ^ k − 1 | k − 1 , u k ) (7)

P x x , k | k − 1 = ∇ f P x x , k − 1 | k − 1 ∇ f T + Q k (8)

where ∇ f is the Jacobina matrix of f evaluated at the estimate x ^ k − 1 | k − 1 . As the landmark are stationary, there will be no need for the time update. In addition, Equations (12) and (13) describe the observation update model [

[ x ^ k | k m ^ k ] = [ x ^ k | k − 1 m ^ k − 1 ] + W k [ z ( k ) − h ( x ^ k | k − 1 , m ^ k − 1 ) ] (9)

P k | k = P k | k − 1 − W k S k W k T (10)

where

S k = ∇ h P k | k − 1 ∇ h T + R k (11)

W k = P k | k − 1 ∇ h T S k − 1 (12)

where ∇ h is the Jacobian of h evaluated at x ^ k | k − 1 and m ^ k − 1 .

The loop-closure, when a vehicle/roverreturns to re-observe landmarks after a large traverse, is especially difficult. The association problem is compounded in environments where landmarks are not simple points and indeed look different from different viewpoints. EKF-SLAM employs linearized models of non-linear motion and observation models and so inherits many cautions. Non-linearity can be a significant problem in EKF-SLAM and leads to expected, and sometimes dramatic, inconsistency in solutions [

All SLAM systems introduced in this paper have been implemented in different environment from indoor to outdoor. Several samples have been taken in different trajectory shape such as a straight line that enable us to measure the drift in addition to the loop sample that we use it to test the closure and its corresponding trajectory deformation. A ground reference has been captured for all the samples to measure the biases and drift caused by the SLAM solution.

The results shown in

The second sample for the indoor monocular SLAM solution was a straight line.

On the other hand, outdoor SLAM solutions have been introduced including monocular, and stereo visual sensors.

indoor environments. However, both indoor and outdoor have the same map density. Similar to the indoor monocular SLAM solution, the outdoor solution suffers from drifts that cause open loop. However, applying the loop closure constraints results in a closed loop with less deformation than the indoor loop. There is less loop deformation, which may be due to the wide field of view with more well distributed captured features.

The outdoor stereo SLAM results are shown in

The stereo outdoor SLAM solution results are shown in

In addition,

The horizontal root mean squares error (RMSE) of the SLAM trajectories for Monocular, and stereo solutions are shown in

Figures 16-21 show the error in the vertical localization SLAM results with the angular rotation around an x axis, perpendicular to the direction of movement, as shown in

The vertical SLAM localization results are shown in

A comparison between RMSE of the horizontal and vertical localization components are shown in

In this paper, SLAM system is introduced using monocular and stereo visual sensors. SLAM solutions are implemented in both indoor and outdoor. The SLAM samples have been taken in different modes such as a straight line that enable us to measure the drift in addition to the loop sample that is used to test the closure and its corresponding trajectory deformation. In order to verify the trajectory scale, a baseline method has been used. In addition, a ground truth has been captured for all the samples indoor and outdoor to measure the biases and drifts caused by the SLAM solution. Both monocular and stereo SLAM data have been captured with the same visual sensors which in the stereo situation had a baseline of 20.00 cm. It has been shown that, the stereo SLAM localization results are 75% higher precision than the monocular SLAM solution. In addition, the indoor results of the monocular SLAM are more precise than the outdoor. However, the outdoor results of the stereo SLAM are more precise than the indoor results by 30%, which is a result of the small stereo baseline cameras. In the vertical SLAM localization component, the stereo SLAM generally shows 60% higher precision than the monocular SLAM results.

The authors declare no conflicts of interest regarding the publication of this paper.

Afifi, A. and Woo, B. (2019) Simultaneous Localization and Mapping Solutions Using Monocular and Stereo Visual Sensors with Baseline Scaling System. Positioning, 10, 51-72. https://doi.org/10.4236/pos.2019.104004