A Velocity-Based Rao-Blackwellized Particle Filter Approach to Monocular vSLAM

This paper presents a modified Rao-Blackwellized Particle Filter (RBPF) approach for the bearing-only monocular SLAM problem. While FastSLAM 2.0 is known to be one of the most computationally efficient SLAM approaches; it is not applicable to certain formulations of the SLAM problem in which some of the states are not explicitly expressed in the measurement equation. This constraint impacts the versatility of the FastSLAM 2.0 in dealing with partially ob-servable systems, especially in dynamic environments where inclusion of higher order but unobservable states such as velocity and acceleration in the filtering process is highly desirable. In this paper, the formulation of an enhanced RBPF-based SLAM with proper sampling and importance weights calculation for resampling distributions is presented. As an example, the new formulation uses the higher order states of the pose of a monocular camera to carry out SLAM for a mobile robot. The results of the experiments on the robot verify the improved performance of the higher order RBPF under low parallax angles conditions.


Introduction
The basic idea of simultaneous localization and mapping (SLAM) was originally discussed for autonomous robots mainly because of the need to locate the robot in real time in a map which is incrementally built using the information obtained from the robot control and measurement systems.Smith et al. [1] established a probabilistic approach to model the uncertainty in the control and measurement signals.In essence, the SLAM provides a suitable framework for describing robot motion and landmark locations together with the underlying uncertainties associated with both of them.In this approach, motion model and observation model are described as the first-order and zero-order Markov processes, respectively.In the probabilistic approach, the SLAM problem is considered as a filtering problem in which the control input and robot observations are known and the robot pose and landmark locations are the desired unknown states.Although other approaches have been used to solve the SLAM problem, probabilistic approach has been remained as the mainstream of the SLAM literature [2].
In order to solve the SLAM problem, Csorba and Durrant-Whyte [3] proposed the use of the Extended Kalman Filter (EKF) which is capable of solving non-linear filtering problems.Their work was further extended in [4,5] where convergence and performance of the EKF solution were discussed, and successful experimental results were reported.
A major limitation of the EKF solution is that the covariance matrix must maintain the information of all landmarks of the map, and hence the state vector is at least an space, where denotes the number of landmarks.Clearly, the covariance matrix has to be updated in each iteration at a cost proportional to .This computational cost limits the real-time implementation of the EKF SLAM to simple environments and sparse mapping.As a result, reducing the computational load of the EKF-based SLAM solutions has been a major focus for the researchers working on the SLAM problem [6][7][8].Liu and Thrun proposed a new solution for the SLAM problem using the Extended Information Filter (EIF) [9].In their solution, the information matrix is sparsified and therefore, memory and computation load have been reduced significantly.Another solution for the SLAM problem, FastSLAM, based on the Rao-Blackwellized particle filtering (RBPF) was presented by [10].Fast-SLAM is based on factorization of the robot path from landmark locations in which observation of each land-mark is independent of any other landmark position [11,12].The use of well-tuned particle filters instead of Kalman-type filters can result in less computational complexity and more robustness in data association [13].Also, unlike the EKF, there is no need for the linearization of the motion model.The main shortcoming of FastSLAM was the large number of samples that are needed to avoid the degeneracy of the filter.This problem has been addressed in FastSLAM 2.0 in which both control inputs and observations are used to obtain the sampling distribution.The use of the observations in obtaining the sampling distribution not only reduces the number of samples required for the filtering process but also increases the overall efficiency of the filter [14].However, it should be noted that in the original form of FastSLAM 2.0, the measurements are only useful for the sampling distribution of the states which are explicitly used in the measurement model.A goal of the algorithm proposed in this work is to eliminate this limitation.
Recent research on the SLAM problem focuses on finding more computationally efficient solutions using vision-based approaches, referred to as vSLAM.Typically, it is desirable to adopt a monocular vision system i.e., a single camera, for generating the map [15][16][17][18].The use of a single camera, which is an example of a bearing only device when used as the only source of information, causes the problem of partial observability [17,19] that is attributed to the lack information about the depth of the landmarks.However, the standard SLAM solutions are based on a fully observable system, and hence the main motivation of this work is to introduce an appropriate method for monocular vSLAM.
It is well-known that similar to any other filtering and estimation problem, the performance of the bearing-only SLAM systems is strongly influenced by the system order [20].For example, higher order systems may be able to track more complicated maneuvers and abrupt changes of robot trajectories but they can also increase the estimation error unnecessarily for simple motion patterns.In the EKF-based SLAM framework, the motion model order can be modified easily and additional states can be added to the state vector and covariance matrix, but such modification for Rao-Blackwellized SLAM framework is not trivial since higher order states such as velocity and acceleration have no effect in the observation in the original formulation.Although FastSLAM 2.0 is superior to FastSLAM in terms of performance and convergence, it is not applicable to the bearing-only systems with higher order states.
The idea of the use of higher order state such as velocity in an RBPF-based SLAM was introduced and briefly discussed for bearing-only systems in [21].Here, detailed formulation as well as a thorough analysis of the estimation performance using experimental results is presented for the proposed approach.In this approach, first the robot pose is updated by sampling velocity from the proposal distribution.Similar to FastSLAM 2.0, the proposed approach incorporates the information obtained from both the control input and observations to obtain the proposal distribution.However, the can also adopt the information obtained the observations for the proposal distribution of a state which is of higher order than the observed state (i.e.velocity where the displacement is measured).In essence, this improves the versatility of the proposed approach with respect to the original formulation of FastSLAM 2.0.Considering the updated robot pose and last observation, the landmark locations are updated using the EKF.In the next stage, the importance weight of each particle is calculated.Finally, a set of particles is resampled with respect to the weight of the particles calculated in the previous step.In order to solve the problem of partial observability, each feature is initialized using a modified scheme in which the landmarks are defined using their inverse depth to decrease the nonlinearity of the measurement equation in low parallax angles [22].
Another implementation of the higher order Rao-Blackwellized SLAM has been introduced by Törnqvist et al. [23].However, their approach is based on Fast-SLAM in which sampling distribution is independent of observations whereas the approach proposed in this paper proposes an extension of FastSLAM 2.0 in which sampling distribution relies on the control input as well as the observations.More precisely, they have focused on reducing the computational load of FastSLAM by introducing a new factorization technique, whereas in this paper the main concern is to extend FastSLAM 2.0 so that it incorporates observations in obtaining the sampling distribution of the states even if they are not explicitly used in the measurement model.Another difference between this work and the one presented in [23] is that their approach has been implemented into an unmanned aerial vehicle (AUV) equipped with multiple sensors including inertial navigation system (INS) and cameras whereas the main focus of this paper is on bearing-only systems and tackling the partial observability problem.
The detailed formulation of the proposed estimation approach including sampling and importance weight distributions are presented in Section 2. In Section 3, the motion and observation models of a bearing-only system are discussed.Section 4 presents another important feature of the proposed SLAM method that is the initialization scheme required to tackle the partial observability problem.Section 5 presents the results of a SLAM experiment carried out using a digital camera in a laboratory environment.

Velocity Based RBPF Approach
In the SLAM problem, the only available information for the observer is the measurements, 1:t and the control input 1:t , e.g.acceleration, as they become available.The index 1:t refers to the time period between the first iteration and the iteration at time In the vision-based SLAM, it can be assumed that the data association problem [24] is solved using image processing methods such as the SIFT algorithm [25][26][27].The bearing-only system is modeled using a constant velocity model which means that both position and velocity are included in the state vector.In this way, the accelerations are characterized using a Gaussian distribution.The approximated acceleration is assumed as the control input.The velocity dynamics can be presented as the first order Markov chain: where t x is the velocity in time t.The symbol N denotes the normal distribution with the mean of 1 ( , ) and covariance of t .The relationship between the velocity and pose is a deterministic function presented as: which means that the current pose, is determined based on the previous pose, and the current velocity, tively.In the full SLAM problem formulation, the final objective is to find the posterior of the entire path and the map [13]: where s is the robot pose state vector, x y  m , stands for known data association and is the map which is the set of landmarks: It is shown that this problem can be written in the factored form [28]: This factored form can be solved using one of the RBPF based approaches discussed in [10,14].The localization problem is solved using the particle filter.Each particle, t , in this approach consists of a robot trajectory estimate and a set of estimators of the individual landmark locations.In order to use the Rao-Blackwellized Particle Filter (RBPF) approach, proper proposal and importance weight distributions have to be determined y

Proposal Distribution
Based on the idea discussed in [14], the proposal distribution is chosen as: This distribution includes all information related to the path, [ ] 1: , this distribution can be expanded as: Using Equation ( 2) and the Markov rule: where  is Kronecker delta and d t y is the deterministic result of the Equation (3) when t x and [ ] 1: 1 k t y  are given.Therefore, the proposal distribution of Equation ( 6) is simplified as: Thus, sampling must occur in two separate stages.The velocity states, t x has to be drawn from the following probability distribution: whereas the states related to the robot pose, x have to be calculated using Equation (2).Using the Bayesian formulation, the probability distribution of Equation ( 11) can be presented as: and using Markov chain: Considering the observation model, the posterior in Equation ( 12) can be presented as: A general closed form solution of Equation ( 13) is not available unless the linearizability assumption [14] is imposed on the system.Using these assumptions the closed form of the Equation ( 13) can be obtained as the: where where , ˆk t y and [ ] ˆk t x are prediction of the observation, velocity states and pose states, respectively.Also y H is the Jacobian of the measurement model with respect to the pose states.In addition, is defined as: where Considering the fact that the covariance of the Gaussian function is the inverse of the second derivative, and also invoking Equation ( 16), the sampling distribution covariance is obtained from: where x Y is the Jacobian of the Equation ( 2) with respect to the robot velocity states.Also, the mean value which is the root of the first order derivative equals to: In summary, to sample from the trajectory, the first velocity states have to be drawn from the Gaussian distribution with the covariance and mean indicated in Equations ( 17) and ( 18), respectively.Then, the pose states have to be calculated using Equation (2).

Importance Weight
As it is discussed in [14] in the particle filtering and also RBPF, resampling is the crucial stage of the filtering process.In order to resample the particle, a proper importance weight has to be assigned to each particle.As the other particle filter approach, the importance weight Using Markov and a similar procedure to Equations ( 9[ ] [ ] , , , )-( 13): Using the system and measurement model, Equations ( 1)-( 3), and linearizing Equations ( 2)-( 3), the closed form of the Equation (24), can be obtained as the Gaussian distribution: where The higher order RBPF for M particles can be summariz ents Model g the constant veloc-ed as it is shown in Table 1.

Camera Motion Model
The camera motion is modeled usin ity input assumption in which the velocity update equaon can be expressed as [29]: (27) where and are the velocity velocity of the camera in the world coordinates, respeclso where ti  and are accelerati me respectively.In the bearwhere is the quaternion defined dopted from [30].every point i from a ca t  interval, on, angular acceleration and ti ing-only situation acceleration and angular acceleration can be estimated by a Gaussian white noise.The camera pose can be defined by camera optical center, W r , and the quaternion, W q , defining the position and orientation of the camera in the global coordinates, respectively: by the rotation vec t .

Measurement Model
The measurement model has been a a 3D scene, the observation of In mera is defined by a vector consisting of six states:

 
, , , , , This vector determines the point location as: where is the location of the optical center of the camera at the first observation, i  g ray, is the inverse of the point depth along the carryin ( , , )  and i  tional vector are azimuth and elevation angles of ray at the global coordinate frame.Th direc dire e ray ctional vector, ( ,   is defined as: Each observation can be interpreted as a constraint between the camera pose and the corresponding landing the pin-hole mark.Us landm the projectio model ark in the camera coordinate is gi [31], the location of the note ven by: e cam de where CW R is the rotation matrix from the global coordinate frame to the camera coordinate frame.The last equation can be expanded as: According to the pinhole model, t n of into the im era observes d by: Also,  and  a d as: re define process is carried out using an tialization approach is similar to the i proach discussed by [30,32] but modified according to the RBPF approach used here.The initial state values for the landmark, ˆi m can be expressed as:

 
ˆˆ, , , , , These values have to be determined using the location information, observation of new landmarks and the initial estimation h inverse, of the dept 0  .The optic center of the camera is directly taken from the tion of the camera location.The azimuth an by [25,33].current estimaand elevation A digital camera is installed on an Amigobot robot designed by Mobile Robotic Inc.The robot is programmed to move on a straight line with constant velocity of 0.6 m/sec while capturing a video from the laboratory environment.The robot navigation system equipped with wheel encoders indicates that the robot has been moved 2198 mm in x-direction and -121 mm in y-direction.The lization scheme of the zation discussed in [32] lies in itial covariance matrix used.In the higher order R covarian tialization is: ovariance of camera, and the Jaco is giv difference between initia n he noise c : PF the i BPF, t the en by ce matrix after this ini- where t Q is bian , , , where In this experiment, the RBPF consisting of 40 particles initializes and filters the features extracted using the SIFT algorithm from the video frames.The calibration parameters of the camera play a crucial role in initialization and filtering processes [29].The camera has been calibrated based on the open access Matlab toolbox provided in [34].Sample frame of the video and detected SIFT features are shown in Figure 1 In addition to the core RBPF algorithm used in the simulation program, additional subroutines have been added to the filtering algorithm to make it more robust in real world implementations.First, a probabilistic scheme has been added to the filter to evaluate the probability of the SIFT matching results and remove possible outliers.Specifically, RBPF will ignore the observation if the matching probability is smaller than a specific threshold value.The matching probability is computed based on the corre- spondence likelihood discussed in [13].Also, the negative inverse depth which is an important issue in the inverse depth bearing-only SLAM [35] has been avoided using the ability of the particle filter to fuse the hierarchical knowledge with the positive information processed by the probabilistic approach [13]

Experimental Results an Discussion
An offline experiment using a video clip has bee The RBPF estimated path which has been shown in Figure 2 follows the true robot path with slight deviation in the y-direction.This deviation can be explained by basic experimental errors, such as the robot control sys-the m pro on featu approach in combination methods discussed   tem error, the camera-robot alignment and also oversimplification of the measurement model e.g., the radial distortion of the camera lens has been totally ignored in this research.Robot motion estimation in the x and y directions are shown in Figures 3 and 4. The estimation of the heading angle cosine is shown in Figure 5.
Although the major objective in higher order monocular vSLAM is to solve the localization problem and produced sparse cannot be used directly, mapping results is able to provide better insight to the proposed approach.The Monte Carlo simulations of the spatial distributions of a sample landmark, after the first, fourth and seventh frame are shown in Figure 6.As it is shown in Figure 6 sed.Also, the Euclidian norm of the landmark , the spatial volume which represents the uncertainty in the location of landmark, reduces as more frames are proces  location covariance matrix has been calculated for different frames.The evolution of this norm versus number of frames in which the landmark has been observed is shown in Figure 7.As it is shown in Figure 7, the covariance norm decreases drastically after a few frames and then converges to a final value.The trend of the covariance norm is compatible with the Monte Carlo simulation results.

Conclusions
In this paper a new Rao-Blackwellized particle filte formulation where the velocity is considered a constant, filtering is performed under the constant acceleration assumption meaning that the velocity is a state r suitable for monocular vision (i.e., bearing-only) systems is proposed.In the proposed approach, unlike the traditional variable estimated by the filter.Although the inclusion of higher order terms in a parameterized filter such as an EKF may be considered straightforward, the entire formulation must be reworked for such modification of RPBF.New sampling and importance weight distribution based on the constant acceleration assumption have been derived.Also to solve the bearing only problem, a modified initialization scheme has been proposed.Numerical simulations have been carried out to examine the performance of the proposed system.The higher order RBPF approach has been compared to a lower order RBPF approach in similar conditions.Simulation results verify not only the feasibility of the proposed higher order RBPF approach but also its superiority to the lower orde RBPF approach estimates the path successfully.It is also MITACS Inc., Canada.

2 N N 2 N
in time t.The symbol N stands for the normal distribution with the mean of and covariance of .The data association decision and landmark location are shown by and respec- m H is the Jacobian of the measurement model with respect to the landmark location and [ ] observed landmark location estimation.

0 
are the camera center in pixels, f is the focal length, and x d and y d are the pixel sizes.
gles of the projection ray are obtained from inverse of the camera pin-hole model.In the range

2 .
. The robot path and estimated localization results are shown in the Fig- ure The time evolutions of the robot pose states are shown in Figures 3 and 4. Also estimation of cosine of the robot heading angle has been shown in the Figure 5.

Figure 3 .
Figure 3. Estimation of the robot in the x-direction versus time.

Figure 4 .
Figure 4. Estimation of the robot in the x-direction versus time.

Figure 6 .Figure 7 .
Figure 6.The Monte Carlo simulations of the spatial distribution seventh frame.s of a landmark; (from left to right), the first, fourth and observations, 1:t , and data associations, 1:t .Considering the fact that