Indoor 3D Reconstruction Using Camera, IMU and Ultrasonic Sensors

The recent advances in sensing and display technologies have been transforming our living environments drastically. In this paper, a new technique is introduced to accurately reconstruct indoor environments in three-dimensions using a mobile platform. The system incorporates 4 ultrasonic sensors scanner system, an HD web camera as well as an inertial measurement unit (IMU). The whole platform is mountable on mobile facilities, such as a wheelchair. The proposed mapping approach took advantage of the precision of the 3D point clouds produced by the ultrasonic sensors system despite their scarcity to help build a more definite 3D scene. Using a robust iterative algorithm, it combined the structure from motion generated 3D point clouds with the ultrasonic sensors and IMU generated 3D point clouds to derive a much more precise point cloud using the depth measurements from the ultrasonic sensors. Because of their ability to recognize features of objects in the targeted scene, the ultrasonic generated point clouds performed feature extraction on the consecutive point cloud to ensure a perfect alignment. The range measured by ultrasonic sensors contributed to the depth correction of the generated 3D images (the 3D scenes). Experiments revealed that the system generated not only dense but precise 3D maps of the environments. The results showed that the designed 3D modeling platform is able to help in assistive living environment for self-navigation, obstacle alert, and other driving assisting tasks.


Introduction
The fusion of multiples ultrasonic sensors and a camera to produce a better 3D How to cite this paper: Mulindwa, D.B. (2020) Indoor 3D Reconstruction Using Camera, IMU and Ultrasonic Sensors. reconstruction of an indoor environment is a challenging task even though it helps to extend the possibilities in robotics. The main idea is to build up a platform combining the advantages of low-cost compensatory sensors towards acceptable indoor environment 3D reconstruction, with a satisfactory quality for mobile platforms (such as a wheelchair) navigation and driving assistance.
This paper implements the data fusion at two levels. Firstly, to fuse the data from the ultrasonic sensor array to compute the distance to any objects (or obstacle) in the scene with accuracy after removing measurement noise and drift, and secondly, to fuse the ultrasonic depth measurements and the 3D point clouds generated from multiple 2D calibrated images from the mounted web camera using existing methods.
There are few methods that depend on a regular camera output to reconstruct 3D scenes [1]. These techniques are known as structure from motion (SfM). Simultaneous localization and mapping (SLAM) is one such application in the domain of incremental structure from motion. This paper is arranged as follows. In Section 2, the related work is discussed to assess the most common techniques for 3D point clouds generation. Section 3 explores the background to this research by providing the motivations for the proposed algorithm while Section 4 explains in detail the different steps taken to reach the results by exposing the proposed methodology. This is followed by the experimental results in Section 5 as well as the conclusions in Section 6.

Related Work
Several authors have attempted to reconstruct the 3D indoor environments using non-invasive sensors such as camera, lidar… Structure-from-motion (SFM) is a well-known approach to recover the 3D shape of an object in the presence of numerous sequences [2]. Tomasi and Kanade [3] introduced a factorization method to recuperate the contour of rigid objects using an orthographic camera.
With these sensor fusion applications, a mandatory step for extrinsic calibration is the relative transformation between the involved sensors. Qilong Zhang and Robert Pless [4] explain both the experimental and theoretical results for the extrinsic calibration of a 2D laser range finder and a camera.
The Iterative Closest Point (ICP) algorithm normally uses the features of the equivalent points in the registration process to locate the closest neighbor matching points [5]. As an algorithm for point clouds registration, the Iterative Closest Point needs some little improvement since it presents numerous limitations such as the inability to define acceptable correspondence between two point clouds especially for sparse point clouds [6]. Accordingly, Gressin et al. [7] define some low-level features to tackle the correspondence by adopting a hybrid approach that combines surface reconstruction and feature extraction for 3D point clouds registration.

Proposed Methodology
This article is aimed at using a modified but robust iterative closest point algorithm that combines the structure from motion generated 3D point clouds with the ultrasonic sensors and IMU generated 3D point clouds to derive a much more precise point clouds using the depth from the ultrasonic sensors.

Reliable Distance Measurements
The detection of the 3D reflection point implies two tasks such as the range measurements from the facing obstacle as well as the wheelchair displacements detection in the indoor environment. As the wheelchairs carrying the system (ultrasonic sensors grid, STM32F407VG, MPU9250, camera…) is moving along a wall, an accurate 3D point cloud is expected to reflect this reality.

1) Distance Estimation Procedure
One ultrasonic sensor only provides a single measurement output for each transmitted pulse [8]. Hence, if facing multiple obstacles in the detection range, an ultrasonic sensor detects one object only, unlike other range measurement sensors such as the lidar. In reality, unexpected data jumps in the ultrasonic sensor measurements cause an unreliability concern [9].
To alleviate the vulnerabilities of relying on single ultrasonic measurements, the research used an array of multiple ultrasonic sensors (Algorithm 1).

Algorithm 1. Ultrasonic sensors array distance estimation algorithm.
Distance measurement algorithm of the four ultrasonic sensors array % Data availability in the communication channel If data available continue; end; If not, wait for more data. % Data validity is expressed through the condition that the reading should be between 10 cm and 1000 cm If data < 10 || data > 1000 then data valid, continue; end; % Addition of channel selected as they are changing since the system is using 4 channels (4 ultrasonic sensors) as organized by a multiplexer counter (channel_ID) = counter (channel_ID) + 1; % Local reflection point position where only the Z component (depth) is read while X and Y (platform displacements) are set to zeros here x = zeros(size(d)); y = zeros(size(d)); z = (0.5*d-p./(2*d)); where d = ultrasonic sensor distance p = coordinates position of the ultrasonic sensor in the array with respect to the active ultrasonic sensor 2) 3D Point Cloud Generation Procedure

1) Data Collection and Processing
The design of a Simulink diagram ( Figure 3) was a critical task in order to manage both the connections and the data communication in the system. For UART communication between the proposed system and the computer, a Simulink diagram had to be designed in order to dictate the format, the length and the nature of the data passing through the system.

2) Camera Calibration
The detected reflecting points are projected to the global coordinates system to have a cumulative point cloud for the environment. The locally reflected point will then be the mean value of all these Z values found according to their specific ultrasonic sensor on the mobile platform. In order to convert local to global coordinates, the displacement of the mobile platform has to be acquired accurately in all three dimensions namely the X, Y and Z planes. In order to gather the displacements, a double integration of the processed acceleration values from the Sparkfun MPU9250 is performed before the inherent drift is removed.   The camera is first calibrated using a checkerboard pattern (Figure 6). At that moment, the position of the plane in the camera coordinates system is computed. The ultrasonic sensors system also scans the same plane. As the camera catches the plane images, the presence of some of the ultrasonic sensors scanned points on the pattern plane allows estimating the pose between the sensors.

3) Position Estimation
Since X and Y constitute the first two components of the 3D point clouds (X, Y, Z), it is very important to keep track of the wheelchair movements at all times.
These displacements are acquired by integrating the acceleration data twice.   In reality, as the experiment begins, all three parameters (namely the acceleration, the velocity and the position) are zero (null). As the wheelchair starts to move, the acceleration is expected to change quickly as much as the velocity while the position also shifts numerically. The problem is that a drift appears and affects the accuracy of the position.
In order to tackle the drift problem, an algorithm computes the drift rate which will derive the range of the drift in the velocity data. At the end of the day, this velocity drift will be subtracted from the velocity.
The value of the rotating angle results from the integration of the angular velocity obtained from the gyroscope readings over time to obtain the Euler angles as shown in Figure 7.
Experiments have shown that the gyroscope data also have a drift that is constantly accumulating errors resulting in very erroneous data. The solution could be fusion with the accelerometer to correct the drift through a complimentary filter.

Modified Iterative Closest Point Algorithm
The process of enhancing the quality of the resulting point cloud is depicted in Figure 8. First, both the generated point clouds (structure from motion and ultrasonic sensors) are projected on a two-dimensional grid to determine neighborhood between points in 2D. This process is followed by a surface analysis that performs a feature extraction of the scene targeted and an alignment of the twopoint clouds. Finally, the point clouds are registered and fused in order to improve the precision of the 3D model.

2D Projection
The two consecutive 3D points (from Sfm and from ultrasonic sensors array) are projected onto a two-dimensional grid at the same time. The idea is to be able to  hood. This method uses the closeness of two pixels in the 2D projection to conclude that two points are neighbors in 3D.

Feature Extraction
Targeted scene 3D reconstruction depends largely on the geometrical features as a result of the choice of the neighborhood selected. Because of the accuracy of the ultrasonic sensor, the points selected in the same neighborhood will define the geometrical features of the scene. These features are the result of the depth measurements gathered by reading the distances from the system.

Local Registration
Most solutions to detect the alignment of two 3D point clouds lie on the iterative closest point algorithm. A difficulty lies in the fact that the ICP approach presents stern limitations. For instance, it is difficult to detect acceptable equivalence between two consecutive point clouds, as often, the corresponding pairs found do not illustrate the same point in reality leading to inaccurately aligned 3D point clouds and to wrongly projected sensor poses.
The advantage of using the 3D point clouds produced by the ultrasonic sensors is that they are accurate even though sparse. This accuracy in distance measurement will be used to improve the quality of the resulting 3Dpoint cloud. The improved ICP algorithm steps are explained in Algorithm 2 below.

IMU and Ultrasonic Sensors 3D Point Clouds Generation
To achieve accurate 3D scans of the apparent scenes, this paper used a platform mounted with four ultrasonic sensors, camera, STM32F407VG, motion sensor MPU9250 (IMU). The system was mounted over a mobile platform such as a wheelchair in order to assist the living conditions for the handicapped or impaired vision individuals.

System Practical Setup
The practical setup as illustrated with the wiring diagram ( Figure 9) is as follow: The main role of the setup was twofold: • to determine the 3D coordinates of the reflected points as the platform is moving around the room.
• to allow the 2D camera to simultaneously take pictures of the viewed scenes.

3D Reflection Points
In this research, the process to generate 3D point clouds involves After the camera calibration, a sequence of 2D images was input before being converted to grayscale. While undistorted images are acquired, the corner features are extracted where matching points are identified. Just as the extraction of the feature points and the identification of the match points, the object location is acquired using both the scene's matching points and the depth data analysis from the ultrasonic sensors and inertial navigation system.

Extrinsic Calibration
The MATLAB camera calibration application is used to calibrate this webcam.
For this session, the calibration target was a checkerboard pattern with a checkerboard square of 25 mm. First, the application aimed to detect the checkerboard pattern in each of the subsequent added pictures, and secondly to display  Upon completing this session, the projection matrix of the Logitech webcam C920 Pro specific to this study was computed.

3D Reconstruction
The dense reconstruction was two folds since first the Sfm reconstruction first took place followed by the correction from the ultrasonic and IMU generated point cloud using its Z component. In the ICP algorithm, a step did implement the replacement of the Z component.
For the Sfm, the image pictures displayed below in Figure 14, which has 2 views, shows a view of an indoor environment. These two adjacent pictures were turned into 3D views following the methodology presented in this work.   The features of two images were detected and associated, as shown in Figure   14, Figure 15, and Figure 16.
The corner points in the scene were considered as the features (Figure 15 and Figure 16) for further processing.
After that, the images with features extracted were associated by matching the feature points as shown in Figure 17.
Just as the extraction of the feature points and the identification of the match points, the object location was acquired using both the scene's matching points and the depth data analysis from the ultrasonic sensors and inertial navigation system.
After using the Z co-ordinate from the ultrasonic and IMU generated 3D point cloud (Figure 18) to correct for depth estimation, the 3D model was reconstructed in the form of dense reconstruction (Figure 19).

Conclusions
This article proposed a method for the 3D reconstruction of indoor environments using sensor fusion. This technique helps in assistive living environment Journal of Sensor Technology     The existing algorithms generate the 3D images (point clouds) from 2D images, which offered high spatial resolution. Then the ultrasonic rangefinders were utilized to capture the spatial measurement, which was then used to correct the 3D images to obtain a higher spatial accuracy. Although not much novel research done on this point, inertial navigation system constitutes the basis of this research since without accurate positioning of the mobile platform (wheelchair), it is almost impossible to determine the various obstacles distance as well as the distance traveled. The experiment results show the platform achieved satisfactory performance for the requirements of mobile facility navigation and driving assistance.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.