
In order to improve user satisfaction with augmented reality (AR) technology and the accuracy of the service, it is important to obtain the user's exact position. The technique most frequently used for outdoor positioning is the global positioning system (GPS), which is far less accurate indoors. Indoor position is therefore usually measured by comparing the received signal strength of wireless fidelity (Wi-Fi) access points (APs) or by using Bluetooth low energy (BLE) tags. However, both Wi-Fi and BLE approaches require additional hardware installation. In this paper, the proposed method estimates the user's position from an indoor image and an indoor coordinate map, without any additional hardware installation. The indoor image contains several feature points extracted from fixed objects. By matching these feature points with the feature points of the user's image, we obtain six or more pixel coordinates in the user image, solve the perspective projection equations, and thereby obtain the position of the user on the indoor map. The experimental results show that the user's position can be obtained accurately in an indoor environment using software alone, without additional hardware installation.

Recently, AR has made many technological advances, and the range of AR experiences that users can enjoy on smartphones and tablets has increased. AR is a technology that superimposes virtual objects on the real world seen by the user, and it is used for many purposes, such as automobile navigation, education, tourism, advertising, shopping, and games [

The current research trends of algorithms for obtaining the indoor position using a smartphone are as follows. Kalbandhe [

When the user's indoor position is calculated by the BLE-tag method, accuracy is classified into three ranges based on the BLE distance: Immediate at 0 - 0.5 m, Near at 0.5 - 3 m, and Far at 3 m or more. The indoor position estimation error is within 4 m at maximum, and the range in meters differs among Immediate, Near, and Far. In addition, the average indoor location estimation error of the method that obtains the user's position from four pieces of AP information is 0.6 m.

In this paper, we propose an algorithm that finds the user's position using an indoor map, reference images of fixed objects on that map, and the pixel coordinates of the objects' vertices. Chapter 2 explains perspective projection and homography so that the concepts and equations used in Chapter 3 are easy to follow. Chapter 3 describes how to obtain the pixel coordinates corresponding to the fixed objects' vertices in the user image and derives the formula for obtaining the user's position from the perspective projection model. Chapter 4 verifies the proposed method through experiments, and Chapter 5 concludes.

This chapter explains the perspective projection theory and homography, which are the main mathematical principles of the system proposed in Chapter 3.

Projection is a flat representation of a three-dimensional (3D) object on a single plane called the view plane. Let $l$ be a line in the plane, and let $V$ be a point not on the line. The perspective projection from $V$ onto $l$ is the transformation that maps any point $P$, distinct from $V$, onto the point $P'$ at the intersection of the line $\overline{VP}$ and $l$, as illustrated in

Since the projected shape of the object varies depending on where the object is viewed from, it is possible to obtain the position of the user by using the perspective projection principle and formula.

When one plane is projected onto another plane, a fixed transformation relation holds between corresponding projected points; this relation is called homography. A homography is represented by a 3 × 3 matrix and relates corresponding points expressed in homogeneous coordinates. That is, if points $(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i)$ on one plane are projected onto points $(x'_1, y'_1), (x'_2, y'_2), \dots, (x'_i, y'_i)$ on another plane, there is always a 3 × 3 homography matrix $H$ that satisfies Equation (1) between these corresponding points. In Equation (1), $s$ is a scale factor [

$$s \begin{bmatrix} x'_i \\ y'_i \\ 1 \end{bmatrix} = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}, \qquad i = 1, 2, 3, \dots \tag{1}$$
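Equation (1) is applied by multiplying a homogeneous point by $H$ and then dividing out the scale factor $s$. A minimal NumPy sketch (the matrix `H` and sample point below are made-up values for illustration, not from the paper):

```python
import numpy as np

# Hypothetical 3x3 homography (illustrative values only).
H = np.array([[1.2, 0.1, 5.0],
              [0.0, 0.9, -3.0],
              [0.001, 0.0, 1.0]])

def apply_homography(H, x, y):
    """Map (x, y) through H in homogeneous coordinates (Equation (1)),
    then divide by the scale factor s to recover (x', y')."""
    s_xp, s_yp, s = H @ np.array([x, y, 1.0])
    return s_xp / s, s_yp / s

xp, yp = apply_homography(H, 10.0, 20.0)
```

Note that the result is defined only up to the scale $s$, which is why the final division is required.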

The database of the proposed system contains an indoor map of the place where the user wants to know his or her location. For objects on the indoor map whose edges are sharp or that contrast with surrounding objects, the database also stores a reference image, taken from a position where the object is not distorted and fits entirely within the frame, together with the pixel coordinates of the object's vertices. The coordinates of the indoor map are the 3D world coordinates of the fixed objects (frames, desks, doors, windows, computers, etc.) located in the indoor space. When a user takes an image, feature points are extracted from the reference image in the database and from the user's image; the 2D pixel coordinates of the matched feature points and the 3D world coordinates of those points are then used as input data. The input data are substituted into the perspective projection equation $q = Mp$, which is rearranged so that the vector $m$ can be obtained by the least squares method. Then, using the fact that the camera position is the only point that the perspective projection does not map onto the image plane, the vector $V$ is computed to obtain the position of the user. The overall flow is shown in
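The pipeline above hinges on the projection $q = Mp$ mapping a 3D world point to image pixels. A minimal sketch of that step in NumPy (the 3 × 4 matrix `M` below uses made-up intrinsics and pose, not the paper's calibration):

```python
import numpy as np

# Illustrative 3x4 projection matrix (hypothetical values).
M = np.array([[800.0, 0.0, 320.0, 10.0],
              [0.0, 800.0, 240.0, 20.0],
              [0.0, 0.0, 1.0, 5.0]])

def project(M, X, Y, Z):
    """Perspective projection q = M p: map a 3D world point (X, Y, Z)
    to 2D pixel coordinates, dividing out the homogeneous scale."""
    su, sv, s = M @ np.array([X, Y, Z, 1.0])
    return su / s, sv / s

u, v = project(M, 1.0, 2.0, 3.0)
```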

In this section, the step "obtain pixel coordinates in the user image using homography" in

In the user image, the pixel coordinates of the object vertices are obtained separately for each object. In the first step, the SURF algorithm is used to check whether an object composed of feature points lying on one plane in the reference image has a matching object, with corresponding feature points, in the user image. The candidate corresponding feature points vary, and only four points are retained through the random sample consensus (RANSAC) algorithm [

by using the feature points in the reference image and the four feature points finally selected in the user image, the homography matrix between the two images is obtained. Finally, we obtain the vertex pixel coordinates of each object in the user image by mapping the vertices of the objects in the reference image into the user image with the previously obtained homography matrix.
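Given the four correspondences that survive RANSAC, the homography can be recovered by the standard direct linear transform (DLT). The NumPy sketch below illustrates the idea (the helper name `estimate_homography` is ours; a production pipeline would typically use a library routine such as OpenCV's `findHomography`):

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear transform: solve A h = 0 for the 3x3 homography
    from (at least) four point correspondences src[i] -> dst[i].
    Each pair contributes two rows derived from Equation (1)."""
    A = []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]          # fix the arbitrary overall scale
```

Once `H` is estimated, the object vertices of the reference image are mapped into the user image by applying `H` to each vertex in homogeneous coordinates.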

Let $p = (X_k, Y_k, Z_k, 1)^T$ be the homogeneous coordinates of a 3D point in the indoor map, with $1 \le k \le n$, where $n$ is the number of points. Let $q = (u_k, v_k, 1)^T$ be the homogeneous coordinates of the point $p$ projected onto the image. According to the theory of perspective projection [

$$q = Mp, \qquad p = \begin{bmatrix} X_k \\ Y_k \\ Z_k \\ 1 \end{bmatrix}, \quad q = \begin{bmatrix} u_k \\ v_k \\ 1 \end{bmatrix}, \quad M = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix}. \tag{2}$$

The equality should be understood in the sense of homogeneous coordinates: two vectors are equal if one is a constant multiple of the other. Therefore, we have

$$u_k = \frac{m_{11} X_k + m_{12} Y_k + m_{13} Z_k + m_{14}}{m_{31} X_k + m_{32} Y_k + m_{33} Z_k + m_{34}}$$

and

$$v_k = \frac{m_{21} X_k + m_{22} Y_k + m_{23} Z_k + m_{24}}{m_{31} X_k + m_{32} Y_k + m_{33} Z_k + m_{34}}.$$

It follows that

$$u_k m_{31} X_k + u_k m_{32} Y_k + u_k m_{33} Z_k + u_k m_{34} = m_{11} X_k + m_{12} Y_k + m_{13} Z_k + m_{14} \tag{3}$$

and

$$v_k m_{31} X_k + v_k m_{32} Y_k + v_k m_{33} Z_k + v_k m_{34} = m_{21} X_k + m_{22} Y_k + m_{23} Z_k + m_{24} \tag{4}$$

These may be written as a single matrix equation of the form

$$\begin{bmatrix} X_k & Y_k & Z_k & 1 & 0 & 0 & 0 & 0 & -u_k X_k & -u_k Y_k & -u_k Z_k & -u_k \\ 0 & 0 & 0 & 0 & X_k & Y_k & Z_k & 1 & -v_k X_k & -v_k Y_k & -v_k Z_k & -v_k \end{bmatrix} \begin{bmatrix} m_{11} \\ m_{12} \\ m_{13} \\ m_{14} \\ m_{21} \\ m_{22} \\ m_{23} \\ m_{24} \\ m_{31} \\ m_{32} \\ m_{33} \\ m_{34} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{5}$$

Equation (5) can be repeated for each of the $n$ points, and stacking the rows we obtain

$$\begin{bmatrix} X_1 & Y_1 & Z_1 & 1 & 0 & 0 & 0 & 0 & -u_1 X_1 & -u_1 Y_1 & -u_1 Z_1 & -u_1 \\ 0 & 0 & 0 & 0 & X_1 & Y_1 & Z_1 & 1 & -v_1 X_1 & -v_1 Y_1 & -v_1 Z_1 & -v_1 \\ & & & & & \vdots & & & & & & \\ X_n & Y_n & Z_n & 1 & 0 & 0 & 0 & 0 & -u_n X_n & -u_n Y_n & -u_n Z_n & -u_n \\ 0 & 0 & 0 & 0 & X_n & Y_n & Z_n & 1 & -v_n X_n & -v_n Y_n & -v_n Z_n & -v_n \end{bmatrix} \begin{bmatrix} m_{11} \\ m_{12} \\ \vdots \\ m_{34} \end{bmatrix} = \mathbf{0} \tag{6}$$
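The coefficient matrix of Equation (6) can be assembled directly from the point correspondences. A NumPy sketch (the helper name `build_R` is ours; the six correspondences are the measured pairs listed in the experimental tables of Chapter 4):

```python
import numpy as np

def build_R(world_pts, pixel_pts):
    """Stack the two rows of Equation (5) for every correspondence
    (X, Y, Z) -> (u, v), producing the 2n x 12 matrix of Equation (6)."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, pixel_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    return np.asarray(rows, dtype=float)

# Six measured 3D-2D pairs from the experiment (world in metres, pixels in px).
world = [(2.8, 1.5, 7.6), (4.2, 1.5, 7.6), (2.92, 0.7, 5.1),
         (3.52, 0.7, 5.1), (2.92, 0.7, 2.1), (3.52, 0.7, 2.1)]
pixel = [(1555, 880), (2007, 881), (1743, 1236),
         (2030, 1235), (1924, 1781), (2652, 1780)]
R = build_R(world, pixel)   # 12 x 12 for n = 6 points
```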

Let

$$R = \begin{bmatrix} X_1 & Y_1 & Z_1 & 1 & 0 & 0 & 0 & 0 & -u_1 X_1 & -u_1 Y_1 & -u_1 Z_1 & -u_1 \\ 0 & 0 & 0 & 0 & X_1 & Y_1 & Z_1 & 1 & -v_1 X_1 & -v_1 Y_1 & -v_1 Z_1 & -v_1 \\ & & & & & \vdots & & & & & & \\ X_n & Y_n & Z_n & 1 & 0 & 0 & 0 & 0 & -u_n X_n & -u_n Y_n & -u_n Z_n & -u_n \\ 0 & 0 & 0 & 0 & X_n & Y_n & Z_n & 1 & -v_n X_n & -v_n Y_n & -v_n Z_n & -v_n \end{bmatrix}$$

and

$$m = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} & m_{21} & m_{22} & m_{23} & m_{24} & m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix}^T.$$

If the measurements were all exact, Equation (6) would have a nonzero solution for any number of data points. This is not the case in a real-world environment, where the measurements of image and world coordinates are inexact (generally termed noise): the overdetermined system $Rm = 0$ has no exact solution apart from the zero solution. Instead of demanding an exact solution, we attempt to find an approximate one, namely a vector $m$ that minimizes a suitable cost function. To avoid the trivial solution $m = 0$, we require an additional constraint; generally a condition on the norm is used, such as $\|m\| = 1$. Given that there is no exact solution to $Rm = 0$, it is natural to minimize the norm $\|Rm\|$ subject to the usual constraint $\|m\| = 1$. The solution is the (unit) eigenvector of $R^T R$ with minimum eigenvalue. That is, to minimize $m^T A m$ where $A = R^T R$, the minimum eigenvalue of $A$ must be found, and the eigenvector corresponding to that eigenvalue is taken as $m$.
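This constrained minimization is a few lines in NumPy (the helper name `solve_m` is ours): `numpy.linalg.eigh` returns the eigenvalues of a symmetric matrix in ascending order, so the first column of the eigenvector matrix is the minimizer.

```python
import numpy as np

def solve_m(R):
    """Minimize ||R m|| subject to ||m|| = 1: the solution is the unit
    eigenvector of A = R^T R belonging to its smallest eigenvalue."""
    A = R.T @ R
    eigvals, eigvecs = np.linalg.eigh(A)   # eigh: A is symmetric; ascending order
    return eigvecs[:, 0]                   # eigenvector of the minimum eigenvalue
```

An equivalent and numerically preferable route is the singular value decomposition of $R$ itself, whose last right singular vector is the same minimizer.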

Assume that $n$ is the number of data points used in the matrix $R$, where $n \in \mathbb{N}$ and $n \ge 6$. Then the matrix $A$ is a $12 \times 12$ matrix, and we may solve the 12th-order real-coefficient equation $\det(A - \lambda I) = 0$. Since $A$ is symmetric with $m^T A m \ge 0$, every eigenvalue satisfies $\lambda \ge 0$. We can therefore obtain the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_{12}$ of $A$, ordered so that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{12}$.

The vector $m$ obtained by the above procedure is rearranged into the form

$$M = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix}.$$

Let $V = [x \; y \; z \; w]^T$ be the homogeneous coordinates of the camera position. It must satisfy $MV = 0$, since the camera centre is the only point that the perspective projection $M$ does not map onto the image plane; therefore, $V$ is a nonzero vector in the null space of $M$. Dividing $V$ by $w$, we finally obtain the user's position in 3D world coordinates as $V = (x/w, y/w, z/w)$.
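The null space of the 3 × 4 matrix $M$ can be read off from its singular value decomposition: the right singular vector of the (numerically) zero singular value. A minimal sketch (the helper name `camera_position` is ours):

```python
import numpy as np

def camera_position(M):
    """The camera centre V satisfies M V = 0; take the null-space vector
    of the 3x4 projection matrix M (the last right singular vector) and
    dehomogenize by dividing through by its w component."""
    _, _, Vt = np.linalg.svd(M)
    V = Vt[-1]                 # null-space direction (x, y, z, w)
    return V[:3] / V[3]        # user position (x/w, y/w, z/w)
```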

In this chapter, the following procedure was used to verify the proposed method at the Inno Gallery, Kyung Hee University. When the user takes an indoor image, the pixel coordinates in the user image are obtained using the method described in Section 3.2, and the six pairs of world coordinates corresponding to the vertices of the objects captured as feature points in the user image are substituted into Equation (6) of Chapter 3 to obtain the user's position. To analyze how the error changes as the number of data points increases, the experiment is repeated in the same way with eight pairs of points.

To evaluate the performance of the proposed method, we compare it with the result of estimating the user's location using homography. The homography method requires the camera position of the reference image to be known in advance. Using the SURF algorithm, we find feature points that are invariant to environmental changes such as scale, illumination, and viewpoint in the reference and user images, match the two images, and obtain the homography between them. Decomposing the homography shows how much the user's image has been rotated and translated relative to the reference image, and this result can be used to determine where the user took the image.
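To illustrate what "decomposing the homography" means, the sketch below handles only the special case of a purely rotating camera, where $H = K R K^{-1}$ and the rotation is recovered by inverting that relation; the general case (rotation plus translation relative to a scene plane) needs a full decomposition such as OpenCV's `decomposeHomographyMat`. The intrinsic matrix `K` and the angle below are made-up values, not the paper's calibration.

```python
import numpy as np

# Hypothetical intrinsic matrix (illustrative values only).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

theta = np.deg2rad(10.0)                       # ground-truth yaw of the camera
R_true = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(theta), 0.0, np.cos(theta)]])

H = K @ R_true @ np.linalg.inv(K)              # homography induced by pure rotation
R_recovered = np.linalg.inv(K) @ H @ K         # "decompose": invert the relation
```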

Since perspective projection is designed for the pinhole camera model, the camera parameters were obtained at an image resolution of 3264 × 1836 using a camera calibration tool in order to eliminate camera distortion. The results are shown in

World coordinates are measured in meters, and pixel coordinates in pixels. The image taken by the user is shown in

Device | Focal length x | Focal length y | Center of the image x | Center of the image y
---|---|---|---|---
SM-N920S | 2518 | 2513 | 1632 | 918

Point | X (m) | Y (m) | Z (m) | u (px) | v (px)
---|---|---|---|---|---
1 | 2.8 | 1.5 | 7.6 | 1555 | 880
2 | 4.2 | 1.5 | 7.6 | 2007 | 881
3 | 2.92 | 0.7 | 5.1 | 1743 | 1236
4 | 3.52 | 0.7 | 5.1 | 2030 | 1235
5 | 2.92 | 0.7 | 2.1 | 1924 | 1781
6 | 3.52 | 0.7 | 2.1 | 2652 | 1780

Point | X (m) | Y (m) | Z (m) | u (px) | v (px)
---|---|---|---|---|---
1 | 2.0 | 0.7 | 4.94 | 1138 | 1247
2 | 2.0 | 0.7 | 4.94 | 1214 | 1189
3 | 2.8 | 1.5 | 7.6 | 1555 | 880
4 | 4.2 | 1.5 | 7.6 | 2007 | 881
5 | 2.92 | 0.7 | 5.1 | 1743 | 1236
6 | 3.52 | 0.7 | 5.1 | 2030 | 1235
7 | 2.92 | 0.7 | 2.1 | 1924 | 1781
8 | 3.52 | 0.7 | 2.1 | 2652 | 1780

To compare the accuracy of the proposed method, we extracted the homography matrix between the reference and user images and computed the position of the user from it. The result of extracting feature points from both images using the SURF algorithm is shown in

If the error is calculated by the usual error-rate formula, then the closer the camera is to the origin of the world coordinate system, the larger the error rate becomes, even when the difference between the calculated and actual user positions is small. Therefore, in this paper the error rate is newly defined as Equation (7) to give a reasonable measure of error.

$$\text{error rate} = \frac{|\text{user location} - \text{calculated user location}|}{\text{diagonal length of indoor space}} \times 100 \tag{7}$$

Since the Inno Gallery of Kyung Hee University has a width of 7.7 m, a length of 7.2 m, and a height of 2.8 m, the value substituted into the denominator of the error rate is $\sqrt[3]{7.7^2 + 7.2^2 + 2.8^2} = 4.9182 \text{ m}$.
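Equation (7) is applied per coordinate, as in the result tables below. A small sketch (the value 4.9182 m is the normalizing length stated above; the sample values are the X coordinate of the 6-point experiment):

```python
# Normalizing length of the indoor space (Inno Gallery), in metres.
DIAG = 4.9182

def error_rate(true_val, calc_val):
    """Equation (7): percent error normalized by the size of the indoor
    space instead of by the true coordinate value."""
    return abs(true_val - calc_val) / DIAG * 100.0

rate_x = error_rate(3.02, 3.0163)   # X coordinate, 6-point experiment
```

Normalizing by the room size rather than by the coordinate itself keeps the rate stable for users standing near the world-coordinate origin.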

The method proposed in this paper can find the user's position with high accuracy given six or more pairs of 3D indoor-map coordinates and 2D image pixel coordinates. Because it locates the user accurately without the cost of building the additional hardware infrastructure used by many previous studies of indoor positioning, the proposed method can support a wide range of AR services and is expected to give high user satisfaction. With 6 points, the proposed method's error rate, averaged over the x, y, and z coordinates, is 4.705 percentage points lower than that of the homography-based method.

 | X (6 points) | Y (6 points) | Z (6 points) | X (8 points) | Y (8 points) | Z (8 points)
---|---|---|---|---|---|---
Real view point | 3.02 | 1.47 | 0.1 | 3.02 | 1.47 | 0.1
Proposed method | 3.0163 | 1.4939 | 0.2735 | 3.0314 | 1.2935 | 0.4946
Proposed method error rate | 0.075% | 0.56% | 3.52% | 0.23% | 3.58% | 8.02%
OpenCV decompose | 3.776 | 1.369 | 0.031 | 3.776 | 1.369 | 0.031
OpenCV decompose error rate | 15.37% | 1.5% | 1.4% | 15.37% | 1.5% | 1.4%

In addition, when the user's position is obtained using 8 points, the position should become more accurate as the number of input data increases, provided the input data contain no error. In this experiment the error grew because the input data do contain errors; even so, the average error rate is 2.14 percentage points lower than when the user's position is computed by estimating the homography.

In the future, applications of AR technology on smartphones will increase, and whenever an application requires accurate location information about the user, the proposed method is expected to be useful. Further research is needed on how to reduce the errors that occur when the input data are not accurate.

This research was supported by the "Software Convergence Cluster Program", through the Ministry of Science, ICT and Future Planning/Gyeonggi Province (20171218).

Han, S.W., Lee, Y.J., Yun, J.H., Han, C.Y., Lee, D.H. and Suh, D.Y. (2018) Perspective Projection Algorithm Enabling Mobile Device’s Indoor Positioning. Journal of Computer and Communications, 6, 159-170. https://doi.org/10.4236/jcc.2018.61017