This paper presents a robust image feature that can be used to automatically establish match correspondences between aerial images of suburban areas with large view variations. Unlike most commonly used invariant image features, this feature is view variant: its geometrical structure allows predicting its visual appearance according to the observer’s view. The feature is named 2EC (2 Edges and a Corner) as it utilizes two line segments, or edges, and their intersection, or corner. These lines are constrained to correspond to the boundaries of rooftops. The description of each feature includes the two edges’ lengths, their intersection, their orientation, and the image patch enclosed by a parallelogram constructed from the two edges. Potential match candidates are obtained by comparing features while accounting for the geometrical changes that are expected under large view variation. Once the putative matches are obtained, outliers are filtered out using a projective matrix optimization method. Based on the results of the optimization process, a second round of matching is conducted within a more confined search space, leading to a more accurate match establishment. We demonstrate how establishing match correspondences using these features leads to computing more accurate camera parameters and a more accurate fundamental matrix, and therefore more accurate image registration and 3D reconstruction.

Establishing match correspondences between two or more images is one of the most common tasks in computer vision applications of image processing. This task is especially challenging under conditions such as oblique views, illumination changes, and large view variations, where the viewing camera undergoes a large transformation. Most cutting-edge matching approaches extract affine-invariant regions and assess their similarities using intensity-based image properties. These methods perform well for images with shorter baselines or small affine transformations; however, they easily fail for images with wider baselines or larger view variations. When establishing match correspondences in aerial images of suburban regions, the difficulty of matching lies in several factors. Firstly, these images can be visually very dissimilar if the viewing direction or angle (the pitch angle in particular) is not at nadir; in such situations, the projective transformations between the two views cannot simply be ignored. Secondly, due to changes in the background (originating from perspective projection), identical physical regions of the scene contain, at best, only partial similarities. Finally, the existence of many repetitive shapes with similar sizes, colors, and patterns in the construction of suburban areas (originating from similar design plans and identical construction materials) makes identification based on simple features error-prone. Therefore, recognition of identical features cannot be accurately and robustly achieved via traditional matching techniques [

This paper presents a novel approach to creating unique features in complex suburban environments that can be identified even when they undergo large viewpoint variation. The paper is structured in the following order. Related works are reviewed next followed by the contributions of the proposed approach. Section 2 describes the characteristics of the input images and their accompanying auxiliary data. Section 3 introduces the proposed 2EC features and describes how they are constructed. Section 4 presents our proposed method for establishing 2EC match correspondences in two or more images. Experimental results and comparison with the state of the art are provided in Section 5. Finally, Section 6 summarizes the conclusions of this work.

Over the last several decades, the problem of wide baseline matching has been addressed in numerous ways. A wide variety of approaches involve the detection of distinct features and viewpoint-invariant descriptors. Discrete points are among the most popular features used in computer vision applications. Lowe [

One common trait of all the above features is the drop in repeatability as the viewpoint difference increases. Moreover, all of these methods assume that the local areas associated with features are planar. This, however, is not a valid assumption when working with aerial imagery of suburban regions, which include buildings of variable heights. Naturally, these methods have less success in registering aerial imagery under larger view variations.

Corners, as rotationally and scale-invariant features, have been used for image registration for many years. To add to the consistency of the features’ locations, some methods have suggested incorporating edges into the detection procedure. Xiao et al. [

Beside points and their invariant regions, lines are also commonly used as features when it comes to image registration. Schmid et al. [

There are also some works that utilize a combination of interest points and lines or edges. Tell et al. [

Some methods take advantage of the characteristics of man-made objects, such as straight lines and planar surfaces. In order to establish correspondences between images of building facades, Lee [

In this paper, we introduce a novel feature and a unique matching technique for automatically establishing match correspondences between two oblique aerial images of suburban regions under a large projective transformation. The proposed feature utilizes two straight lines and their intersection, which could correspond to a vertex and two connected edges of a building’s rooftop. We refer to our proposed features as 2 Edges and a Corner, or 2EC, features. Similar to [

The main contributions of this paper are as follows:

1) We propose 2EC features that hold geometrical traits of straight edges and their intersections with each other and are especially designed to represent typical configurations in man-made structures. Since 2EC features hold geometrical relationships between structures’ lines and corners, they can be tracked and associated even when they are viewed from a completely different viewing direction.

2) A new matching algorithm is developed that utilizes both visual and geometrical properties in a hierarchical approach to robustly and accurately establish 2EC match correspondences between two or more images of a scene.

The oblique aerial images from the Pictometry International Corp. dataset are used as the input to the proposed algorithm. All aerial images in our dataset are 2672 × 4008 pixels, covering a ground area of 420 × 640 m^{2} with a sample distance of 0.15 meter/pixel. These images are captured at a slant angle from an elevated position. The image acquisition system includes four cameras, each pointing obliquely downward toward the North, West, South, or East. The input images are captured from an airplane flying over the region of interest in a zigzag pattern. Considering the way these images are acquired, several images usually cover each particular scene. These images are taken at an average pitch angle of 50˚, and therefore any two images of the same scene, captured by different cameras, undergo a large projective variation. An example of an oblique aerial image pair, taken under such conditions, is shown in

Each image is accompanied by a metadata file that includes the flight information and the internal and external parameters of the camera at the time of image capture. A sample metadata file is presented in

Using our proposed algorithm, the camera parameters can be refined.

The main idea behind 2EC features is to utilize the geometrical traits of straight edges on the man-made structures and provide distinctive features that can be robustly matched between input images with large projective transformations. Here we introduce 2 Edges and one Corner (2EC) features, which consist of two connected line segments and their intersection. 2EC features could correspond to the boundaries of building rooftops. The procedures for detecting these features are explained in the order that they are implemented.

1) Straight line extraction: Here a set of straight-line segments is extracted from the image. The Burns line detector [

a) Partition image pixels into bins based on the gradient orientation values. A bin size of 45˚ was selected.

b) Run a connected-components algorithm to form line support regions from groups of 4-connected pixels that share the same shifted gradient orientation bin.

c) Eliminate line support regions that have an area smaller than 3 pixels.

d) Repeat steps a, b, and c by spatially shifting the gradient bins to produce a second set of line support regions. This accounts for the possibility that some true lines may have component pixels that lie on either side of an arbitrary gradient orientation boundary.

e) Use the voting scheme in [

f) For each line support region, compute the line represented by that region by performing a least-squares fit. The fit estimates a planar model of each line using the gradient magnitude values.
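The grouping and fitting steps above can be sketched as follows. This is a simplified illustration rather than the exact Burns implementation; the 45˚ bin size and 3-pixel area filter come from the text, while the gradient-magnitude cutoff `mag_eps` is an added assumption, and the spatially shifted second pass of step d is omitted:

```python
import numpy as np
from scipy import ndimage

def line_support_regions(image, bin_size_deg=45.0, min_area=3, mag_eps=0.1):
    """Group pixels into line support regions by gradient orientation."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    orient = np.degrees(np.arctan2(gy, gx)) % 360.0
    bins = (orient // bin_size_deg).astype(int)
    four_conn = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])  # 4-connectivity
    regions = []
    for b in range(int(360.0 // bin_size_deg)):
        labels, n = ndimage.label((bins == b) & (mag > mag_eps), four_conn)
        for lab in range(1, n + 1):
            ys, xs = np.nonzero(labels == lab)
            if xs.size >= min_area:          # step (c): drop tiny regions
                regions.append((xs, ys, mag[ys, xs]))
    return regions

def fit_line(xs, ys, weights=None):
    """Weighted least-squares line through a region: returns the unit
    direction (principal axis of the scatter) and the weighted centroid."""
    w = np.ones(len(xs)) if weights is None else np.asarray(weights, float)
    cx, cy = np.average(xs, weights=w), np.average(ys, weights=w)
    pts = np.stack([xs - cx, ys - cy], axis=1) * np.sqrt(w)[:, None]
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    return vt[0], (cx, cy)
```

In practice the gradient-magnitude weights from the support region would be passed to `fit_line`, so stronger edge pixels dominate the fit.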

2) Line linking: The objective of this step is to link collinear line segments that are separated by very small gaps. The following algorithm describes the linking process:

a) Sort the lines in the order they would be encountered if a horizontal sweep was performed across the image.

b) Use a divide-and-conquer method to efficiently determine nearby pairs of lines.

c) Test each pair of nearby lines to determine whether they should be linked. Conditions (a) and (b) and one of the conditions of (c) and (d) in

More details for this process can be found in [

3) Obtaining the edge map: The edge map serves as a clue for deciding on the extension of a line and the selection of its potential intersections. First, the original image is processed with the Canny edge detector, followed by dilation with a 3 × 3 pixel square structuring element to widen edges and fill narrow gaps. Next, isolated blobs with areas smaller than 1 m^{2} are removed. In aerial images, smaller blobs usually correspond to protuberant objects or texture on the surfaces of rooftops, and they could be misleading when extending lines. Thus, they should be removed before the succeeding steps.
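A minimal sketch of this edge-map construction, assuming scikit-image’s Canny implementation and converting the 1 m^{2} area threshold to pixels via the stated 0.15 m/pixel sampling distance (≈44 pixels):

```python
import numpy as np
from scipy import ndimage
from skimage import feature

def build_edge_map(gray, ground_res_m=0.15, min_blob_area_m2=1.0):
    """Canny edges, 3x3 dilation, then removal of isolated small blobs."""
    edges = feature.canny(gray)                       # boolean edge image
    edges = ndimage.binary_dilation(edges, structure=np.ones((3, 3)))
    # 1 m^2 expressed in pixels at the stated ground resolution (~44 px)
    min_pixels = int(min_blob_area_m2 / ground_res_m ** 2)
    labels, n = ndimage.label(edges)
    sizes = ndimage.sum(edges, labels, range(1, n + 1))
    keep_ids = 1 + np.flatnonzero(np.asarray(sizes) >= min_pixels)
    return np.isin(labels, keep_ids)
```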

4) 2EC Feature extraction: Once the edge map is obtained, 2EC features are extracted by the following 6 steps:

a) Line extension: The objective of this step is to adjust the detected line segments according to the image’s edge map such that most of the corresponding rooftop edges are covered, while as many non-significant lines as possible (lines on grass regions or the ground) are removed. The extension of a line, on the other hand, should not be too long, as it could incorrectly represent edges of multiple buildings along the same street/block in close proximity to each other. Only with meaningful and true line segments can the 2EC features be constructed robustly. According to this principle, the line extension procedure has the following three steps:

· From the set of detected lines, the lines with less than 80% overlap with the edge map are identified and removed.

· To account for cases where partial occlusion or low contrast between the rooftop and the background has created breakage in line segments that indeed correspond to complete building rooftop edges, each line segment is extended on both ends until it no longer overlaps with the edge map. The image boundaries are used in cases where the extension of the line falls outside the image area.

· Once all lines are extended, the endpoints of each extended line must be in close proximity to rooftop vertices, if the line comes from a real rooftop. However, in some cases, small discontinuities exist between the line endpoints and the vertices that correspond to the rooftop corners; those corners should be reached by further continuation of the lines. As a result, the extension of each line is continued by 10% of its length.

b) Line merge: In aerial images of urban scenes, a rooftop edge often consists of several parallel lines, each representing some part of the edge. In order to avoid such ambiguity, nearby parallel lines are fused together to form one single continuous line that uniquely corresponds to the full length of a rooftop edge. If all of the following three criteria are satisfied, the two line segments are assumed to come from one physical edge and are therefore merged:

· They are parallel or almost parallel (a maximum cross angle of 5˚ is tolerated).

· The lateral distance between two lines is less than 1.5 pixels or 22.5 cm.

· When one line is projected onto the other, there exists at least 1 pixel of overlap.
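The three merge criteria can be expressed as a small predicate; the segment representation as an endpoint pair and the exact form of the overlap test are assumptions of this sketch:

```python
import numpy as np

def should_merge(seg_a, seg_b, max_angle_deg=5.0, max_lateral_px=1.5,
                 min_overlap_px=1.0):
    """Test the three merge criteria for two segments ((x1, y1), (x2, y2))."""
    a = np.asarray(seg_a, float)
    b = np.asarray(seg_b, float)
    da, db = a[1] - a[0], b[1] - b[0]
    ua, ub = da / np.linalg.norm(da), db / np.linalg.norm(db)
    # criterion 1: cross angle of at most 5 degrees
    angle = np.degrees(np.arccos(np.clip(abs(ua @ ub), 0.0, 1.0)))
    if angle > max_angle_deg:
        return False
    # criterion 2: lateral (perpendicular) distance below 1.5 px (~22.5 cm)
    v = b[0] - a[0]
    lateral = abs(ua[0] * v[1] - ua[1] * v[0])
    if lateral > max_lateral_px:
        return False
    # criterion 3: at least 1 px overlap when b is projected onto a
    ta = sorted([0.0, float(np.linalg.norm(da))])
    tb = sorted([float((b[0] - a[0]) @ ua), float((b[1] - a[0]) @ ua)])
    return min(ta[1], tb[1]) - max(ta[0], tb[0]) >= min_overlap_px
```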

c) Line intersection: Using all the remaining line segments after the extension and merging procedures, the intersections of each line with all the other lines are found. An intersection is recorded only if it satisfies the following two conditions:

· The two intersected lines have an angle larger than 20˚ and less than 160˚.

· The intersection point lands on the edge map.
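A sketch of the intersection test under these two conditions, assuming segments are stored as endpoint pairs and the edge map is a boolean array indexed as [row, column]:

```python
import numpy as np

def record_intersection(seg_a, seg_b, edge_map,
                        min_angle_deg=20.0, max_angle_deg=160.0):
    """Intersect the supporting lines of two segments; record the point only
    if the cross angle lies in (20, 160) degrees and it lands on the edge map."""
    (x1, y1), (x2, y2) = seg_a
    (x3, y3), (x4, y4) = seg_b
    d1 = np.array([x2 - x1, y2 - y1], float)
    d2 = np.array([x4 - x3, y4 - y3], float)
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:                     # parallel lines: no intersection
        return None
    cosang = (d1 @ d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))
    angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    if not (min_angle_deg < angle < max_angle_deg):
        return None
    # parametric solution of line_a + t * d1 = line_b + s * d2
    t = ((x3 - x1) * d2[1] - (y3 - y1) * d2[0]) / denom
    px, py = x1 + t * d1[0], y1 + t * d1[1]
    r, c = int(round(py)), int(round(px))
    h, w = edge_map.shape
    if 0 <= r < h and 0 <= c < w and edge_map[r, c]:
        return (px, py)
    return None
```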

d) Endpoint redefinition: The definition of a 2EC feature includes two rooftop contour line segments and their intersection, which is obtained using the previous steps. Each line segment starts from the intersection and ends at one of the endpoints of a line. However, the endpoints of each line might not be stable or might not occur precisely at the right place. Therefore, it is necessary to redefine the endpoints of each line. For this purpose, for each line, we detect all Harris corners [

e) Removal of unstable lines: Here, all unstable lines are identified and removed. Based on our experiments and observations, we classify three conditions as unstable. Lines satisfying at least one of these conditions are labeled as unstable and removed iteratively until no more lines can be removed.

· Lines with only one intersection and missing at least one endpoint.

· Individual short lines (5 pixels, or 0.75 m, and shorter).

· Short lines connected together forming a chain (if all connected lines are shorter than 5 pixels/0.75 m).

f) 2EC feature extraction: At each intersection of a line, one or more features can be constructed. Consider an intersection G of two lines l_{1} and l_{2}. G_{13}, G_{12}, G_{11} and G_{21}, G_{22} are all the other intersections and endpoints on l_{1} and l_{2}. For G, there are 6 possible combinations of two line segments. In each combination, if both segments are longer than 15 pixels, a feature is generated. In this example, GG_{13} is shorter than 15 pixels, so it cannot be used to generate any feature. As a result, only four 2EC features are generated, which share the same intersection point G but have different line segment combinations. These features are: G_{11}GG_{21}, G_{11}GG_{22}, G_{12}GG_{21} and G_{12}GG_{22}.

For each 2EC feature, we refer to the intersection as the center point and to the two endpoints of the lines as end points. As suggested by the name, the center point of a 2EC feature could represent the corner of a rooftop where two rooftop boundary lines intersect. Considering the way 2EC features are extracted, more than one 2EC feature may share a common center point while having line segments of different lengths. Since the center points are the most stable components of 2EC features (invariant to rotation and scale change), they are used as the point correspondences after the 2EC feature correspondences are established.
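Under the 15-pixel length rule described above, enumerating the features at one intersection can be sketched as follows (the representation of the candidate endpoints on each line is an assumption of this sketch):

```python
import numpy as np
from itertools import product

def enumerate_2ec(center, pts_l1, pts_l2, min_len_px=15.0):
    """Build 2EC features at intersection `center`: one segment toward a
    point on l_1 and one toward a point on l_2, both at least 15 px long."""
    c = np.asarray(center, float)
    def long_enough(pts):
        return [p for p in pts
                if np.linalg.norm(np.asarray(p, float) - c) >= min_len_px]
    # each feature: (end point on l_1, center point, end point on l_2)
    return [(p1, tuple(center), p2)
            for p1, p2 in product(long_enough(pts_l1), long_enough(pts_l2))]
```

With three candidate points on l_{1} (one of them closer than 15 px) and two on l_{2}, this yields the four features of the example above.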

In this section, we present a process for matching 2EC features between images with wide view differences. Two assumptions are made here:

1) Each feature in one image can match only with one feature in the other image.

2) Not every feature has a correspondence.

For each feature in the first image, a search space is created by predicting the location of that feature in the other image using the acquisition parameters in the metadata files. Both the features’ shapes and the image content are used to identify correspondences within the search space. Once the potential matches are obtained, a projective matrix optimization method is used to remove outliers. The matching process is then repeated within smaller search spaces, computed based on the result of the first round of projective matrix optimization. The following steps describe the proposed matching algorithm in detail:

In this work, a metadata file is associated with each image, providing additional information including the camera orientation (yaw, pitch, and roll angles) and location (latitude, longitude, and altitude), the camera internal parameters, etc. Although these parameters are imprecise, they can still be used to roughly predict the shape and position of each 2EC feature from one image to another.

1) Feature location estimation using metadata: Let us consider a pair of images taken by two cameras (C_{1} and C_{2}) from different viewpoints. m is a point in I_{1}, and m' is its actual corresponding location in I_{2}. m can be transformed from I_{1} to I_{2} with a two-step transformation computed using the imprecise camera parameters in the metadata files. The transformed location can then be used as a rough estimate of the location of m'.

Two assumptions are made here. First, since the altitude of the camera (typically 1 km) is much higher than the heights of the scene’s buildings, we assume that the building rooftops are located at the surface of the earth. Therefore, the depth of the image is taken as the altitude of the camera, and the 3D location of any point in the image can be determined from a single image and its camera parameters. Second, since the ground region covered by the two cameras is small compared to the entire surface of the earth, we consider the ground to be flat.

Based on these two assumptions, the model for the two cameras and the ground region covered by their images is illustrated in the figure. First, m in I_{1} is projected onto the 3D world (M); next, M is projected back into 2D on I_{2}. In order to implement this transformation, a world coordinate system is used as the reference, with its center located at the center of camera 1 (C_{1}). The transformation from I_{1} to the 3D world includes two procedures:

a) Transformation of the image coordinates of m:

The values of yaw_{1}, pitch_{1}, and roll_{1} represent the orientation of camera 1. d_{x1} and d_{y1} represent the conversion between image pixels and the metric world. f_{1} is the focal length of the first camera. The 3D rotation

b) Projection of m onto the 3D world:

Here, alt_{1} is the altitude of camera 1 from the metadata.
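A sketch of this flat-ground back-projection. The Euler-angle convention, the calibration-matrix layout, and the camera-at-origin setup are assumptions made for illustration; the paper’s own equations define the exact forms:

```python
import numpy as np

def rotation_ypr(yaw, pitch, roll):
    """Z-Y-X Euler rotation built from yaw/pitch/roll (the exact axis
    convention is an assumption; the paper defines its own R)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def backproject_to_ground(uv, K, R, alt):
    """Intersect the viewing ray of pixel uv with the flat ground plane
    Z = 0, with the camera center placed at (0, 0, alt)."""
    ray = R @ np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    if ray[2] >= 0:
        raise ValueError("ray does not point toward the ground")
    t = -alt / ray[2]                      # solve alt + t * ray_z = 0
    return np.array([0.0, 0.0, alt]) + t * ray
```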

Next, according to [ ], M is projected onto I_{2} using the camera matrix P_{2}, which can be computed using the camera parameters in the metadata of I_{2}:

In the above equations, K_{2} is the calibration matrix computed using the internal camera parameters:

R_{2} is the orientation of camera 2 with respect to the world coordinate system centered at C_{1}. R_{2} can be computed by:

where lat_{1}, lon_{1}, lat_{2} and lon_{2} are the latitudes and longitudes of the two cameras. t_{2} is the coordinate of the second camera, C_{2}, in the world coordinate system. It can be computed by:

C_{cam1} and C_{cam2} represent the 3D Cartesian coordinates of the two cameras with respect to ECEF (Earth-Centered, Earth-Fixed) coordinate system, and they are computed using the longitude and latitude of the two cameras according to [

The above algorithm can be applied to each of the 2EC features in I_{1} to predict its location and shape in I_{2}. In fact, since we consider the ground to be a plane, the corresponding points in the two image planes are related by a projective matrix H_{m} that can be found using the following steps:

· Four points are selected in I_{1}.

· Transform these four points to I_{2} using the method explained above. Let us assume the transformed points as

· With the four corresponding point pairs, H_{m} is computed using a least-squares algorithm, which is explained in [

During the matching process, the projective transformation H_{m} is applied to all 2EC features in I_{1} to predict their locations and shapes in I_{2}.
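The least-squares computation of H_{m} from point correspondences can be sketched with the standard DLT formulation (this is a generic implementation, not necessarily the exact algorithm the cited reference describes):

```python
import numpy as np

def homography_from_points(src, dst):
    """Least-squares (DLT) estimate of the 3x3 projective matrix mapping
    src points to dst points; four or more correspondences are required."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # the null vector of A (smallest singular value) holds H up to scale
    _, _, vt = np.linalg.svd(np.asarray(A, float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map 2D points through H with the homogeneous divide."""
    pts = np.asarray(pts, float)
    homog = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```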

2) Evaluation of prediction accuracy: To investigate the accuracy of the prediction and to provide a basis for selecting some of the thresholds required in the matching process, we applied the above transformation to 10 pairs of images. Among them, 5 pairs had a 90˚ difference in yaw, and the other 5 pairs had a 180˚ difference. For each pair of images, 120 matching points, uniformly spread over the entire images, were selected manually. Points in I_{1} were then transformed to I_{2} using the method explained above, and four measurements were calculated: 1) The distances between the transformed points and their ground truth locations. 2) For each point in I_{1}, its epipolar line was computed according to [ ], and the distances of the ground truth points in I_{2} to their corresponding epipolar lines were calculated. 3) The lengths of the line segments generated by any combination of two transformed points were computed and compared with those of the corresponding ground truth features. The length difference was measured as a percentage of the ground truth length. 4) The cross angles between the transformed line segments and their ground truth.

The statistical values for the above measures are provided in Figures 9-12. One can see that the predicted positions have a maximum offset of 300 pixels from the ground truth locations. Also, the predicted epipolar lines are less than 150 pixels away from their true locations. Moreover, about 96.58% of the predicted line segments have less than 10% length difference compared to their corresponding ground truth line segments, and about 95.26% of the predicted line segments have less than 5˚ orientation difference. These errors are mainly induced by two sources: 1) the imprecise camera parameters provided in the metadata file, and 2) the assumption that all buildings in the images have zero height. According to epipolar geometry, each point in I_{1} can be mapped onto an epipolar line in I_{2}. The location of the mapped point on the epipolar line, however, depends on the height of this point above the ground. Since the actual heights of the points vary in both images, their predicted locations in I_{2} have different offsets from their ground truth locations.

Since the aerial images are large (2672 × 4008 pixels), it is very time-consuming to search for correspondences over the entire image. Also, searching in a large domain means a higher chance of wrongly identifying matches. Therefore, creating a smaller search space that contains the correct correspondence is critical to the process. Let P_{abc} represent a 2EC feature in I_{1}. Its center point is denoted by c_{P} and its two end points are denoted by a_{P} and b_{P}. The following two steps are designed to create its corresponding search space in I_{2}:

1) Transform the center point c_{P} to I_{2} using the method described in Section 4.1, and denote the transformed point by c'_{P}. A square search region is centered at c'_{P}; its side length t_{square} is selected to be 500 pixels. This value is chosen based on the test results of the prediction accuracy in Section 4.1.

2) Compute the epipolar line of c_{P} using the estimated fundamental matrix [

As shown in the figure, the intersection of these two regions forms the search space for P_{abc} (the gray area). All the features in I_{2} with their center points inside this area are considered as match candidates for P_{abc}.
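A sketch of the resulting membership test for a candidate center point; the 500-pixel square comes from the text, while the 150-pixel epipolar band width is an assumption taken from the reported prediction statistics:

```python
import numpy as np

def in_search_space(candidate, predicted, F, point_i1,
                    square_px=500.0, epi_band_px=150.0):
    """Candidate center must lie inside the square window around the
    predicted location AND within a band around the epipolar line of the
    I_1 point."""
    cand = np.asarray(candidate, float)
    pred = np.asarray(predicted, float)
    if np.any(np.abs(cand - pred) > square_px / 2.0):
        return False
    l = F @ np.array([point_i1[0], point_i1[1], 1.0])  # epipolar line in I_2
    dist = abs(l[0] * cand[0] + l[1] * cand[1] + l[2]) / np.hypot(l[0], l[1])
    return bool(dist <= epi_band_px)
```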

Once the search space for P_{abc} has been determined, an exhaustive search is performed among all the features inside it:

1) Transform P_{abc} to I_{2} using the method explained in Section 4.1. The transformed center point is denoted by c'_{P} and the two end points are denoted by

2) For the transformed feature, its descriptor is computed.

3) Given two 2EC features with their descriptors of

The thresholds

After the first round of filtering, for a typical feature in I_{1}, there may be one or more qualified candidate features in I_{2}. In order to identify the feature with the highest similarity, a correlation-based method is incorporated. Since each 2EC feature is composed of three critical points (one center point and two end points), an image patch can be constructed by adding a fourth point to form a parallelogram with the other three. The similarity between 2EC features can then be evaluated by computing the correlation of their corresponding image patches. Two reasons make the use of correlation possible. 1) The images are taken by cameras from a relatively high altitude, so rooftops can be seen in both images, even though they undergo projection and deformation. Since 2EC features represent the rooftop profiles of the buildings, they can be extracted in both images. 2) Since rough camera parameters are provided via the metadata file, the image patches created by 2EC features can be transformed from I_{1} to I_{2}.

Instead of using pixel intensities of the input images to compute correlation, we use the gradient images. The advantage of using gradient images is that they are less sensitive to illumination variation. The Sobel operator is used to approximate the first derivative of the image in both the horizontal and vertical directions. The magnitude of the gradient is then obtained by summing the two directional components. The similarity between P_{abc} in I_{1} and Q_{abc} in I_{2} is measured using the following steps:

1) Consider the parallelogram of P_{abc}, where K is the number of pixels inside it. Transform the coordinates of each pixel inside the parallelogram to I_{2} using the method detailed in Section 4.1. Let us assume the transformed pixel locations are in I_{2}, where

2) Compute vector

3) Translate each

4) A similarity value for P_{abc} and Q_{abc} is set by computing the correlation of the two sets of image gradient values:

A similarity measure is assigned to each of the remaining candidate features in the search space. If the highest similarity is larger than a threshold t_{corr}, the corresponding feature is extracted as a match to P_{abc}. Otherwise, no match is found for P_{abc}. Considering that the prediction procedure in Section 4.1 is not precise, due to the imprecise camera parameters in the metadata files, t_{corr} is set to 0.5. Increasing this value would result in higher matching accuracy but a smaller number of match correspondences. If two or more 2EC features in I_{1} share the same center point, the one with the highest matching score is kept, while the others are dismissed. Finally, the center points of the 2EC feature correspondences are selected as the initial set of point correspondences.
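The gradient-patch similarity of steps 1-4 can be sketched as follows, with nearest-neighbor sampling at the transformed pixel locations as a simplification of the paper’s scheme:

```python
import numpy as np

def gradient_magnitude(img):
    """Gradient magnitude image (np.gradient stands in for Sobel here)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def patch_correlation(grad1, grad2, pixels1, H):
    """Correlate the gradient values of the P_abc parallelogram pixels in
    I_1 with the values at their H-transformed locations in I_2."""
    pts = np.asarray(pixels1, float)
    homog = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    mapped = homog[:, :2] / homog[:, 2:3]
    h, w = grad2.shape
    cols = np.clip(np.rint(mapped[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(mapped[:, 1]).astype(int), 0, h - 1)
    v1 = grad1[pts[:, 1].astype(int), pts[:, 0].astype(int)]
    v2 = grad2[rows, cols]
    return float(np.corrcoef(v1, v2)[0, 1])
```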

After the initial set of point correspondences is obtained, a statistical method is used to identify outliers among these point correspondences.

Consider the initial point matches, where point m_{i} in I_{1} corresponds to point m'_{i} in I_{2}. The points in I_{1} are mapped to I_{2} by the projective transformation, and their distances to the corresponding points in I_{2} are calculated as:

Next, point pairs with their distances larger than

Once the above four iterations are exhausted, a new round of iterations is initiated. At each iteration, the 5% of matched pairs with the largest distance errors are removed, and the process stops when at least one of the following conditions is satisfied: 1) The maximum distance in

At the end of this process, a new projective transformation

In this section, the stability and distinctiveness of 2EC features are investigated. The quality of 2EC features is evaluated via the registration of slant aerial images that include wide variations in viewing angles. The superiority of these features is shown by comparing their registration results with those of the state of the art.

In this figure, 67 and 61 features were detected, respectively. Among these, 13 common features exist between the two images.

buildings. Not only did these features occur on the rooftop corners, but they also appeared on the white protuberant objects on the rooftops. Through manual inspection, we found that only five common SIFT features existed between the two views.

Here, we evaluate the quality of 2EC features through establishing match correspondences in oblique aerial images. Considering the way these images are acquired, for one subject scene there are several images taken from the North, East, South, and West directions. If image pairs are captured from similar directions, they have similar yaw angles and therefore mostly undergo a linear translation and a small projective transformation. Such images are easy to process and match, and most existing methods work well on them. However, two oblique aerial images taken from two different directions, for example North and East, are related by a large projective transformation. Therefore, we focus only on image pairs with severe yaw angle variations from now on. Fifteen pairs of images were chosen from our database. Among them, five image pairs had a difference of 90˚ in yaw angle, five pairs had a difference of 180˚, and the other five pairs had a difference of 270˚. All these images have pitch angles between 40˚ and 50˚. The combination of yaw and pitch angle variations leads to large projective transformations between these images.

For each image pair, 2EC features were first extracted and initial correspondences were established. Then the outliers were removed using optimized projective matrix. Finally, the matching procedure was repeated one more time based on the new search space that was created in each case using the optimized projection matrix. The original size of each image is 2672 × 4008 pixels. In our experiments, in order to reduce the running time and for the convenience of comparison with the state of the art, we have down-sampled the input images by a factor of 2. All the system parameters were kept the same for all the input images.

The results generated for the 15 image pairs are summarized in

This measurement is estimated using the following steps:

1) Compute the fundamental matrix F over each input image pair using the normalized 8-point algorithm [

2) For each ground truth pair

3) The distance of
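Given F from step 1, the residual of steps 2-3 can be sketched as a symmetric point-to-epipolar-line distance (whether the paper averages both directions is an assumption of this sketch):

```python
import numpy as np

def epipolar_residual(F, pts1, pts2):
    """Mean symmetric point-to-epipolar-line distance for ground-truth
    pairs, as used for the Err column."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        p1 = np.array([x1, y1, 1.0])
        p2 = np.array([x2, y2, 1.0])
        l2 = F @ p1                      # epipolar line of p1 in image 2
        l1 = F.T @ p2                    # epipolar line of p2 in image 1
        d2 = abs(p2 @ l2) / np.hypot(l2[0], l2[1])
        d1 = abs(p1 @ l1) / np.hypot(l1[0], l1[1])
        total += 0.5 * (d1 + d2)
    return total / len(pts1)
```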

Pair No. | Yaw Diff. | No. of Features I_{1} [#] - I_{2} [#] | GT [#] | IC [#] | CC [#] | CR [%] | PMC [#] | PCC [#] | PCR [%] | Err [pixels] |
---|---|---|---|---|---|---|---|---|---|---|
1 | 90˚ | 1879 - 1415 | 140 | 100 | 92 | 92.00 | 90 | 88 | 97.78 | 1.3 |
2 | 90˚ | 2431 - 2047 | 58 | 67 | 40 | 59.7 | 36 | 34 | 94.44 | 6.2 |
3 | 90˚ | 2512 - 1772 | 75 | 67 | 56 | 83.58 | 44 | 43 | 97.73 | 1.2 |
4 | 90˚ | 1186 - 1292 | 45 | 33 | 31 | 93.94 | 32 | 31 | 96.88 | 4.6 |
5 | 90˚ | 2095 - 3084 | 57 | 57 | 47 | 82.46 | 41 | 39 | 95.12 | 1.5 |
6 | 180˚ | 1393 - 1333 | 95 | 48 | 44 | 91.67 | 48 | 48 | 100 | 1.7 |
7 | 180˚ | 1843 - 2503 | 60 | 43 | 33 | 76.74 | 35 | 32 | 91.43 | 2.4 |
8 | 180˚ | 2560 - 3241 | 94 | 69 | 58 | 84.06 | 55 | 54 | 98.18 | 1.9 |
9 | 180˚ | 1874 - 1823 | 59 | 43 | 34 | 79.07 | 37 | 35 | 94.59 | 2.1 |
10 | 180˚ | 738 - 1046 | 51 | 35 | 32 | 91.43 | 33 | 32 | 96.97 | 2.3 |
11 | 270˚ | 2238 - 1871 | 106 | 89 | 75 | 84.27 | 82 | 79 | 96.34 | 2.5 |
12 | 270˚ | 3334 - 2084 | 81 | 45 | 41 | 91.11 | 40 | 39 | 97.50 | 1.5 |
13 | 270˚ | 887 - 932 | 58 | 41 | 37 | 90.24 | 32 | 31 | 96.88 | 2.8 |
14 | 270˚ | 1470 - 1750 | 92 | 70 | 60 | 85.71 | 70 | 67 | 95.71 | 1.5 |
15 | 270˚ | 738 - 1046 | 85 | 76 | 64 | 84.21 | 60 | 58 | 96.67 | 1.3 |

where distance of

As shown in columns PMC, PCC, and PCR of

In order to compare the quality of 2EC features with that of the state of the art, three popular features including SIFT [

Tables 2-4 show the matching results for each of the above three algorithms. To assess the quality of the results, the accuracy of matching was inspected manually. The number of correspondences, the number of correct correspondences, and their ratio are given in the columns Total Matches, Correct Matches, and Correct Rate. The average residual errors are provided in the last column, Error.

As shown in

Pair No. | Yaw Diff. | Total Matches [#] | Correct Matches [#] | Correct Rate [%] | Error [Pixels] |
---|---|---|---|---|---|
1 | 90˚ | 95 | 87 | 91.58 | 2.7 |
2 | 90˚ | 40 | 30 | 75.00 | 36.7 |
3 | 90˚ | 257 | 250 | 97.28 | 0.5 |
4 | 90˚ | 66 | 59 | 89.4 | 9.2 |
5 | 90˚ | 160 | 152 | 95.00 | 3.5 |
6 | 180˚ | 153 | 149 | 97.39 | 2.6 |
7 | 180˚ | 81 | 73 | 90.12 | 2.5 |
8 | 180˚ | 20 | 6 | 30.00 | 13.2 |
9 | 180˚ | 245 | 223 | 91.02 | 3 |
10 | 180˚ | 50 | 36 | 72.00 | 18.2 |
11 | 270˚ | 43 | 37 | 86.05 | 51.4 |
12 | 270˚ | 20 | 19 | 95 | 4.7 |
13 | 270˚ | 209 | 199 | 95.22 | 6 |
14 | 270˚ | 106 | 100 | 94.34 | 6.4 |
15 | 270˚ | 22 | 21 | 95.45 | 1.6 |

Pair No. | Yaw Diff. | Total Matches [#] | Correct Matches [#] | Correct Rate [%] | Error [Pixels] |
---|---|---|---|---|---|
1 | 90˚ | 51 | 44 | 86.27 | 18.6 |
2 | 90˚ | 85 | 75 | 88.24 | 64 |
3 | 90˚ | 200 | 171 | 85.50 | 4.8 |
4 | 90˚ | 86 | 56 | 67.44 | 14.8 |
5 | 90˚ | 138 | 131 | 94.93 | 2 |
6 | 180˚ | 70 | 65 | 92.86 | 11 |
7 | 180˚ | 57 | 51 | 89.47 | 4.9 |
8 | 180˚ | 12 | 8 | 67 | 26.9 |
9 | 180˚ | 173 | 159 | 91.91 | 2.9 |
10 | 180˚ | 37 | 30 | 77 | 44.6 |
11 | 270˚ | 20 | 6 | 30 | 45.4 |
12 | 270˚ | 15 | 6 | 40 | 47.1 |
13 | 270˚ | 226 | 196 | 86.73 | 10.9 |
14 | 270˚ | 47 | 41 | 87.23 | 6.6 |
15 | 270˚ | 20 | 9 | 45 | 38.2 |

Pair No. | Yaw Diff. | Total Matches [#] | Correct Matches [#] | Correct Rate [%] | Error [Pixels] |
---|---|---|---|---|---|
1 | 90˚ | 403 | 390 | 96.77 | 1.5 |
2 | 90˚ | 491 | 403 | 82.08 | 13.3 |
3 | 90˚ | 1366 | 1331 | 97.44 | 1.0 |
4 | 90˚ | 251 | 147 | 58.57 | 29.4 |
5 | 90˚ | 979 | 919 | 93.87 | 1.6 |
6 | 180˚ | 560 | 480 | 85.71 | 19.2 |
7 | 180˚ | 213 | 190 | 89.20 | 4.1 |
8 | 180˚ | 0 | n/a | n/a | n/a |
9 | 180˚ | 1023 | 924 | 90.32 | 2.5 |
10 | 180˚ | 149 | 125 | 84 | 9.8 |
11 | 270˚ | 53 | 0 | 0 | n/a |
12 | 270˚ | 32 | 18 | 56 | 730 |
13 | 270˚ | 999 | 964 | 96.50 | 0.9 |
14 | 270˚ | 214 | 195 | 91.12 | 3.5 |
15 | 270˚ | 52 | 0 | 0 | n/a |

As shown in

As for the ASIFT features, they have an average correct rate of 85.13%, with an average residual error of 68.07 pixels. ASIFT, however, outperforms the 2EC features in two cases (rows 3 and 13), with a lower residual error and similar correct rates. This is due to the fact that in those two cases the input scenes included abundant texture on the ground and on some rooftops, and therefore correspondences were easily identified using ASIFT features. Also, the match correspondences for these two cases were spread all over the images, and as a result ASIFT produced a fundamental matrix with higher accuracy and a lower residual error.

In this work, 2EC features were proposed for the purpose of establishing match correspondences between oblique aerial images with large projective transformations. A complete solution for generating and matching 2EC features was proposed. The established correspondences can be used to accurately compute the fundamental matrix and obtain refined camera parameters for image registration or 3D reconstruction purposes. The extraction of 2EC features requires detecting straight lines in the input images; these lines are utilized as a medium to extract viewpoint-invariant corners. Both lines and corners are encapsulated in the definition of 2EC features, such that each feature could potentially correspond to a vertex and two connected edges of a building rooftop. The geometrical characteristics of 2EC features ensure that their enclosed image regions are planar, and therefore viewpoint invariant under large viewpoint variations. In the matching process, both geometrical and visual properties of each feature were used. The experimental results showed the effectiveness of 2EC features in identifying and locating true match correspondences between oblique aerial images under large projective transformations. The superiority of 2EC features was demonstrated by comparing them with state-of-the-art features.