Journal of Software Engineering and Applications, 2013, 6, 511-518
http://dx.doi.org/10.4236/jsea.2013.69061 Published Online September 2013 (http://www.scirp.org/journal/jsea)
Object Detection Using SURF and Superpixels*
Miriam Lopez-de-la-Calleja1, Takayuki Nagai2, Muhammad Attamimi2, Mariko Nakano-Miyatake1,
Hector Perez-Meana1
1ESIME Culhuacan, Instituto Politecnico Nacional, Mexico City, Mexico; 2University of Electro-Communications, Tokyo, Japan.
Email: hmperezm@ipn.mx
Received July 19th, 2013; revised August 20th, 2013; accepted August 28th, 2013
Copyright © 2013 Miriam Lopez-de-la-Calleja et al. This is an open access article distributed under the Creative Commons Attribu-
tion License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
ABSTRACT
This paper proposes a novel object detection method in which a set of local features inside superpixels is extracted from the image under analysis, acquired by a 3D visual sensor. To increase the segmentation accuracy, the proposed method first segments the image under analysis using the Simple Linear Iterative Clustering (SLIC) superpixels method. Next, the key points inside each superpixel are estimated using Speeded-Up Robust Features (SURF). These key points are then used to carry out the matching task for every detected key point of the scene inside the estimated superpixels. In addition, a probability map is introduced to describe the accuracy of the object detection results. Experimental results show that the proposed approach provides fairly good object detection and confirm the superior performance of the proposed scheme compared with other recently proposed methods, such as the scheme proposed by Mae et al.
Keywords: Object Detection; SURF; SLIC Superpixels; Keypoints Detection; Local Features; Voting
1. Introduction
In the area of intelligent systems, autonomous mobile robots are expected to have the ability to recognize their surrounding environment in real time. Object detection, which is the task of searching for and localizing a target in a particular scene, can be considered a prime feature for autonomy. This fact has stimulated research in this field, and as a result several algorithms have been proposed during the last several years. Lai et al. [1], Ozuysal et al. [2], Harzallah et al. [3] and Dalal and Triggs [4] proposed to use the standard sliding-window approach, in which the system evaluates a score function for all positions and scales in an image and thresholds the scores to obtain bounding boxes for each instance. Each detector window has a fixed size and searches across 20 scales of an image pyramid. For efficiency, a linear score function is considered. The performance of the classifier heavily depends on the data and also on the features used for object detection [1]. Another popular approach is to extract local interest points from the image and then to classify each of the regions around these points, rather than looking at all possible sub-windows [5-7]. A weakness shared by all of the above approaches is that they can fail when local image information is insufficient, that is, when the target is very small or highly occluded. To reduce these problems, Mae et al. [8] included a local feature matching algorithm using local geometric consistency for object detection. When operating online, the system uses SIFT to extract scene features and compares them with those of the reference image. This approach is suited for objects that have texture, and performs better when the objects have flat surfaces or when they are observed from the same view angle. Its advantages are the simplicity of the implementation, its portability to various robot control systems, the minimal knowledge required about the target pattern and its fairly good performance. The main disadvantages are that it is limited to patterns with texture and relies on a flat-surface assumption for pose estimation. As a result, the matching can worsen if the object has a non-planar surface or if it is observed from a different viewpoint.
Other proposals are based on appearance [9-15]; these are offline methods built from collections of small patches. Such approaches provide good detection rates, although their computational complexity is high, generally requiring a long processing time to generate the model of each object [15].
*Object detection for robotic vision.
Thus, the current progress in object detection still requires further research to achieve efficiency close to 100% in real time. To contribute to the improvement of some issues in object detection, this paper proposes an object detection algorithm based on superpixels together with Speeded-Up Robust Features (SURF) as the feature extraction method used to perform the matching task. Evaluation results show that, despite a cluttered background and occlusion, the proposed algorithm is able to detect a specific object among several other similar-looking ones. This property makes the proposed algorithm suitable for use on robotic platforms which may operate in natural scenes.
The rest of this paper is organized as follows: Section 2 provides a detailed description of the proposed method for object detection. Section 3 provides the experimental results and discussion, together with a comparison of the performance of the proposed method with other recently proposed algorithms. Finally, the main conclusions are presented in Section 4.
2. Proposed Method
The proposed method is based on the use of SLIC superpixels [27] and SURF [28], together with a voting process and a probability map, which is introduced in this work in order to improve the accuracy of object detection. Figure 1 shows the block diagram of the proposed method. The input color image and the Time of Flight (TOF) data, acquired by the 3D visual sensor [16], are segmented using SLIC superpixels. Next, several key points are extracted and labeled as features for matching using SURF. The extracted key points of the input image are then compared against the learned key points stored in a database. Next, a vote, similar to a histogram of Ids, is accumulated for the input key points, and the final Id is determined as the one with the greatest number of votes. A probability map, generated for each Id, is then used to increase the accuracy of the estimated position of a given object in the scene. Finally, using these Ids, the desired object is detected in the scene. The next subsections provide a detailed description of each stage of the proposed system.
Figure 1. Proposed method.
2.1. Database
A 3D visual sensor [16], which consists of a TOF camera and two CCD cameras, is used to capture color and 3D information to construct a database. To obtain the visual information, a small handheld observation table with an XBee wireless controller is installed on a robot, which enables the observation of the object from various viewpoints. Here, 10 objects are used and 40 different views of each object are captured. Considering the computational cost, the SURF algorithm [28] is used to collect a set of 128-dimensional descriptors from each captured image, which is stored in the database.
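As an illustration, the following is a minimal sketch of how such a database could be assembled using OpenCV's SURF implementation (available in the opencv-contrib package with the nonfree modules enabled); the function name, input layout and Hessian threshold are illustrative assumptions, not the authors' code. Setting extended=True yields the 128-dimensional descriptors used in this work.

    # Sketch: build a database of 128-D SURF descriptors for 10 objects,
    # 40 views each, labeled with the corresponding object Id.
    import cv2
    import numpy as np

    def build_database(view_lists):
        """view_lists: dict mapping object Id -> list of image paths (40 views each)."""
        surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400, extended=True)
        descriptors, labels = [], []
        for obj_id, paths in view_lists.items():
            for path in paths:
                gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
                _, desc = surf.detectAndCompute(gray, None)
                if desc is not None:
                    descriptors.append(desc)
                    labels.append(np.full(len(desc), obj_id))
        return np.vstack(descriptors), np.concatenate(labels)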
2.2. Input Image
The proposed object detection method uses the visual sensor shown in Figure 2, which acquires color and 3D information in real time by calibrating the TOF (Time of Flight) camera and the two CCD cameras [16].
Figure 2. Visual sensor used for object detection.
2.3. Segmentation Process
Superpixels have been applied in several computer vision applications, such as depth estimation [17], image segmentation [18,19] and object localization [20]. Because in most of these applications superpixels have performed fairly well, several approaches to compute them have been proposed in the last decade [21-26]. Among them, a suitable approach is the so-called Simple Linear Iterative Clustering (SLIC) [27], because it is fast to compute and achieves accurate, high-quality segmentations.
Assume that an image of N pixels is divided into K non-overlapping sub-blocks of size S × S pixels, where S = (N/K)^{1/2}, whose centers are given by (x_i, y_i). To avoid the superpixel center being located on an edge or on a noisy pixel, it is estimated as the point with the smallest gradient in a window of 3 × 3 pixels around the center of the sub-block under analysis [27]. After the center of the i-th superpixel is obtained, the center of the i-th cluster is determined as follows:
$$C_i = \left[ L_i, a_i, b_i, \hat{x}_i, \hat{y}_i \right]^T, \quad i = 1, 2, \ldots, K, \qquad (1)$$

where (x̂_i, ŷ_i) is the center of the i-th cluster, L_i is its lightness, a_i its redness-greenness and b_i its yellowness-blueness. Once the K initial centers are determined, each pixel in a neighborhood of 2S × 2S pixels is associated with the superpixel, within that neighborhood, whose distance D is minimum, where

$$D = \sqrt{d_c^2 + \left( \frac{d_s}{S} \right)^2 m^2}, \qquad (2)$$

$$d_c = \sqrt{(L_j - L_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}, \qquad (3)$$

$$d_s = \sqrt{(\hat{x}_j - \hat{x}_i)^2 + (\hat{y}_j - \hat{y}_i)^2}, \qquad (4)$$

$$S = \sqrt{N/K}, \qquad (5)$$
and m, with 1 ≤ m ≤ 40, is a constant that controls the relative importance of color similarity and spatial distance. Thus, when m is large the spatial distance is more relevant and the resulting superpixels are more compact, while when m is small the color becomes more important and the superpixels become more irregular in size and shape, adhering more closely to the image boundaries. Finally, after all pixels have been associated with the closest superpixel, a new center C_i, i = 1, 2, ..., K, is estimated by averaging all pixels belonging to the i-th superpixel.
The proposed algorithm assumes K = 200, and to control the compactness of the superpixels we select m = 10, which provides a good balance between color similarity and spatial proximity.
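As a concrete illustration of this step, the following is a minimal sketch using the SLIC implementation available in scikit-image as a stand-in for [27]; the file name is a placeholder, and n_segments and compactness correspond to K = 200 and m = 10.

    # Sketch: SLIC segmentation with K = 200 superpixels and compactness m = 10.
    from skimage import io
    from skimage.segmentation import slic

    image = io.imread("scene.png")   # placeholder file name
    segments = slic(image, n_segments=200, compactness=10)
    # 'segments' assigns to every pixel the index of the superpixel it belongs to.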
2.4. Feature Extraction
Speeded-Up Robust Features (SURF) [28], a scale- and rotation-invariant detector and descriptor, is used for feature extraction. The main task of SURF is to find point correspondences between two images of the same object. The SURF algorithm is divided into three steps: 1) interest key point detection; 2) estimation of a feature vector, called the descriptor; and 3) matching between images. The interest point detection is employed to find relevant points in an image or object, in order to locate valuable information that will be encoded by a local descriptor, by means of a Hessian matrix-based detector. The estimation of the feature vector describes the relevant region within the interest point neighborhood; it has to be distinctive and, at the same time, robust to noise, detection errors, and geometric and photometric deformations. Finally, in most situations the scene contains many key points that must be identified with labels, which can be achieved using locality-sensitive hashing (LSH) [29,30], an indexing scheme for performing approximate nearest-neighbor search in high-dimensional spaces. To filter the matching results, the Euclidean distance between each descriptor and its most similar counterpart is first calculated, and only descriptors whose distance is below a predetermined threshold are kept. Finally, the best k results that satisfy the threshold are used in the voting step described in the next subsection.
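The following is a minimal sketch of this matching and filtering step. A brute-force nearest-neighbor search over the database descriptors is used here as a simple stand-in for the multi-probe LSH index of [29,30], and the distance threshold and k are illustrative values, not those of the authors.

    # Sketch: match scene SURF descriptors against the database; keep only matches
    # whose Euclidean distance is below a threshold, then retain the best k for voting.
    import numpy as np

    def match_descriptors(scene_desc, db_desc, db_labels, threshold=0.25, k=200):
        matches = []   # tuples (scene keypoint index, object Id, distance)
        for idx, d in enumerate(scene_desc):
            dists = np.linalg.norm(db_desc - d, axis=1)
            j = int(np.argmin(dists))            # most similar database descriptor
            if dists[j] < threshold:
                matches.append((idx, int(db_labels[j]), float(dists[j])))
        matches.sort(key=lambda m: m[2])         # keep the k closest matches
        return matches[:k]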
2.5. Voting
In the voting process, the key points estimated by the feature extraction stage that lie inside a given superpixel boundary are considered. A voting technique is applied to estimate which object is inside each superpixel and thus to obtain its Id. Let N be the number of objects in the database, let V(i, j) be the number of matched key points of object j inside the i-th superpixel, and let V(i, N + 1) be the number of unmatched key points inside the i-th superpixel. Then the resulting label Id_i, with 1 ≤ Id_i ≤ N, is determined by the maximum vote number. This Id is used to select, among the superpixels resulting from the segmentation of the input image, those that belong to the detected object.
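A minimal sketch of the vote accumulation is given below; object Ids are indexed from 0 to N - 1 here (the last column collects the unmatched key points), and the helper signature is an assumption for illustration only.

    # Sketch: per-superpixel voting. V[i, j] counts matched key points of object j
    # inside superpixel i; column N counts the unmatched key points.
    import numpy as np

    def vote(keypoints_xy, segments, matches, n_objects, n_superpixels):
        V = np.zeros((n_superpixels, n_objects + 1), dtype=int)
        matched = {idx: obj_id for idx, obj_id, _ in matches}
        for idx, (x, y) in enumerate(keypoints_xy):
            i = segments[int(y), int(x)]         # superpixel containing this key point
            j = matched.get(idx, n_objects)      # unmatched key points go to the last column
            V[i, j] += 1
        ids = V[:, :n_objects].argmax(axis=1)    # Id with the maximum vote in each superpixel
        return V, ids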
2.6. Probability Map
The probability map is used to determine the probability of each Id in each segmented part of the image. The process of estimating the probability map consists of finding the occurrence rate of each Id in the image, such that a particular Id can be selected according to its occurrence rate. The probability of object j at pixel (a, b) inside the i-th superpixel is then given by

$$P_m(a, b, j) = \frac{V(i, j)}{\sum_{k=1}^{N+1} V(i, k)}, \qquad (6)$$

where P_m(a, b, j) is defined as the probability map that represents the accuracy of the object detection at pixel (a, b), and V(i, j) is the vote number of matched key points of object j inside the i-th superpixel. The probability map is estimated for all Ids in order to determine the probability of the detected objects. This step, together with the voting, helps to increase the accuracy of the object detection algorithm.
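A minimal sketch of Equation (6) is shown below, assuming the vote matrix V and the superpixel label image produced in the previous steps; it simply normalizes the votes of each superpixel and broadcasts the result to every pixel of that superpixel.

    # Sketch: probability map of Eq. (6). Every pixel (a, b) inside superpixel i
    # receives P_m(a, b, j) = V[i, j] / sum_k V[i, k] for object j.
    import numpy as np

    def probability_map(V, segments, obj_id):
        totals = V.sum(axis=1).astype(float)
        totals[totals == 0] = 1.0                # avoid division by zero in empty superpixels
        p_per_superpixel = V[:, obj_id] / totals
        return p_per_superpixel[segments]        # broadcast to a per-pixel probability map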
3. Experimental Results
This section presents the results of the experimental evaluation of the proposed system. The experiment is carried out in a room using 10 different objects, as shown in Figure 3(a). The user shows each object to the robot from various angles to build the database, as shown in Figure 3(b); the robot acquires feature vectors from 40 consecutive frames for each object during the learning phase.
3.1. Detection Performance
To evaluate the object detection capability of the proposed system, 10 different experimental setups were constructed, as shown in Tables 1 and 2. Some of the experimental results used for assessing the object detection performance of the proposed scheme are shown in Figures 4-7.
Figure 4(a) shows setup 1, where there are 3 objects, two of them belonging to the database, which means that the robot only has to detect two of them. Figure 4(b) shows the superpixel estimation used for segmenting the desired regions; Figures 4(c) and (d) give the desired probability maps for the orange tea bottle and the green tea carton box. Finally, Figure 4(e) shows the detected regions of both objects, as expected.
Figure 5. Experimental setup 2: (a) Input image; (b) Superpixels evaluation; (c) Probability map of green tea carton box; (d) Probability map of milk tea carton box; (e) Probability map of potatoes red carton box; (f) Detected objects.
Figure 5(a) shows setup 2, in which there are several objects, 3 of them belonging to the database, which means that the robot has to detect the 3 objects belonging to the database.
Figure 6. Experimental setup 8: (a) Input image; (b) Superpixel evaluation results; (c) Probability map of milk tea carton box; (d) Probability map of yellow potatoes package; (e) Probability map of red potatoes package; (f) Detected objects.
Figure 3. Learning phase: (a) Objects used for training; (b) A person showing the objects to the robot from different angles.
Table 1. Experimental setups used for system evaluation.
Setup | Object 1             | Object 2                    | Object 3
1     | Green tea carton box | Orange tea bottle           |
2     | Grape tea carton box | Potatoes red box            | Milk tea carton box
3     | Green tea carton box | Potatoes yellow box         | Pringles
4     | Chipstar             | Seafood noodles plastic box | Pringles
5     | Pringles             | Seafood noodles plastic box | Chipstar
6     | Grape tea carton box | Green tea carton box        | Orange tea bottle
7     | Potatoes yellow box  | Green tea carton box        | Potatoes red box
8     | Potatoes yellow box  | Milk tea carton box         | Potatoes red box
9     | Grape tea carton box | Orange tea bottle           | Chipstar
10    | Orange tea bottle    | Green tea carton box        |
Figure 4. Experimental setup 1: (a) Input image; (b) Superpixels evaluation result; (c) Probability map of orange tea bottle; (d) Probability map of green tea carton box; (e) Detected objects.
Figure 7. Experimental setup 4: (a) Input image; (b) Superpixel evaluation results; (c) Probability map of green tea carton box; (d) Probability map of orange tea bottle; (e) Probability map of Chipstar; (f) Detected objects.
Table 2. Evaluation results obtained for the ten different setups used for system evaluation.
Setup number | Number of objects | Detected objects | Detection rate
1            | 2                 | 2                | 100%
2            | 3                 | 3                | 100%
3            | 3                 | 2                | 66%
4            | 3                 | 1                | 33%
5            | 3                 | 1                | 33%
6            | 3                 | 3                | 100%
7            | 3                 | 1                | 33%
8            | 3                 | 3                | 100%
9            | 3                 | 3                | 100%
10           | 2                 | 2                | 100%
Figure 5(b) shows the superpixels used for segmenting the desired regions; Figures 5(c)-(e) show the desired probability maps of the green tea and milk tea carton boxes and the yellow box containing potatoes; finally, Figure 5(f) shows that, as expected, the proposed algorithm was able to correctly detect the three objects.
Figure 6(a) shows setup 8, which contains several objects, 3 of them belonging to the database, which means that the robot has to detect the 3 objects belonging to the database. Figure 6(b) shows the superpixels used for segmenting the desired regions; Figures 6(c)-(e) show the desired probability maps of the milk tea carton box and the yellow and red potatoes packages; finally, Figure 6(f) shows that the proposed system is able to correctly detect the objects belonging to the database.
Figure 7(a) shows setup 9, which also contains several objects, 3 of them belonging to the database, which means that the robot has to detect the 3 objects belonging to the database. Figure 7(b) shows the superpixels used for segmenting the desired regions; Figures 7(c)-(e) show the desired probability maps of the green tea carton box, the orange tea bottle and the Chipstar, respectively. Finally, Figure 7(f) shows that the proposed system is able to correctly detect the three objects in the database.
3.2. Evaluation Criterion
In the literature there are different evaluation criteria to assess local descriptors. Among the most remarkable works, it is worth mentioning those that operate within the ROC space [31] and those that employ the Recall vs. 1-Precision space [32,33].
In this work the descriptor evaluation was carried out using the criterion reported by Mikolajczyk and Schmid [33], which employs recall versus 1-precision. This criterion is based on the number of correct and false matches obtained for an image pair. True positives, Tp, and false positives, Fp, denote the correct and false correlations that were detected by the system. False negatives, Fn, and true negatives, Tn, represent the correct and false correlations that were not detected by the system. The descriptor evaluation employs the following parameters:
Precision (P) is the fraction of the detected regions where the objects belonging to the database actually are:

$$P = \frac{T_p}{T_p + F_p} \qquad (7)$$

Recall (R) is defined as the fraction of the object regions that is detected:

$$R = \frac{T_p}{T_p + F_n} \qquad (8)$$

F-measure (F) is the harmonic mean of P and R:

$$F = \frac{2PR}{P + R} \qquad (9)$$
Precision and recall are the basic measurements employed in the evaluation of search strategies. Figure 8 illustrates the relation between Fn, Tp and Fp. Fn represents the relevant items that have not been detected. On the other hand, the items that have been detected but are not relevant are placed on the right (Fp). Recall is obtained from the relation between Fn and Tp, which determines the fraction of the object area that is detected. Precision is determined by the relation between Tp and Fp, which determines the fraction of the detected regions where the objects belonging to the database actually are.
Figure 8. Illustration of meaning of false positive (Fp), false
negative (Fn) and true positive (Tp).
In this work the evaluation criterion was applied to the SURF descriptors using the data set acquired with the visual sensor. Note that recall and 1-precision are independent terms: recall is computed with respect to the number of corresponding regions, and 1-precision with respect to the total number of matches.
Table 3 shows the results obtained for recall (%), precision (%) and F-measure, using the evaluation criterion proposed by Mikolajczyk and Schmid [33], when the proposed algorithm is applied to the experimental setups described in Table 1. From these results it follows that the proposed system performs fairly well in most situations, although it has difficulties when it is required to detect plastic bottles. Table 4 shows a summary of the performance of the proposed algorithm using the same criterion. The average recall is 47.5%, the average precision 79% and the average F-measure 57%. Although this appears to be a low detection rate, the results from the 10 sceneries show that the proposed system is able to detect 21 of 28 objects, which means that 75% of the objects are correctly detected. The plastic bottle gives the worst F-measure, and from Table 4 it turns out that the transparent bottles are responsible for the low recall, precision and F-measure rates. Figure 9 illustrates the difficulties found when it is required to detect plastic bottles.
3.3. Comparison with the Method of Mae et al. [8]
The proposed method differs from the method proposed by Mae et al. [8] in three main respects: 1) We use the SURF algorithm [28] for feature extraction, while Mae et al. employed the Scale-Invariant Feature Transform (SIFT) [34-36] for this task; 2) To find the best match for each feature we use LSH [29,30], whereas the Hough transform [35] was used in [8]; and 3) We use 10 different small objects, such as carton boxes, plastic bottles and circular objects, at a distance of 1.5 meters, while in the experiments of [8] six small static objects were used.
Table 3. Detection of each object in terms of recall, precision and F-measure.
Object                     | Recall (%) | Precision (%) | F-measure
Orange tea bottle          | 7.920      | 5.405         | 0.0642
Milk carton box            | 32.88      | 85.17         | 0.5470
Green tea carton box       | 69.50      | 97.69         | 0.8043
Red potatoes carton box    | 47.85      | 84.93         | 0.6114
Yellow potatoes carton box | 23.21      | 94.53         | 0.3703
Chipstar                   | 38.48      | 95.58         | 0.5421
Pringles                   | 67.11      | 83.95         | 0.6269
Plastic seafood noodle     | 22.29      | 31.06         | 0.1676
Grape tea carton box       | 80.66      | 64.16         | 0.7147
Green tea bottle           | 37.32      | 55.42         | 0.4460
Table 4. Global evaluation in terms of precision, recall and F-measure.
Characteristic  | Recall (%) | Precision (%) | F-measure
Plastic bottle  | 22.62      | 30.41         | 0.0642
Cardboard       | 80.65      | 87.55         | 0.8313
Circular object | 50.98      | 96.55         | 0.7459
Figure 9. Evaluation of descriptors.
Figure 10. Detection results for static objects obtained with the method proposed by Mae et al. [8].
Figure 10 shows the results obtained using the Mae et al. method [8]. As can be seen, the objects can be reliably detected when they are at close range, but the success rate drops dramatically when they are relatively far. Objects C (carton cup) and D (can) are small and have round surfaces, and their input images show a quite large perspective deformation with respect to the reference images, which are taken from a perpendicular viewpoint. As a result, their success rates are dramatically lower than those of the other objects with flat surfaces. The success rate for objects E and F (juice carton boxes) at a distance of 1 m is below about 25%. On the other hand, the proposed method, with a distance of 1.5 meters between the objects and the robot, provides better results even for the plastic bottle, which gives the worst results. Thus, object detection can be improved with the proposed method when the distance between the object and the robot is larger than 1.3 m.
Using the criterion proposed by Mikolajczyk and Schmid [33] in terms of precision-recall, we can see that the proposed method provides fairly good performance with objects located at a distance of 1.5 meters, providing a significant improvement even with small, non-flat and circular objects.
4. Conclusion
This paper proposes a novel object detection method using local features inside superpixels. The proposed algorithm shows that object detection can be improved using SURF features together with SLIC superpixels. Our approach can be used in an online robot system for search tasks in real environments. Experimental results illustrate the capability of the algorithm to detect the target objects, evaluated in terms of average recall and precision as well as the detection rate. The evaluation shows that the proposed algorithm performs fairly well in the majority of the sceneries, although its performance degrades when it is required to detect transparent plastic objects. Our method outperforms the method of Mae et al. [8] for object detection, obtaining better results for objects that are small and have round surfaces, especially when the distance between the object and the robot is larger than 1.2 m. In future work, we propose to increase the database of objects and to improve our object detection system using 3D information.
5. Acknowledgements
The authors thank the National Polytechnic Institute, the University of Electro-Communications and the National Council of Science and Technology (CONACYT) for their support during the realization of this research.
REFERENCES
[1] K. Lai, L. Bo, X. Ren and D. Fox, "A Large-Scale Hierarchical Multi-View RGB-D Object Dataset," Proceedings of International Conference on Robotics and Automation, Shanghai, 2011, pp. 1817-1827.
[2] M. Özuysal, V. Lepetit and P. Fua, “Pose Estimation for
Category Specific Multiview Object Localization,” Pro-
ceedings of International Conference on Computer Vision
and Pattern Recognition, Miami, 20-25 June 2009, pp.
775-785.
[3] H. Harzallah, F. Jurie and C. Schmid, “Combining Effi-
cient Object Localization and Image Classification,” In-
ternational Conference on Computer Vision (ICCV),
Kyoto, 29 September -2 October 2009, pp. 237-244.
[4] N. Dalal and B. Triggs, “Histograms of Oriented Gradi-
ents for Human Detection,” Proceedings of International
Conference on Computer Vision and Pattern Recognition
(CVPR), San Diego, 25 June 2005, pp. 886-893.
[5] G. Bouchard and B. Triggs, “A Hierarchical Part-Based
Model for Visual Object Categorization,” Proceedings of
International Conference on Computer Vision and Pat-
tern Recognition (CVPR), San Diego, 20-25 June 2005,
pp. 710-715.
[6] F. Lafarge, X. Descombe, J. Zerubia and P. Desillingy,
“Structural Approach for building Reconstruction from a
Single DSM,” IEEE Trans on Pattern Analysis and Ma-
chine Intelligence, Vol. 32, No. 1, 2010, pp. 135-147.
doi:10.1109/TPAMI.2008.281
[7] F. Lafarge, X Descombe., J. Zerubia and P. Desillingy,
“Structural Approach for Building Reconstruction from a
Single DSM,” IEEE Trans on Pattern Analysis and Ma-
chine Intelligence, Vol. 32, No. 1, 2010, pp. 135-147.
doi:10.1109/TPAMI.2008.281
[8] Y. Mae, J. Choi, H. Takahashi, K. Ohara, T. Takubo and
T. Arai, “Interoperable Vision Component for Object De-
tection and 3D Pose Estimation for Modularized Robot
Control,” Mechatronics, Vol. 21, No. 6, 2011, pp. 983-
992. doi:10.1016/j.mechatronics.2011.03.008
[9] B. Leibe, A. Leonardis and B. Schiele, “Combined Object
Categorization and Segmentation with an Implicit Shape
Model,” Workshop on Statistical Learning in Computer
Vision, Prague, May 2004, pp. 1-16.
[10] J. Gall and V. Lempitsky, “Class-Specific Hough Forests
for Object Detection,” Proceedings of International Con-
ference on Computer Vision and Pattern Recognition
(CVPR), Miami, 20-25 June 2009, pp. 1022-1029.
[11] S. Lazebnik, C. Schmid and J. Ponce, “Beyond Bags of
Features: Spatial Pyramid Matching for Recognizing
Natural Scene Categories,” Proceedings of International
Conference on Computer Vision and Pattern Recognition
(CVPR), New York, 2006, pp. 2169-2178.
[12] J. Shotton, M. Winn, C. Rother and A. Criminisi, “Tex-
tonboost: Joint Appearance, Shape and Context Modeling
for Multi-Class Object Recognition and Segmentation,”
Lecture Notes in Computer Science, Vol. 3951, 2006, pp.
1-15.
[13] P. Viola and M. Jones, “Rapid Object Detection Using a
Boosted Cascade of Simple Features,” Proceedings of In-
ternational Conference on Computer Vision and Pattern
Recognition (CVPR), Vol. 1, 2001, pp. 511-518.
[14] A. Opelt and A. Zisserman, "A Boundary-Fragment-Model for Object Detection," Lecture Notes in Computer Science, Vol. 3952, 2006, pp. 575-578. doi:10.1007/11744047_44
[15] J. Ponce, S. Lazebnik, F. Rothganger and C. Schmid,
“Toward True 3d Object Recognition,” Proceedings of
International Conference on Computer Vision and Pat-
tern Recognition (CVPR), Washington, 2004, pp. 4034-
4041.
[16] M. Attamimi, A. Mizutani, T. Nakamura, T. Nagai, K. Funakoshi and M. Nakano, "Real-Time 3D Visual Sensor for Robust Object Recognition," Proceedings of International Conference on Intelligent Robots and Systems, Taipei, 18-22 October 2010, pp. 4560-4565.
[17] D. Hoiem, A. Efros and M. Hebert, “Automatic Photo
Pop-Up,” Proceedings of International Conference on
Computer Graphics and Interactive Techniques, (SIG-
GRAPH), Los Angeles, July 2005, pp. 1-8.
[18] Y. Li, J. Sun, C. Tang and H. Shum, “Lazy Snapping,”
Proceedings International Conference on Computer Gra-
phics and Interactive Techniques (SIGGRAPH), Los An-
geles, 2004, pp. 303-308.
[19] X. He, R. Zemel and D. Ray, “Learning and Incorporating
Top-Down Cues in Image Segmentation,” Lecture Notes
in Computer Science, Vol. 3951, 2006, pp. 338-351.
doi:10.1007/11744023_27
[20] B. Fulkerson, A. Vedaldi and S. Soatto, “Class Segmen-
tation and Object Localization with Superpixel Neighbor-
hoods,” Proceedings of International Conference on
Computer Vision, (ICCV), Nara, 29 September-2 October
2009, pp. 670-677.
[21] X. Ren and J. Malik, “Learning a Classification Model for
Segmentation,” Proceedings of International Conference
on Computer Vision (ICCV ), Nice, 13-16 October 2003,
pp. 10-17. doi:10.1109/ICCV.2003.1238308
[22] G. Mori, “Guiding Model Search Using Segmentation,”
Proceeding of International Conference on Computer Vi-
sion, (ICCV), Las Vegas, 17-21 October 2005, pp. 1417-
1423.
[23] P. Felzenszwalb and D. Huttenlocher, “Efficient Graph-
Based Image Segmentation,” International Journal of
Computer Vision, Vol. 9, No. 2, 2004, pp. 167-181.
[24] A. Vedaldi and S. Soatto, “Quick Shift and Kernel Meth-
ods for Mode Seeking,” European Conference on Com-
puter Vision, Marseille, 2008, pp. 705-718.
[25] A. Levinshtein, A. Stere, K. Kutulakos, D. Fleet, S.
Dickinson and K. Siddiqi, “Turbopixels: Fast Superpixels
Using Geometric Flows,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, Vol. 31, No. 12, 2009,
pp. 2290-2297. doi:10.1109/TPAMI.2009.96
[26] A. Moore, S. Prince, J. Warrell, U. Mohammed and G.
Jones, “Superpixel Lattices,” Proceedings of Interna-
tional Conference on Computer Vision and Pattern Rec-
ognition (CVPR), Anchorage, 23-28 June 2008, pp. 1-8.
[27] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua and S.
Süsstrunk, “SLIC Superpixels,” School of Computer and
Communications Sciences, EPFL Technical Report
149300, 2010.
[28] H. Bay, A. Ess, T. Tuytelaars and L. Van Gool, "Speeded-Up Robust Features (SURF)," Computer Vision and Image Understanding, Vol. 110, No. 3, 2008, pp. 346-359. doi:10.1016/j.cviu.2007.09.014
[29] L. Qin, W. Josephson, Z. Wang and C. Kai-Li, "Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search," Proceedings of Very Large Database Conference, Vienna, 23-28 September 2007, pp. 950-961.
[30] S. Har-Peled, P. Indyk and R. Motwani, “Approximate
Nearest Neighbors: Towards Removing the Curse of Di-
mensionality,” Theory of Computing, Vol. 8, No. 1, 2012,
pp. 321-350.
[31] O. Miksik and K. Mikolajczyk, “Evaluation of Local De-
tectors and Descriptors for Fast Feature Matching,” Pro-
ceedings of International Conference on Pattern Recog-
nition, Tsukuba, 2012, pp. 2681-2684.
[32] Y. Ke and R. Sukthankar, “PCA-SIFT: A More Distinc-
tive Representation for Local Image Descriptors,” Work-
shop on Generic Object Recognition and Categorization,
Washington DC, 27 June-2 July 2004, pp. 506-513.
[33] K. Mikolajczyk and C. Schmid, “A Performance Evalua-
tion of Local Descriptors,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, Vol. 27, No. 10, 2005,
pp. 1615-1630. doi:10.1109/TPAMI.2005.188
[34] http://www.robots.ox.ac.uk/~vgg/research/affine/
[35] D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, Vol. 60, No. 2, 2004, pp. 91-110. doi:10.1023/B:VISI.0000029664.99615.94
[36] M. Lopez-de-la-Calleja, T. Nagai and H. Perez-Meana,
“Superpixel-Based Object Detection Using Local Feature
Matching,” Proceedings of the 29th Conference of the
Robotics Society of Japan, Toyosu, 2011, pp. 11-17.