Attention-Guided Organized Perception and Learning of Object Categories Based on Probabilistic Latent Variable Models


This paper proposes a probabilistic model of object category learning combined with attention-guided organized perception. The model has two parts: a model of attention-guided organized perception of object segments on Markov random fields, and a model of object category learning based on probabilistic latent component analysis. In attention-guided organized perception, figure-ground segmentation is performed concurrently on Markov random fields that are formed dynamically around salient preattentive points, and co-occurring segments are grouped in the neighborhood of selectively attended segments. In object category learning, a set of classes for each object category is obtained by probabilistic latent component analysis with a variable number of classes, applied to bags of features of segments extracted from images that contain the categorical objects in context; each object category is then represented as a composite of its object classes. Experiments on two image data sets show that the model learns a probabilistic structure of intra-categorical composition and inter-categorical difference of object categories and achieves high performance in object category recognition.
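The category-learning step summarized above can be illustrated with a minimal sketch of expectation-maximization for a two-dimensional latent component model, P(s, f) = sum over c of P(c) P(s|c) P(f|c), fitted to a segments-by-features count matrix. This is not the paper's implementation: the function name `plca`, the toy `counts` matrix, and the fixed number of classes are assumptions made for illustration, whereas the paper uses a variable number of classes and real bags of features.

```python
import numpy as np

def plca(counts, n_classes, n_iter=100, seed=0):
    """EM for P(s, f) = sum_c P(c) P(s|c) P(f|c) (hypothetical helper).

    counts: (n_segments, n_features) nonnegative bag-of-features counts.
    Returns P(c), P(s|c) and P(f|c); columns of the last two sum to 1.
    """
    rng = np.random.default_rng(seed)
    n_seg, n_feat = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initialization, normalized so each class column is a distribution.
    p_c = np.full(n_classes, 1.0 / n_classes)
    p_s_c = rng.random((n_seg, n_classes))
    p_s_c /= p_s_c.sum(axis=0)
    p_f_c = rng.random((n_feat, n_classes))
    p_f_c /= p_f_c.sum(axis=0)

    for _ in range(n_iter):
        # E-step: posterior P(c | s, f), shape (n_seg, n_feat, n_classes).
        joint = p_c[None, None, :] * p_s_c[:, None, :] * p_f_c[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + eps)

        # M-step: re-estimate each factor from count-weighted posteriors.
        weighted = counts[:, :, None] * post
        class_mass = weighted.sum(axis=(0, 1)) + eps
        p_s_c = weighted.sum(axis=1) / class_mass
        p_f_c = weighted.sum(axis=0) / class_mass
        p_c = class_mass / class_mass.sum()

    return p_c, p_s_c, p_f_c

# Toy data (assumed): 4 segments, 4 visual words, two groups of segments.
counts = np.array([
    [5, 4, 0, 0],
    [4, 6, 0, 0],
    [0, 0, 3, 7],
    [0, 0, 6, 2],
], dtype=float)

p_c, p_s_c, p_f_c = plca(counts, n_classes=2)

# Class posterior per segment via Bayes' rule: P(c|s) is proportional to P(c) P(s|c).
p_c_s = p_s_c * p_c
p_c_s /= p_c_s.sum(axis=1, keepdims=True)
print(p_c_s.round(3))
```

Each row of `p_c_s` gives the class membership of one segment. In this sketch an object category would correspond to a set of such classes, which loosely mirrors the composite representation described in the abstract.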

Share and Cite:

M. Atsumi, "Attention-Guided Organized Perception and Learning of Object Categories Based on Probabilistic Latent Variable Models," Journal of Intelligent Learning Systems and Applications, Vol. 5 No. 2, 2013, pp. 123-133. doi: 10.4236/jilsa.2013.52014.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.