Learning Actions from the Identity in the Web

Abstract

This paper proposes a simple and efficient method for identity recognition in uncontrolled videos. The idea is to use images collected from the web to learn representations of actions associated with an identity, and then to use this knowledge to annotate identity in videos automatically. Our approach is unsupervised: it identifies a person in a video, such as one from YouTube, directly from knowledge of that person's actions. Its benefits are two-fold: 1) we can improve the retrieval of identity images, and 2) we can collect a database of action poses associated with identity, which can then be used for tagging videos. We present simple experimental evidence that identity annotation is possible using identity-related action images collected from the web.

Share and Cite:

Ali, K. and Wang, T. (2014) Learning Actions from the Identity in the Web. Journal of Computer and Communications, 2, 54-60. doi: 10.4236/jcc.2014.29008.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.