Share This Article:

Compound Hidden Markov Model for Activity Labelling

Abstract Full-Text HTML XML Download Download as PDF (Size:1100KB) PP. 177-195
DOI: 10.4236/ijis.2015.55016    3,179 Downloads   3,765 Views   Citations

ABSTRACT

This research presents a novel way of labelling human activities from the skeleton output computed from RGB-D data from vision-based motion capture systems. The activities are labelled by means of a Compound Hidden Markov Model. The linkage of several Linear Hidden Markov Models to common states, makes a Compound Hidden Markov Model. Each separate Linear Hidden Markov Model has motion information of a human activity. The sequence of most likely states, from a sequence of observations, indicates which activities are performed by a person in an interval of time. The purpose of this research is to provide a service robot with the capability of human activity awareness, which can be used for action planning with implicit and indirect Human-Robot Interaction. The proposed Compound Hidden Markov Model, made of Linear Hidden Markov Models per activity, labels activities from unknown subjects with an average accuracy of 59.37%, which is higher than the average labelling accuracy for activities of unknown subjects of an Ergodic Hidden Markov Model (6.25%), and a Compound Hidden Markov Model with activities modelled by a single state (18.75%).

Conflicts of Interest

The authors declare no conflicts of interest.

Cite this paper

Figueroa-Angulo, J. , Savage, J. , Bribiesca, E. , Escalante, B. and Sucar, L. (2015) Compound Hidden Markov Model for Activity Labelling. International Journal of Intelligence Science, 5, 177-195. doi: 10.4236/ijis.2015.55016.

References

[1] Aggarwal, J. and Ryoo, M. (2011) Human Activity Analysis: A Review. ACM Computing Surveys, 43, 16:1-16:43.
http://dx.doi.org/10.1145/1922649.1922653
[2] Bobick, A. and Davis, J. (2001) The Recognition of Human Movement Using Temporal Templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 257-267.
http://dx.doi.org/10.1109/34.910878
[3] Ke, Y., Sukthankar, R. and Hebert, M. (2007) Spatio-Temporal Shape and Flow Correlation for Action Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, 17-22 June 2007, 1-8.
http://dx.doi.org/10.1109/cvpr.2007.383512
[4] Shechtman, E. and Irani, M. (2005) Space-Time Behavior Based Correlation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, 20-25 June 2005, 405-412.
http://dx.doi.org/10.1109/cvpr.2005.328
[5] Campbell, L. and Bobick, A. (1995) Recognition of Human Body Motion Using Phase Space Constraints. 5th International Conference on Computer Vision, Cambridge, 20-23 June 1995, 624-630.
http://dx.doi.org/10.1109/ICCV.1995.466880
[6] Rao, C. and Shah, M. (2001) View-Invariance in Action Recognition. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, 8-14 December 2001, II-316-II-322.
http://dx.doi.org/10.1109/cvpr.2001.990977
[7] Sheikh, Y., Sheikh, M. and Shah, M. (2005) Exploring the Space of a Human Action. Proceedings of the Tenth IEEE International Conference on Computer Vision, Beijing, 15-21 October 2005, 144-149.
http://dx.doi.org/10.1109/iccv.2005.90
[8] Ryoo, M.S. and Aggarwal, J. (2009) Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities. Proceedings of the 12th IEEE International Conference on Computer Vision, Kyoto, 27 September-4 October 2009, 1593-1600.
[9] Wong, K.Y.K., Kim, T.-K. and Cipolla, R. (2007) Learning Motion Categories Using Both Semantic and Structural Information. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, 18-23 June 2007, 1-6.
http://dx.doi.org/10.1109/cvpr.2007.383332
[10] Yilma, A. and Shah, M. (2005) Recognizing Human Actions in Videos Acquired by Uncalibrated Moving Cameras. Proceedings of the Tenth IEEE International Conference on Computer Vision, Beijing, 15-21 October 2005, 150-157.
http://dx.doi.org/10.1109/iccv.2005.201
[11] Vintsyuk, T. (1968) Speech Discrimination by Dynamic Programming. Cybernetics, 4, 52-57.
http://dx.doi.org/10.1007/BF01074755
[12] Darrell, T. and Pentland, A. (1993) Space-Time Gestures. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 335-340.
http://dx.doi.org/10.1109/cvpr.1993.341109
[13] Gavrila, D. and Davis, L. (1996) 3-D Model-Based Tracking of Humans in Action: A Multi-View Approach. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, 18-20 June 1996, 73-80.
[14] Yacoob, Y. and Black, M. (1998) Parameterized Modeling and Recognition of Activities. Proceedings of the Sixth International Conference on Computer Vision, Bombay, 7 January 1998, 120-127.
http://dx.doi.org/10.1109/iccv.1998.710709
[15] Rabiner, L.R. (1989) A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77, 257-286.
http://dx.doi.org/10.1109/5.18626
[16] Rabiner, L. and Juang, B.H. (1993) Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs.
[17] Fink, G.A. (2007) Markov Models for Pattern Recognition: From Theory to Applications. Springer E-Books.
[18] Magee, D.R. and Boyle, R.D. (2002) Detecting Lameness Using “Re-Sampling Condensation” and “Multi-Stream Cyclic Hidden Markov Models”. Image and Vision Computing, 20, 581-594.
http://dx.doi.org/10.1016/S0262-8856(02)00047-1
[19] Chen, H.-S., Chen, H.-T., Chen, Y.-W. and Lee, S.-Y. (2006) Human Action Recognition Using Star Skeleton. Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, New York, 171-178.
http://dx.doi.org/10.1145/1178782.1178808
[20] Starner, T.E. and Pentland, A. (1995) Visual Recognition of American Sign Language Using Hidden Markov Models. Proceedings of the International Workshop on Automatic Face-and Gesture-Recognition, Zurich, 26-28 June 1995.
[21] Sung, J., Ponce, C., Selman, B. and Saxena, A. (2012) Unstructured Human Activity Detection from RGBD Images. Proceedings of the 2012 IEEE International Conference on Robotics and Automation (ICRA), Saint Paul, 14-18 May 2012, 842-849.
[22] Xia, L., Chen, C.-C. and Aggarwal, J. (2012) View Invariant Human Action Recognition Using Histograms of 3D Joints. Proceedings of the 2nd International Workshop on Human Activity Understanding from 3D Data (HAU3D), Providence, 16-21 June 2012.
[23] Yamato, J., Ohya, J. and Ishii, K. (1992) Recognizing Human Action in Time-Sequential Images Using Hidden Markov Model. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Champaign, 15-18 June 1992, 379-385.
http://dx.doi.org/10.1109/cvpr.1992.223161
[24] Bobick, A., Ivanov, Y., Bobick, A.F. and Ivanov, Y.A. (1998) Action Recognition Using Probabilistic Parsing. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Santa Barbara, 23-25 June 1998, 196-202.
http://dx.doi.org/10.1109/cvpr.1998.698609
[25] Nergui, M., Yoshida, Y., Imamoglu, N., Gonzalez, J. and Yu, W. (2012) Human Behavior Recognition by a Bio-\Monitoring Mobile Robot. In: Proceedings of the 5th International Conference on Intelligent Robotics and Applications—Volume Part II, Springer-Verlag, Berlin, Heidelberg, 21-30.
http://dx.doi.org/10.1007/978-3-642-33515-0_3
[26] Oh, C.-M., Islam, M.Z., Park, J.-W. and Lee, C.-W. (2010) A Gesture Recognition Interface with Upper Body Model-Based Pose Tracking. Proceedings of the 2nd International Conference on Computer Engineering and Technology, Chengdu, 16-18 April 2010, V7-531-V7-534.
http://dx.doi.org/10.1109/iccet.2010.5485583
[27] Yu, E. and Aggarwal, J.K. (2006) Detection of Fence Climbing from Monocular Video. In: Proceedings of the 18th International Conference on Pattern Recognition, IEEE Computer Society, Washington DC, 375-378.
http://dx.doi.org/10.1109/icpr.2006.440
[28] Zhang, D., Gatica-Perez, D., Bengio, S. and McCowan, I. (2006) Modeling Individual and Group Actions in Meetings with Layered HMMS. IEEE Transactions on Multimedia, 8, 509-520.
[29] Glodek, M., Layher, G., Schwenker, F. and Palm, G. (2012) Recognizing Human Activities Using a Layered Markov Architecture. In: Villa, A., Duch, W., érdi, P., Masulli, F. and Palm, G., Eds., Artificial Neural Networks and Machine Learning—ICANN 2012, Springer, Berlin, 677-684.
http://dx.doi.org/10.1007/978-3-642-33269-2_85
[30] Glodek, M., Schwenker, F. and Palm, G. (2012) Detecting Actions by Integrating Sequential Symbolic and Sub-Symbolic Information in Human Activity Recognition. In: Proceedings of the 8th International Conference on Machine Learning and Data Mining in Pattern Recognition, Springer-Verlag, Berlin, Heidelberg, 394-404.
http://dx.doi.org/10.1007/978-3-642-31537-4_31
[31] Brand, M., Oliver, N. and Pentland, A. (1997) Coupled Hidden Markov Models for Complex Action Recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, 17-19 June 1997, 994-999.
http://dx.doi.org/10.1109/CVPR.1997.609450
[32] Oliver, N., Rosario, B. and Pentland, A. (2000) A Bayesian Computer Vision System for Modeling Human Interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 831-843.
http://dx.doi.org/10.1109/34.868684
[33] Duong, T.V., Bui, H.H., Phung, D.Q. and Venkatesh, S. (2005) Activity Recognition and Abnormality Detection with the Switching Hidden Semi-Markov Model. IEEE Computer Society Conference on Computer Vision and Pattern Recognition CVPR 2005, 1, 838-845.
http://dx.doi.org/10.1109/CVPR.2005.61
[34] Natarajan, P. and Nevatia, R. (2007) Coupled Hidden Semi Markov Models for Activity Recognition. Proceedings of the IEEE Workshop on Motion and Video Computing, Austin, 23-24 February 2007, 10.
http://dx.doi.org/10.1109/wmvc.2007.12
[35] Shi, Q., Wang, L., Cheng, L. and Smola, A. (2008) Discriminative Human Action Segmentation and Recognition Using Semi-Markov Model. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, 24-26 June 2008, 1-8.
[36] Sung, J., Ponce, C., Selman, B. and Saxena, A. (2011) Human Activity Detection from RGBD Images. Technical Report, Carnegie Mellon University, Department of Computer Science, Cornell University, Ithaca, NY.
[37] Guenterberg, E., Ghasemzadeh, H., Loseu, V. and Jafari, R. (2009) Distributed Continuous Action Recognition Using a Hidden Markov Model in Body Sensor Networks. In: Proceedings of the 5th IEEE International Conference on Distributed Computing in Sensor Systems, Springer-Verlag, Berlin, Heidelberg, 145-158.
http://dx.doi.org/10.1007/978-3-642-02085-8_11
[38] Lowerre, B.T. (1976) The Harpy Speech Recognition System. PhD Thesis, Carnegie Mellon University, Pittsburgh.
[39] Ryoo, M.S. and Aggarwal, J.K. (2006) Recognition of Composite Human Activities through Context-Free Grammar Based Representation. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, 17-22 June 2006, 1709-1718.
[40] Savage, J. (1995) A Hybrid System with Symbolic AI and Statistical Methods for Speech Recognition. PhD Thesis, University of Washington, Seattle.
[41] Gong, S. and Xiang, T. (2003) Recognition of Group Activities Using Dynamic Probabilistic Networks. Proceedings of the Ninth IEEE International Conference on Computer Vision, Nice, 13-16 October 2003, 742-749.
[42] Nguyen-Duc-Thanh, N., Lee, S. and Kim, D. (2012) Two-Stage Hidden Markov Model in Gesture Recognition for Human Robot Interaction. International Journal of Advanced Robotic Systems, 9.
[43] Oliver, N., Horvitz, E. and Garg, A. (2002) Layered Representations for Human Activity Recognition. In: Proceedings of the 4th IEEE International Conference on Multimodal Interfaces, IEEE Computer Society, Washington DC, 3-8.
http://dx.doi.org/10.1109/ICMI.2002.1166960
[44] Lasseter, J. (1987) Principles of Traditional Animation Applied to 3D Computer Animation. ACM SIGGRAPH Computer Graphics, 21, 35-44.
http://dx.doi.org/10.1145/37402.37407
[45] Williams, R. (2009) The Animator’s Survival Kit. Second Edition, Faber & Faber, London.
[46] Wang, J., Liu, Z., Wu, Y. and Yuan, J. (2012) Mining Actionlet Ensemble for Action Recognition with Depth Cameras. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, 16-21 June 2012, 1290-1297.
http://dx.doi.org/10.1109/cvpr.2012.6247813
[47] Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A. and Blake, A. (2011) Real-Time Human Pose Recognition in Parts from Single Depth Images. In: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, Washington DC, 1297-1304.
http://dx.doi.org/10.1109/cvpr.2011.5995316
[48] Bribiesca, E. (2000) A Chain Code for Representing 3D Curves. Pattern Recognition, 33, 755-765.
http://dx.doi.org/10.1016/S0031-3203(99)00093-X
[49] Bribiesca, E. (2008) A Method for Representing 3D Tree Objects Using Chain Coding. Journal of Visual Communication and Image Representation, 19, 184-198.
http://dx.doi.org/10.1016/j.jvcir.2008.01.001

  
comments powered by Disqus

Copyright © 2018 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.