Assessment, Design and Implementation of a Private Cloud for MapReduce Applications

Abstract

Scientific computation and data-intensive analyses are increasingly common. On the one hand, the MapReduce programming model has attracted considerable attention for its applicability to large-scale parallel data analysis and Big Data applications. On the other hand, Cloud computing is increasingly attractive for solving computing problems that demand large amounts of resources. This paper explores the potential symbiosis between MapReduce and Cloud computing in order to create a robust and scalable environment for executing MapReduce workflows regardless of the underlying infrastructure. The main goal of this work is to provide an easy-to-install interface so that non-expert scientists can deploy a suitable testbed for their MapReduce experiments on the local resources of their institution. Test cases were run on a real cluster to evaluate the time required for the entire execution process.

Share and Cite:

Salgueiro, M. , González, P. , Pena, T. and Cabaleiro, J. (2014) Assessment, Design and Implementation of a Private Cloud for MapReduce Applications. Open Access Library Journal, 1, 1-10. doi: 10.4236/oalib.1100526.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.