Hadoop-Based Similarity Computation System for Composed Documents
Xiaoming Zhang,
Zhipeng Qin,
Xuwei Liu,
Qianyun Hou,
Baishuang Zhang,
Jie Wu
Department of Computer, Beijing Institute of Petrochemical Technology, Beijing, China.
DOI: 10.4236/jcc.2015.35025
Abstract
Universities accumulate a large number of composed documents in the course of teaching, and most of them must be checked for similarity as part of validation. A similarity computation system is constructed for composed documents that contain both images and text. First, each document is split into two parts: its images and its text. The documents are then compared by computing the similarity of their images and of their text independently. With Hadoop, the text content is separated easily and quickly. Experimental results show that the proposed system is efficient and practical.
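The abstract does not name the similarity measures the system uses, so the following is only a minimal sketch of the "split, then compare images and text independently" idea. It assumes each document has already been split into a text string and a list of raw image bytes, uses cosine similarity over term counts for the text, and uses Jaccard overlap of exact-byte SHA-256 digests for the images. The function names, the dictionary layout, and both measures are illustrative assumptions, not the authors' method, and the Hadoop distribution step is left out.

```python
import hashlib
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count alphanumeric tokens (assumed tokenization)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def text_similarity(text_a, text_b):
    """Cosine similarity between term-count vectors (an assumed measure)."""
    counts_a, counts_b = term_counts(text_a), term_counts(text_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def image_similarity(images_a, images_b):
    """Jaccard overlap of exact-byte image digests (an assumed measure).

    This only detects byte-identical images; resized or re-encoded copies
    would need a perceptual comparison instead.
    """
    digests_a = {hashlib.sha256(img).hexdigest() for img in images_a}
    digests_b = {hashlib.sha256(img).hexdigest() for img in images_b}
    if not digests_a and not digests_b:
        return 0.0
    return len(digests_a & digests_b) / len(digests_a | digests_b)


def compare_documents(doc_a, doc_b):
    """Score the text part and the image part independently.

    Each document is a hypothetical dict: {"text": str, "images": [bytes, ...]}.
    """
    return {
        "text": text_similarity(doc_a["text"], doc_b["text"]),
        "images": image_similarity(doc_a["images"], doc_b["images"]),
    }
```

Keeping the two scores separate mirrors the abstract's description of independent image and text comparison; how the scores would be combined into a single verdict is not stated in the source.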
Share and Cite:
Zhang, X., Qin, Z., Liu, X., Hou, Q., Zhang, B. and Wu, J. (2015) Hadoop-Based Similarity Computation System for Composed Documents. Journal of Computer and Communications, 3, 196-202. doi: 10.4236/jcc.2015.35025.
Conflicts of Interest
The authors declare no conflicts of interest.