Clusters Merging Method for Short Texts Clustering

Yu Wang; Lihui Wu; Hongyu Shao

doi:10.4236/jss.2014.29032

Open Journal of Social Sciences > Vol.2 No.9, September 2014

School of Management Science and Engineering, Dalian University of Technology, Dalian, China.

Abstract

Driven by the mobile Internet, new social media such as microblogs, WeChat, and question answering systems are constantly emerging. They produce huge amounts of short texts, which pose new challenges for text clustering. To cope with the large volume and dynamic growth of short texts, a two-stage clustering method is proposed. The method moves a sliding window over the stream of short texts. Within each window, hierarchical clustering is used; between windows, clusters are merged using a method based on information gain. Experiments indicate that the method is fast and achieves higher accuracy.
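The two-stage idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the window step, the linkage rule, or the exact information-gain formula, so everything here is an assumption. The window advances by its full length, stage 1 uses greedy agglomerative merging on cosine similarity of term counts, and stage 2 merges two clusters when the information gain of keeping them apart (entropy of the pooled term distribution minus the size-weighted entropies of the parts) is small. The names cluster_window, split_gain, cluster_stream and both thresholds are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_window(texts, sim_threshold=0.3):
    """Stage 1 (assumed form): agglomerative clustering inside one window.

    Each cluster is a (term_counts, member_texts) pair. The most similar
    pair is merged repeatedly until no pair reaches sim_threshold.
    """
    clusters = [(Counter(t.lower().split()), [t]) for t in texts]
    while len(clusters) > 1:
        best, pair = sim_threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][0], clusters[j][0])
                if s >= best:
                    best, pair = s, (i, j)
        if pair is None:
            break
        i, j = pair
        clusters[i] = (clusters[i][0] + clusters[j][0],
                       clusters[i][1] + clusters[j][1])
        del clusters[j]
    return clusters


def entropy(counts):
    """Shannon entropy (bits) of a term-count distribution."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values()) if n else 0.0


def split_gain(a, b):
    """Information gain of keeping clusters a and b separate (assumed criterion).

    Near 0 when both clusters use terms in the same proportions,
    larger when their vocabularies differ.
    """
    na, nb = sum(a.values()), sum(b.values())
    n = na + nb
    return entropy(a + b) - (na / n) * entropy(a) - (nb / n) * entropy(b)


def cluster_stream(texts, window=4, sim_threshold=0.3, gain_threshold=0.5):
    """Stage 2 (assumed form): merge each window's clusters into global ones."""
    merged = []
    for start in range(0, len(texts), window):
        chunk = texts[start:start + window]
        for terms, members in cluster_window(chunk, sim_threshold):
            best = min(merged, key=lambda c: split_gain(c[0], terms), default=None)
            if best is not None and split_gain(best[0], terms) <= gain_threshold:
                best[0].update(terms)      # fold the new cluster into the old one
                best[1].extend(members)
            else:
                merged.append((terms, members))
    return merged
```

Calling cluster_stream on a list of short texts returns a list of (term_counts, member_texts) pairs. Because each window is clustered on its own and only cluster summaries are compared across windows, the pairwise cost of hierarchical clustering is bounded by the window size rather than by the length of the whole stream.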

Keywords

Short Texts Clustering, Slide Window, Information Gain, Hierarchical Clustering

Share and Cite:

Wang, Y., Wu, L. and Shao, H. (2014) Clusters Merging Method for Short Texts Clustering. Open Journal of Social Sciences, 2, 186-192. doi: 10.4236/jss.2014.29032.

Conflicts of Interest

The authors declare no conflicts of interest.

