Transliterated Word Identification and Application to Query Translation Mining

DOI: 10.4236/jsea.2009.22018


Query translation mining is a key technique in cross-language information retrieval and machine translation knowledge acquisition. For better performance, queries are classified as transliterated or non-transliterated words by a transliterated word identification model and are then channeled to different mining processes. This paper is a pilot study on query classification for better translation mining performance, based on supervised classification and linguistic heuristics. Person name identification achieves a precision of over 97%. Transliterated word translation mining shows satisfactory performance.
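The classify-then-route idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the character set `TRANSLIT_CHARS`, the 0.5 threshold, and the function names are hypothetical, and a simple character-ratio heuristic stands in for the supervised classifier that the paper describes but does not detail here.

```python
from typing import Dict, List

# Hypothetical, tiny sample of Chinese characters that often appear in
# transliterated foreign names. A real system would learn this from data.
TRANSLIT_CHARS = set("斯尔克特德尼亚拉里罗伊姆布")


def transliteration_score(query: str) -> float:
    """Return the fraction of characters in the query found in TRANSLIT_CHARS."""
    if not query:
        return 0.0
    return sum(ch in TRANSLIT_CHARS for ch in query) / len(query)


def is_transliterated(query: str, threshold: float = 0.5) -> bool:
    """Heuristic stand-in for the identification model (threshold is assumed)."""
    return transliteration_score(query) >= threshold


def route_queries(queries: List[str]) -> Dict[str, List[str]]:
    """Split queries into two groups so each can go to its own mining process."""
    routed: Dict[str, List[str]] = {"transliterated": [], "non_transliterated": []}
    for q in queries:
        key = "transliterated" if is_transliterated(q) else "non_transliterated"
        routed[key].append(q)
    return routed


if __name__ == "__main__":
    # "布拉德" and "尼克松" are transliterated names; "信息检索" is an ordinary term.
    print(route_queries(["布拉德", "信息检索", "尼克松"]))
```

Keeping the classifier behind a single predicate (`is_transliterated`) means the heuristic could be swapped for a trained model without changing the routing step.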

Share and Cite:

J. Zhang, L. Guo, M. Zhou and J. Yao, "Transliterated Word Identification and Application to Query Translation Mining," Journal of Software Engineering and Applications, Vol. 2 No. 2, 2009, pp. 122-126. doi: 10.4236/jsea.2009.22018.

Conflicts of Interest

The authors declare no conflicts of interest.




Copyright © 2020 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.