A Bioinformatics-Inspired Adaptation to Ukkonen’s Edit Distance Calculating Algorithm and Its Applicability Towards Distributed Data Mining
Johnson Bruce
University of Tennessee.
DOI: 10.4236/jsea.2008.11002   PDF    HTML     6,674 Downloads   10,364 Views   Citations

Abstract

Edit distance measures the similarity between two strings (as the minimum number of change, insert or delete operations that transform one string to the other). An edit sequence s is a sequence of such operations and can be used to represent the string resulting from applying s to a reference string. We present a modification to Ukkonen’s edit distance calculating algorithm based upon representing strings by edit sequences. We conclude with a demonstration of how using this representation can improve mitochondrial DNA query throughput performance in a distributed computing environment.

Share and Cite:

J. Bruce, "A Bioinformatics-Inspired Adaptation to Ukkonen’s Edit Distance Calculating Algorithm and Its Applicability Towards Distributed Data Mining," Journal of Software Engineering and Applications, Vol. 1 No. 1, 2008, pp. 8-12. doi: 10.4236/jsea.2008.11002.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] M. D. Vose, “A formal analysis of edit distance,” UT CS Technical Report ut-cs-04-517, February 2004.
[2] R. O. Duda and P. E. Hart, Pattern Classification (2nd ed.). Wiley Interscience, 2000.
[3] A. Wagner and M. I. Fischer, “The string-to-string correction problem,” Journal of the ACM, 21(1) (Jan. 1974), pp. 168-173, 1974.
[4] E. Ukkonen, “Algorithms for approximate string matching,” International Control 64, pp. 100-118, 1985.
[5] E. Ukkonen, “On approximate string matching,” International Conference Fundamentals of Computation Theory, Lecture Notes in Computer Science, pp. 158:487-495, 1983.
[6] N. Campbell and J. Reese, Biology (6th ed.), Addison Wesley, 1997.
[7] S. Anderson, et al., “Sequence and organization of the human mitochondrial genome,” Nature, 290(5806) (April 9, 1981), pp. 457-265, 1981.
[8] K. L. Monson, et al., “The mtDNA population database: An integrated software and database resource for forensic comparison,” Forensic Science Communications, 4(2), April 2002. DOI=http://www.fbi.gov/hq/lab/fsc/ backissu/april2002/miller1.htm.
[9] http://iperf.sourceforge.net.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.