A Low-Memory-Requiring and Fast Approach to Cluster Large-Scale Decoy Protein Structures

PP. 57-63
DOI: 10.4236/ojbiphy.2012.23008

ABSTRACT

This work demonstrates that PCAC (Protein principal Component Analysis Clustering), a method that uses principal component analysis (PCA) to cluster large-scale decoy protein structures in protein structure prediction, is ultra-fast and requires little memory. It can be two orders of magnitude faster than the commonly used pairwise RMSD clustering (pRMSD) when an enormous number of decoys is involved. pRMSD requires N(N − 1)/2 RMSD calculations, each with a least-squares fit, and N² memory units to store the pairwise RMSD values; PCAC requires only N RMSD calculations and N × P memory units, where N is the number of structures to be clustered and P is the number of preserved eigenvectors. Furthermore, PCAC based on the Cartesian covariance matrix generates essentially the same results as the reference RMSD clustering (rRMSD). In a test of 41 protein decoy sets, when the eigenvectors that together contribute 90% of the total eigenvalues are preserved, PCAC reproduces the near-native selections from rRMSD.
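The cost figures quoted in the abstract can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' implementation: the function names, the example values of N and P, and the toy eigenvalue spectrum are all assumptions chosen for illustration. It only counts RMSD calculations and stored values and picks P from a 90% eigenvalue threshold; it does not perform any clustering.

```python
from itertools import accumulate


def prmsd_cost(n):
    """RMSD calculations and stored values for pairwise RMSD clustering."""
    return {"rmsd_calcs": n * (n - 1) // 2, "memory_units": n * n}


def pcac_cost(n, p):
    """RMSD calculations and stored values for PCAC with p eigenvectors."""
    return {"rmsd_calcs": n, "memory_units": n * p}


def preserved_count(eigenvalues, fraction=0.9):
    """Smallest P whose P largest eigenvalues reach `fraction` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for count, running in enumerate(accumulate(ordered), start=1):
        if running >= fraction * total:
            return count
    return len(ordered)


if __name__ == "__main__":
    n = 10_000                        # hypothetical number of decoys
    spectrum = [50, 30, 15, 5]        # hypothetical eigenvalues, not real data
    p = preserved_count(spectrum)     # eigenvectors kept at the 90% threshold
    print("P =", p)
    print("pRMSD:", prmsd_cost(n))
    print("PCAC: ", pcac_cost(n, p))
```

The counts show why the gap widens with N: the pairwise terms grow quadratically while the PCAC terms grow linearly for a fixed P. The measured speed-up reported in the abstract is an empirical result and need not equal the ratio of these counts.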

Share and Cite:

Y. Yuan, Y. Shang and H. Li, "A Low-Memory-Requiring and Fast Approach to Cluster Large-Scale Decoy Protein Structures," Open Journal of Biophysics, Vol. 2 No. 3, 2012, pp. 57-63. doi: 10.4236/ojbiphy.2012.23008.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.