TITLE:
OTE-24LD: An Extended Descriptor Integrating Long-Distance Correlations for the Prediction of Macromolecular Interactions
AUTHORS:
Obonan Etienne Traore, Ndiffon Charlemagne Kopoin, Dagou Dangui Augustin Sylvain Legrand Koffi, Gbame Gbede Sylvain, Souleymane Oumtanaga
KEYWORDS:
Macromolecular Interactions, Feature Extraction, Long-Distance Correlations, OTE-24LD Descriptor, Protein-Protein Interaction Prediction
JOURNAL NAME:
Open Journal of Applied Sciences, Vol.15 No.8, August 18, 2025
ABSTRACT: The prediction of interactions between biological macromolecules, particularly protein-protein interactions, remains a major challenge in structural and functional bioinformatics. To address it, numerous feature extraction methods have been developed, relying primarily on the physicochemical properties of amino acids and on their sequential relationships. Among these approaches, descriptors such as AAC (Amino Acid Composition), DPC (Dipeptide Composition), CTD (Composition-Transition-Distribution), and PseAAC (Pseudo Amino Acid Composition) have been widely used to transform macromolecular sequences into numerical vectors suitable for machine learning models. In this context, the OTE-24 method was recently introduced to capture local correlations between residues based on two normalized physicochemical properties. Despite its performance, this model, like several earlier methods, suffers from an intrinsic limitation: it overlooks long-distance correlations, which play a crucial role in the formation of recognition sites and in the stability of macromolecular complexes. To address this limitation, we propose an optimized extension of OTE-24, named OTE-24LD (Long Distance), which enriches the original descriptor with relationships between distant residues, weighted by a factor that decreases with positional distance. This makes it possible to capture functional interactions that descriptors based solely on immediate neighborhoods often miss. Evaluated on the HPRD dataset, the method improves precision by 6.73% compared with earlier approaches.
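The abstract does not give the OTE-24LD formula, so the following is only a minimal sketch of the general idea it describes: correlate a normalized residue property at positions i and i+d, and scale each lag d by a weight that decreases with distance. The z-score normalization, the averaged-product correlation, the default 1/d weight, the `max_lag` cutoff, and the function names `normalize` and `long_distance_features` are all hypothetical choices for illustration, not the authors' definitions.

```python
import math
from typing import Callable, Dict, List, Sequence


def normalize(scale: Dict[str, float]) -> Dict[str, float]:
    """Z-score normalize a residue property scale (mean 0, population std 1)."""
    vals = list(scale.values())
    mu = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
    return {aa: (v - mu) / sd for aa, v in scale.items()}


def long_distance_features(
    seq: str,
    scales: Sequence[Dict[str, float]],
    max_lag: int,
    weight: Callable[[int], float] = lambda d: 1.0 / d,  # assumed decreasing weight
) -> List[float]:
    """Hypothetical distance-weighted correlation descriptor.

    For every property scale and every lag d in 1..max_lag, average the
    product of the property values at positions i and i+d, then multiply
    by weight(d). Every residue in `seq` must appear in each scale.
    Returns len(scales) * max_lag values.
    """
    n = len(seq)
    feats: List[float] = []
    for scale in scales:
        values = [scale[aa] for aa in seq]
        for d in range(1, max_lag + 1):
            pairs = n - d
            if pairs <= 0:
                # Sequence too short for this lag: emit 0 to keep a fixed length.
                feats.append(0.0)
                continue
            corr = sum(values[i] * values[i + d] for i in range(pairs)) / pairs
            feats.append(weight(d) * corr)
    return feats


# Usage with toy placeholder values (not the properties used by OTE-24/OTE-24LD).
toy_scale = normalize({"A": 1.8, "G": -0.4, "L": 3.8})
print(long_distance_features("ALGGLA", [toy_scale], max_lag=3))
```

Because the number of features depends only on the number of scales and on `max_lag`, sequences of different lengths map to vectors of the same size. For a protein pair, one common (assumed) convention is to concatenate the two vectors before passing them to a classifier. A faster-decaying weight such as `lambda d: math.exp(-0.5 * d)` would emphasize near neighbors more strongly.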