Exploring MIA-QSARs for farnesyltransferase inhibitory effect of antimalarial compounds refined by docking simulations

Two series of farnesyltransferase (FTase) inhibitors were grouped and their antimalarial activities modeled by means of multivariate image analysis applied to quantitative structure-activity relationship (MIA-QSAR). A reliable model was achieved, with r for calibration, external prediction and leave-one-out cross-validation of 0.96, 0.87 and 0.83, respectively. Therefore, biological activities of congeners can be estimated using the QSAR model. The bioactivities of new compounds based on the miscellany of substructures of the two classes of FTase inhibitors were predicted using the MIA-QSAR model and the most promising ones were submitted to ADME (absorption, distribution, metabolism and excretion) and docking evaluation. Despite the smaller interaction energy of the two most promising, predicted compounds in comparison to the two most active compounds of the data set, one of the proposed structures did not violate any Lipinski’s rule of five. Therefore, it is either a potential drug or may drive synthesis of similar, improved compounds.


INTRODUCTION
Malaria is a disease caused by parasites that are transmitted to humans via mosquito bites.Symptoms of illness may include fever, headache, muscle pain, nausea, and vomiting.Malaria is still one of the most deadly diseases affecting third-world countries; it is estimated to cause around 300 million clinical cases and over one million deaths each year [1].It is associated with the high morbidity and mortality, therefore, the control of malaria is globally a high priority task.
Despite the discovery of artemisinin which has shown high antimalarial activity [2,3], there has been a continuous interest for the search of new drugs [4] which are effective against the resistant Plasmodium falciparum.Few groups of potentially antimalarial drugs are used as chemotherapeutics such as quinoline derivatives [5], and simple sulfonamides [6].
Researchers are successively looking for new entities that have high potency against the malaria activity.Such ligands can be properly developed using QSAR (quantitative structure-activity relationship) methods.There have been some studies of 3D QSAR analysis of farnesyltransferase (FTase) inhibitors [7][8][9], which are also involved as promising compounds for the treatment of broad spectrum of cancers.Recently, Doerksen et al. [10] studied a highly diverse series of 192 Abbott-initiated imidazole-containing compounds and their FTase inhibitory activities using 3D-QSAR (CoMFA/CoMSIA) and docking.Naik et al. [11] have performed quantitative structure-activity relationship (QSAR) analysis on a data set of 194 artemisinin analogs for antimalarial activity.Prabhakar et al. [12] have performed QSAR of antimalarial activity of two distinct series of N1-(7-chloro-4quinolyl)-1,4-bis(3-aminopropyl) piperazine analogues with DRAGON descriptors in order to rationalize their activity using CP-MLR (combinatorial protocol multiple linear regression) method.
In this work, we used MIA-QSAR (multivariate image analysis applied to QSAR) method [13][14][15][16][17][18], a two dimensional (2D) image-based approach which uses pixels of 2D chemical structures as descriptors (binaries).Such simple approach has shown to be a highly predictive tool.Also, the MIA-QSAR method presents the advantage of working well when equally simple, classical analysis fails.Moreover, the present QSAR analysis based on 2D chemical drawings (constrained structures) dispenses conformational screening and 3D alignment to provide reliable QSAR models; the physicochemical description about e.g.steric effects (groups containing pixels occupying a large area in the workspace) and stereogenic centers (hashed or wedged lines representing back or front bond relative to a chiral carbon), encoded in the way in which substituent in a congeneric series are drawn.In order to corroborate the results obtained through this ligand-based approach, docking studies were carried out for the most promising-proposed drugs.Furthermore, an ADME (absorption, distribution, metabolism, and excretion) evaluation was carried out to search for the most suitable, predicted compounds.

MIA-QSAR
The data set shown in Table S1 (in the supplementary material) is based on the data available in the literature [19,20] for two classes of farnesyltransferase (FTase) inhibitors: tetrahydroquinoline (series 1 from compound 1 to 24) and benzonitrile analogs (series 2 from compound 25 to 66).Two models (Model 1 and Model 2) were built for this data set, with the first one including 59 compounds of the total 66, and the second one including the whole series.The difference between the two models is the biological data of the first series, which is based on the inhibition of the P. falciparum FTase enzyme (Model 1), and the inhibition of the mammalian FTase enzyme (Model 2).
The QSAR approach used in this work, called MIA-QSAR, has been described in detail by Freitas et al. [21]; thus, only a brief description is given here.The MIA-QSAR method is based on the treatment of images and their correlation with bioactivities; the images are chemical structures drawn by means of any specific program for this end.The chemical structures of the data set were drawn using the ChemSketch program [22] and then each of them was saved as bitmaps in a workspace with predefined dimensions (500 pixels  400 pixels size).In order to make the basic scaffold of the whole series congruent, a pixel in common among all chemical structures was fitted in a given coordinate (2D-alignment).Thus, the variable chemical moieties explain the variance in the activities block.The 59 bidimensional arrays of 500  400 size (for Model 1) or 66 (for Model 2) images were converted into numerical values (binaries) and then grouped to give three-way arrays of 59  500  400 or 66  500  400 dimensions, which were unfolded to 59  200,000 and 66  200,000 matrices.Columns with no variance (corresponding to parts of the molecules that do not vary or to blank spaces) were deleted to reduce the matrix size and optimize the computational cost, result-ing in 59  7847 and 66  7847 matrices.These matrices were randomly decomposed into 47  7847 and 53  7847 training set matrices, and 12  7847 and 13  7847 test set matrices (compounds with limiting activity values were kept in the calibration set).These matrices were regressed against the activities column vector by means of partial least squares (PLS) regression.The model was validated through leave-one-out cross-validation and external validation.

Docking Studies
Crystal coordinates of wild-type rat protein FTase (2.05Å resolution) in the bound state with ethylenediamine-scaffold, Zn 2+ , FPP were taken from the Protein Data Bank (PDB code: 3E34) [23].The structure is missing the first 54 residues (α subunit); it should be kept in mind, however, that those amino acid residues are not so important for the ligand recognition because they are very far from the active site, in addition, the β subunit contains most of the active site residues.The 3D coordinates of P. falciparum FTase was not used in this work because its experimental crystal structure has not been elucidated so far.In addition, homology modeling of the P. falciparum FTase is complicated by its size and regions of high divergence.On the other hand, human and rat FTAse have a primary structure very similar: α and β subunits show 92% and 96% identity, respectively.Compounds 3, 18, C and D were docked inside the FTase active site.Three-dimensional structures of compounds were built in the PC Spartan program Pro/Builder module [24].Subsequently, the overall geometry optimizations and partial atomic charge distribution calculations of the ligands were performed with the same program using the AM1 semi-empirical molecular orbital method [25].Compounds were docked into the FTase binding sites using the Molegro Virtual Docker (MVD) [26], a program for predicting the most likely conformation of how a ligand will bind to a macromolecule.The Mol-Dock scoring function (MolDock Score) used by MVD program is derived from the PLP (Piecewise Linear Potential) and further extended in GEMDOCK (Generic Evolutionary Method for molecular DOCK) with a new hydrogen bonding term and new charge schemes [26].The docking search algorithm used in MVD is based on interactive optimization techniques inspired by Darwinian evolution theory (evolutionary algorithms-EA).The potential binding site of PDE-5 receptor was calculated using the built-in cavity detection algorithm from the program.Ligand molecules and a subset region composed of all amino acid residues (side chain) having at least one atom within 12 Å of the center of the ligand are considered flexible during the docking simulation.In accordance to literature, ethylenediamine-scaffold ana-logs coordinate the catalytic Zn 2+ via their N-methylimidazole group and have moieties that bind in the product exit groove [23].Based on these informations, we selected the conformation of each compound using their Nmethylimidazole group as reference.

QSAR Studies
Both series of compounds used in this MIA-QSAR modeling are benzonitrile derivatives (Table S1 in the supplementary material) and, therefore, this chemical portion was left congruent in the 2D alignment, since this pose plays an important role in the interaction mode with the farnesyltransferase enzyme.In order to find the optimum number of PLS components to be used in Model 1, the root mean square errors of calibration (RMSEC) and leave-one-out cross-validation (RMSECV) were analyzed, and the latter was minimized at 3 latent variables.Therefore, the calibration was carried out using 3 PLS components, giving a squared correlation coefficient (r 2 ) of experimental versus fitted pIC 50 (IC 50 in mol•L -1 ) of 0.887 (RMSEC = 0.450).In order to investigate the possibility of obtaining chance correlation, a Y-randomization test was performed, i.e. the activities block was shuffled and regressed against the unaltered calibration matrix-this resulted in a poor correlation ( Y randomization  = 0.393, while the recommended value is 0.8), confirming that the true calibration is robust.The MIA-QSAR model was validated through leave-one-out crossvalidation (LOOCV), giving a q 2 of 0.771 (RMSECV = 0.640).However, Golbraikh and Tropsha [27] state that the only way to establish a reliable QSAR model is by means of an external validation; therefore, the calibration parameters were used to predict the antimalarial activities of a test set, resulting in a satisfactory test of 0.536 and root mean square error of prediction (RMSEP) of 0.890.A good consensus has been reported between a benzodiazepine analogue bound to rat FTase (PDB code 1SA5) and the predicted binding mode of tetrahydroquinoline analogues to a homology model of P. falciparum FTase [28].Furthermore, the activities toward P. falciparum FTase correlate quite well with the rat FTase inhibition (r 2 = 0.802) and, therefore, the use of the P. falciparum data of series 1 can be used to predict the activities in mammalians, and vice-versa.Hence, a second model (Model 2), which is based on the entire data set of 66 compounds as inhibitors of mammalian FTase was built, giving comparable to better statistical parameters relative to Model 1 using 6 latent variables, i.e. r 2 of 0.960 (RMSEC = 0.306), q 2 of 0.826 (RMSECV = 0.635), r 2 test of 0.869 (RMSEP = 0.540), and = 0.687.
Therefore, both Models 1 and 2, whose fitted and predicted results are depicted in Table 1 and illustrated in Figure 1, are useful and ready to Zn 2+ , FPP generated RMSD of 1.68 Å. RMSD values below 2.0 Å are expect in docking simulations when compared to crystallographic structures [31].Analysis of ethylenediaminescaffold binding mode in the FTase predict the bioactivities of new congeneric compounds.

Y randomization r 
The results obtained for modeling the FTase inhibitors has higher r 2 and lower RMSE values than those obtained for modeling the P. falciparum FTase enzyme.However, the Y randomization  obtained from modeling the FTase inhibitors is lower than that obtained from modeling the P. falciparum FTase enzyme.

r
The results obtained in this study for P. falciparum FTase enzyme are better than those reported in [19], where the number of the training set compounds is 17 while we used 47 compounds for such data set.The r 2 of calibration obtained in this study (0.887) is higher than that obtained in [19], (0.867).However, the r 2 of prediction and cross validation obtained in this study (0.536 and 0.771, respectively) are lower than those obtained in [19], (0.648 and 0.778, respectively).However, the models obtained in [19] can be overfitted due to the small size of the calibration and prediction sets.
On the other hand, the results obtained in this study for Ftase inhibitors are better than those reported in [20], in which the number of compounds in the training set is 34 while we used 53 compounds.The r 2 of calibration obtained in this study (0.960) is lower than that obtained in [20], (0.991).However, the prediction and cross validation r 2 values obtained in this study (0.869 and 0.826, respectively) are higher than those obtained in [20], (0.770 and 0.619, respectively).Consequently, the QSAR model obtained for Ftase in this study has more prediction power than that obtained in [20].
In order to find new, relevant active compounds derived from the two series of benzonitrile derivatives, substructures of the three most active compounds of series 1 (3, 8 and 22) and the two most active compounds of series 2 (31 and 47) were combined, resulting in the eight compounds of Figure 2.These compounds are suitable for activity predictions using the MIA-QSAR models built because there is no extrapolation, since all substituents and pixels were calibrated, given the 2D alignment by the congruent benzonitrile moiety.Compounds C and D exhibited relatively high, predicted activities (pIC 50  8).These values are lower than those of the most active compounds of the data set (pIC 50 > 9), but compounds C and D are promising because they are structurally different from both series of benzonitrile derivatives, although the congruent similarity center; this difference may affect e.g.resistant P. falciparum FTase.Therefore, compounds C and D were submitted to ADME analysis, which is based on the determination of theoretical parameters useful for drug-likeness assessment.
Lipinski et al. [29] have proposed a series of rules imposing limitations on logP (the logarithm of octanol/ water partition coefficient), molecular weight, and the number of hydrogen bond acceptors and most "druglike" molecules have logP ≤ 5, molecular weight ≤ 500, number of hydrogen bond acceptors ≤ 10, and number of hydrogen bond donors ≤ 5. Molecules violating more than one of these rules may have problems with bioavailability.The Lipinski's rule of five parameters and total polar surface area (TPSA), which have shown to correlate with drug absorption, were obtained by using the Molinspiration program [30].The three most active compounds of series 1 violated at least one parameter of the Lipinski´s rule (e.g. the molecular volume, which determines transport characteristics of molecules such as intestinal absorption or blood-brain barrier penetration).Moreover, compound 22 has also more than 10 hydrogen bond acceptors.On the other hand, series 2 compounds did not violate any of the limitation in the rule of five, but their experimental activity (pIC 50 ) are all lower than 7. Compounds C and D combine high activity (pIC 50  8) and good ADME profiles (Table 2), specially compound D, which did not violate any Lipinski's rule.Thus, they can be useful as antimalarial compounds or at least to drive synthesis of similar improved compounds.Improvement can be achieved by analyzing the interaction mode with the enzyme and comparing with the experimentally most active compounds of series 1, through docking studies.

Docking Studies
In order to assure that compounds proposed by MIA-SAR are really promising when compared to the exist-Q ing prototypes (compounds 3 and 8), as well as to understand their mode of interaction with FTase, docking studies for 3, 18, C and D were carried out.The potential binding sites of FTase were calculated and a cavity of 969.0A 3 (surface = 1993.0A 2 ) was observed close to Tyr365B, His362B, There is a residue, namely Tyr361β, potentially capable of providing specific cation- interaction, therefore stabilizing the complex between FTase and the four compounds.In addition, all four compounds presented a methylimidazole ring close to the zinc cation (electrostatic interactions).However, the methylimidazole ring of compounds 3 and 18 also interacted with Tyr300B in contrast to compounds C and D. This fact can, in principle, justify the high experimental potencies of the two available compounds, which are congruent with the QSAR results.However, the best results in the ADME evaluation for compound D and the necessity of new drugs against resistant strains make the proposed compounds as interesting targets for synthesis and future biological tests.The assessment of docking accuracy often called "redocking" is essentially a validation procedure to check whether the molecular docking algorithm is able to recover the crystallographic position of ligand using computer simulation.In this work, re-docking simulation of the crystal structure of ethylenediamine-scaffold in the FTase active site in complex with, Zn 2+ , FPP generated RMSD of 1.68 Å. RMSD values below 2.0 Å are expect in docking simulations when compared to crystallographic structures [31].Analysis of ethylenediaminescaffold binding mode in the FTase active site has been reported [31] and the binding pocket consists of residues Asp359β, Phe360β, Try93β, Leu96β, and W106β.Furthermore, the N-methylimidazole moieties coordinates the catalytic Zn 2+ similarly to compounds 3, 18, C and D as previously discussed.

CONCLUSION
MIA-QSAR was capable of providing predictive models which are useful to predict bioactivities of novel drug-like compounds against malaria.The proposed compounds have substructures contemplated in calibration and, therefore, the modeling of their activities is reliable.Hydrogen bond between a couple of proposed substrates, namely C and D, in addition to interactions nvolving the zinc cation present in the active site, ex-i plain the high affinity of these ligands to the FTase enzyme.After exploratory ADME evaluation, compound D is suggested as improved drug-like compound.

Figure 1 .
Figure 1.Plot of experimental vs. fitted and predicted pIC 50 of the farnesyltransferase inhibitors for Models 1 and 2.

Figure 2 .
Figure 2. Compounds proposed using the MIA-QSAR method (predicted pIC 50 values are shown in parenthesis).

3 .
Asn165α and Lys154α.The β subunit contains most of the substrate-binding residues and is partially enveloped by the crescent shaped α subunit.After docking calculations, the binding orientations of compounds 3, 18, C and D into the active site were predicted and the following parameters (see Table3) were then calculated: a) energy score values used during docking; b) total interaction energy between ligand and FTase enzyme; and c) internal energy values of the ligands.The structures of the four compounds are shown in Figure Hydrogen bonding was observed between the FTase and the four compounds: compound C interacted with Tyr365β and Tyr361β; compound D interacted with Tyr361β, Cys299β and Arg202β compounds 3 and 18 interacted with Tyr300β.

Table 2 .
Lipinski's rule of five and other parameters useful for ADME analysis b , for the most promising proposed compounds (C and D) and most active derivatives of series 1 and 2. logP = logarithm of the octanol/water partition coefficient; TPSA = topological polar surface area; natoms = number of atoms; MW = molecular weight; nON = number of hydrogen bond acceptors; nOHNH = number of hydrogen bond donors; nrotb = number of rotatable bonds; MV = molecular volume; nviolations = number of violations of the Lipinski's rule of five. b

Table 3 .
Estimated energy score values used for the evaluation of docking poses; total interaction energy values between the pose and the target molecule; and internal energy of the ligand (energies in kcal•mol −1 ).