Multidimensional QSAR Modeling of Amprenavir Derivatives as HIV-Protease Inhibitors

A computational study has been performed on a series of 55 compounds having (S)-N-(3-(N-(cyclopen-tylmethyl) substituted-phenylsulfonamido)-2-hydroxypropyl) acetamide backbone as HIV-1 protease inhibitors. Various combinations of these specific inhibitors fragments were formed by breaking them at central alicyclic single bonds, while retaining the core. Standard Topomer 3D models were automatically constructed for each fragment, and a set of steric and electrostatic fields was generated for each set of topomers. The models generated showed r of 0.811 and crossvalidated r (q) of 0.608. The other method used were Quasar and Raptor based on receptor-modelling concept (6D-QSAR) and this explicitly allows for the simulation of the induced fit, that yielded r of 0.574, cross-validated r (q) of 0.504 and predictive r (p) of 0.895 averaged over 200 models. This study has suggested the various type of substituent that can be attached to the core. The information obtained from these 3-D contour maps can be used for the design of amprenavir analogs possessing better protease inhibitory activity.


Introduction
Replication of the HIV virus requires processing of the proteins encoded by the gag and gag-pol genes by a virally encoded aspartyl protease (HIV-1 protease) [1][2][3].As such, inhibition of HIV-1 protease offers an attractive target for the treatment of acquired immunodeficiency syndrome (AIDS) [4][5][6][7][8][9].Inhibition of HIV protease is a target for drug design in a number of laboratories which has become a strategically important and therapeutically viable approach toward the control of HIV infection [10,11].Progress in the treatment of AIDS leading to an active therapy has been slow, but recent results with new AIDS drugs, notably the HIV-1 protease inhibitors (PI), allow for cautious optimism.In 2009, ten protease inhibitors have reached the market where one protease inhibitor, amprenavir, was withdrawn from the market in 2004 since fosamprenavir, its prodrug proved superior in many aspects [12].
With increased frequency and duration of treatment, however, the rate of resistance toward antiretroviral agents, including PI s, has risen alarmingly, fueling the search for next-generation drugs with broad efficacy again-st PI-resistant mutants [13].
A rational approach in this direction would be to optimize lead structure by using sufficiently rapid automatic and general QSAR analysis and application.In our present study, we have pooled in fifty five compounds from the literature on amprenavir derivatives; to generate a database for QSAR studies.This study was taken into consideration mainly to improvise on the already proven potent amprenavir molecule.The main objective of this work is to develop a useful QSAR model for PIs and also to compare two softwares-Topomer CoMFA and Biograf x suite.
Topomer CoMFA-In 2002 Richard D. Cramer introduced the technology called as Topomer CoMFA method used for generating an alignment of a structural fragment.A structural fragment by definition contains a common feature, the open valence or attachment bond.The topomer methodology overlaps this common feature to provide an absolute orientation for any fragment.A single fragment confirmation is then generated from a standardize 3D module by rule based adjustments to a cyclic single bond torsions and similarities.Previous applications of topomers then proceed to characterize and compare aligned 3D formats by steric fields and by location of pharmacophoric features.Here only 3D topomer structures themselves would be used [14].
Quasar-A quasi-atomistic receptor model refers to a three-dimensional binding-site surrogate, represented by a 3D surface surrounding a series of ligands (superimposed in their bioactive conformation) at van der Waals distance and populated with atomistic properties (H-bond donors, H-bond acceptors, H-bond flip-flop particles, salt bridges, neutral and charged hydrophobic particles, virtual solvent, void space) mapped onto it.Quasar is a 6D-QSAR tool: the fourth dimension refers to the possibility to represent each molecule by an ensemble of conformations, orientations, protonation states and tautomersthereby reducing the bias associated with the choice of the bioactive conformation; the fifth dimension refers to the possibility to consider an ensemble of different induced-fit models; the sixth dimension allows for the simultaneous evaluation of different solvation models.In addition, Quasar allows for the simulation of local induced fit, H-bond flip-flop, and various solvation effects.Ligand-receptor interactions are estimated based on a directional force field.A family of quasi-atomistic receptor models is then generated using a genetic algorithm combined with weighted cross-validation.
Raptor (Receptor as poly tier object representation) -is a receptor-modeling approach based on multi-dimensional quantitative structure-activity relationships (QSA-Rs).It aims to derive an intuitively interpretable model of a protein binding site and to accurately predict relative free energies of ligand binding.
These were divided into training: test of 49:6.The test compounds were selected manually such that the structural diversity and wide range of activity in the data set were included.QSAR analysis was carried out using Topo-mer CoMFA (Tripos Inc.) and; Quasar and Raptor (Biographic Laboratory 3R).The hardware used was Apple/Macintosh: G4 computer systems, 2GB RAM, 360GB Hard Disk, Unix operating system.For all the input structures in Biografx Suite energy minimization was performed using MM2 force field and saved as * .mol2files.Partial atomic charges were calculated using the MOPAC.
Topomer CoMFA: all the compounds in each publication were inspected visually and selected for a common core.Entry of structures was made by typing SLN representations of each topomeric fragment (as a 2D structure) into an ASCII file.After entering the structures R1 and R2 fragments were defined for the structure cutting them by alicyclic bond along the common core.The Ki values were converted to the corresponding pKi (-logKi) and used as dependent variables in the analysis.The pKi values span a range of 3-log units providing a broad and homogenous data set for 3D-QSAR study.The valencefilled structures were modeled by Concord.
There are two main phases in topomer CoMFA, first being the generation of the Topomer 3D model for each of the side chain (i.e.R1 and R2), and the second CoMFA analysis itself.Procedures for generating the topomer conformation have been detailed elsewhere [20,21], in brief:  Generating the topomer conformation involveso Attachment of structurally distinctive "cap to complete structure.
o This model is oriented to superimpose the "cap" attachment bond onto a vector fixed in Cartesian space.
o Proceeding away from this "root" attachment bond, only as required to place the most important (typically the largest) unprocessed group farthest from the root and the next most important counterclockwise relative to the largest looking along a vector pointing back to the root, stereocenters are inverted and torsion angle adjusted.
o Removal of the cap completes the topomer conformation.
 Carrying out the automatic CoMFA analyseso Atomic charges were calculated by the MMFF94.The non-bonded interaction calculation was set at cut off of 8, and dielectric constant 1.
PLS method was used to linearly correlate the CoMFA fields to biological activity values.The cross-validation was performed using leave-one-out (LOO) method in which one compound is removed from the dataset and its activity is predicted using the model derived from the rest of the molecules in the dataset.Equal weights for CoMFA were assigned to steric and electrostatic fields.The entire crossvalidated results were analyzed considering the fact that a value of q 2 above 0.2 indicates that probability of chance correlation is less than 5%.
Quasar-is based on 6D-QSAR and explicitly allows for the simulation of induced fit.Quasar generates a family of quasi-atomistic receptor surrogates that are optimized [22,23].By means of a genetic algorithm the hypothetical receptor site is characterized by a three-dimensional surface which surrounds the ligand molecules at van der Waals distance and which is populated with     atomistic properties mapped onto it.The topology of this surface mimics the three-dimensional shape of the binding site; the mapped properties represent other information of interest, such as hydrophobicity, electrostatic potential and hydrogen-bonding propensity.The fourth dimension refers to the possibility of representing each ligand molecule as an ensemble of conformations, orientations and protonation states, thereby reducing the bias in identifying the bioactive conformation and orientation (4D-QSAR).Within this ensemble, the contribution of an individual entity to the total energy is determined by a normalized Boltzmann weight.As manifestation and magnitude of induced fit may vary for different molecules binding to a target protein, the fifth dimension in Quasar allows for the simultaneous evaluation of up to six different induced-fit protocols (5D-QSAR).The six dimensions (6D-QSAR) allow the simultaneous consideration of different solvation models.This can either be achieved explicitly where parts of the surface area are mapped with solvent properties whereby position and size are optimized by the genetic algorithm, or implicitly.
Here, the solvation terms (ligand desolvation and solvent stripping) are independently scaled for each different model within the surrogate family, reflecting varying solvent accessibility of the binding pocket.Like for the fourth and fifth dimension, a modest 'evolutionary pressure' is applied to achieve convergence.The detailed methodology is given elsewhere [24,25], here it is described in brief:  Receptor surface generated by induced fit may be simulated by adapting van der Waals surface, generated around all ligands (energy minimized) in training set, to the toplology of each ligand molecule of training, test and prediction set.The associated energy was calculated from the corresponding rms values.
 The initial family of parent structure was generated by randomly populating the domains on the receptor surface with atomistic properties.
 The models then generated were evolved simulating cross-over events.At each cross-over step, therer is a small probability of (0.01 -0.02) of a transcription error, which is expressed in random mutation.Thereafter, those two individuals of the population with the highest lack-of-fit value are discarded.This process is repeated till a target r 2 (0.75 -0.95) is reached.
 The binding energy is calculated as follows: blinding ligand-receptor ligand desolvation The free energy (ΔG) of ligand binding is calculated as:  The model family is validated by their ability to predict relative free energies of ligand binding for an external set of test ligand molecules, not used during model construction.
 The model family is subjected to scramble test [26].The experimental binding data of the training set is randomly scrambled, and simulation is repeated.If under this condition, the ligands of the test set are still predicted correctly (r 2 > 0.5), the model is worthless, as it does not have sensitivity towards the biological data.
Raptor, an alternative technology [27] that explicitly and anisotropically allows for induced fit by a dual-shell representation of the receptor surrogate, mapped with physicochemical properties (hydrophobic character and hydrogen-bonding propensity) onto it.In Raptor, induced fit is not limited to steric aspects but includes the simultaneous variation of the amino-acid domains which leads to different physico-chemical fields along with it.The inner layer maps the fields that a substance would feel to fit snugly into the binding pocket, using the most potent compound; and the outer layer models the field generated by the altered binding site, the other compounds may have portions of matter located in the intrice between the two shells.The underlying scoring function for evaluating ligand-receptor interactions includes directional terms for hydrogen bonding, hydrophobicity and thereby treats solvation effects implicitly.This makes the approach independent from a partial-charge model and, as a consequence, allows to smoothly model ligand molecules binding to the receptor with different net charges.In Raptor, the binding energy is determined as follows: ΔG const is a contribution to the binding energy rationalizable as an overall loss of translational and rotational entropy of the ligand or overall gain of entropy due to desolvation of the binding pocket.f HO , f HB , f IF and f TΔS are scaling factors which are inherent to a given receptor model; they are optimized during the simulation (see below) for each specific drug target and typically constrained to specific intervals (e.g.f HO .= 0.75 − 1.25).
Raptor uses multi-step optimization protocol including domain assignment and combining tabu search with local protocol.A detailed technical aspect of it is given by Lill et al. [28].

Results and Discussion
In search of a good computational model for HIV-1 PIs we had used three computational technologies on fifty five compounds.For our work we had divided the structures into 49 training compounds and 6 test compounds.
In Topomer CoMFA studies, a series of 49 input structures were taken and fragmented along the central acyclic single bonds into 2 fragments while removing the core fragment structurally common to the entire series.Statistical analysis by PLS was done using CoMFA descriptors as independent variables and biological activity in the form of pKi values as dependent variable.Standard topomer 3D maps were automatically constructed for each fragment and a set of steric and electrostatic fields also known as contour plots and CoMFA columns were generated for each set of topomer.Figures 1 and 2 show these plots for the best R1 and R2 fragment contributions respectively.The Leave-One-Out (LOO) method of cross-validation was used for initial assessment of the predictive abilities of the models with the training sets.The optimal number of components used in the final QSAR models was that which gave the smallest standard error of prediction.Table 6 shows the cross-validated r 2 values using Topomer CoMFA analyses of protease receptor where the results depends on the template generated which gave an r 2 of 0.811 and cross-validated r 2 (q 2 ) of 0.608 with an intercept of 3.15.The graph of predicted verses experimental activity is shown in Figure 3, the linearity of plots shows good correlations for CoMFA models developed in the study for binding affinities of protease receptor.Reliability of the models was tested by prediction of 6 compounds selected as an external test set using factor analysis.The predicted and the actual activities are given in Table 7.For Quasar study, the three-dimensional structures of all ligand molecules (55 compounds in a series) were generated using Macro Model 6.5 and optimized.An extensive search was performed for representation of bioactive conformation(s), orientation(s) and protonation state(s).The molecules and their various conformers were aligned using Symposar (Figure 4) which serves as input for Quasar.In Quasar, the internal strain of a ligand is a component of the energy, which hampers the chance of "high-energy" conformer to contribute to the Boltzmann-weighted ensemble.For each conformer, MNDO / ESP charges were then calculated using MOPAC 2009.The training set was manually selected from the whole data set to obtain a maximal diversity based on the 2D substitution pattern.First, all ligands were ranked according to their experimental binding affinity.Then, the compounds were randomly divided into test and training set (49:6).
The simulation reached an r 2 of 0.574, cross-validated r 2 (q 2 ) of 0.504 and predictive r 2 (p 2 ) of 0.895 averaged over 200 models.The ratio of q 2 /r 2 was 0.877 and p 2 /q 2 was 1.77 for test set.The receptor surrogate is depicted in (Figure 5).The rms deviation of 49 ligand of training set of 1.517 kcal/mol corresponds to factor 12.5 off in the experimental Ki value.For six test compounds the predictive r 2 of 0.895 was obtained on an average the predictive binding affinity of the test deviates by 0.7 kcal/mol (2.3 factor off) from the experiment.The maximum observed deviation was 1.385 kcal/mol (9.8 factor off from the experiment).The scramble test (mean r 2 = 0.002 and q 2 = -0.348)demonstrated the sensitivity of model family towards biological data.Figure 6 experimental    8 and the comparison of cross validated r 2 with the predicted r 2 and rms etc. is shown in Figure 7.
To identify potential sites and functionalities allowing a further increase of the binding affinity, the individual functional groups of both training and test set were analyzed for their contributions toward the free energy of ligand binding, ΔG, which includes enthalpy (electrostatic,  van der Waals, H-bond, and polarization terms), entropic, salvation and induced fit contributions towards calculated binding affinity.
In Raptor study, the surrogate family included 10 independent receptor models and was evolved over 1500 optimization cycles.The simulation converged at crossvalidated r 2 of 0.94 and a predictive r 2 of 0.745.A comparison between predicted and experimental Ki value is given as a graphical representation in Figure 8 and a representation of receptor surrogate is depicted in Figure 9.
Comparison of the binding site at the true biological receptor (3EKV from PDB) Figure 10 with surrogate family receptor models obtained by multi dimensional QSAR (Figures 5 and 9) showed the similarity in their shapes.The characteristic property like hydrogen bond acceptor mimicking amino acids and H-bond donor mimicking amino acids and hydrophobic pocket etc. are well identified by model when compared to the actual receptor.The receptor and the models are bean shaped which can be best described as a predominantly hydrophobic pocket in the upper half of the region with distinct "V" shape.The hydrophilic regions, on the periphery may be characterized as H-bond acceptor rich area and also accommodates a positive salt bridge mainly in the form of terminal aromatic ring.A prominent H-bond acceptor region lies in the centre depression of the bean shaped structures of the inhibitors.The similar substitution analogy can be considered for Darunavir, sine both of them hold structure similarity.Based on the present QSAR studies obtained from Topomer CoMFA, Quasar and Raptor hypothetical binding model of these ligand molecules with HIV protease can be proposed (Figure 10).In the anionic/hydrophilic site, ligands may form hydrogen bonds with the active sites of the receptor.In the flat hydrophobic linker region, the aromatic rings may have π-π interactions with the receptor where the absolute planarity in the ligand structure is essential.This is the most important region of ligand molecules, which can be explored to design potent protease inhibitors.The large hydrophobic region/binding site may play a significant role in the selectivity of ligands over the counterparts of the protease.

Conclusions
The generation of contour plots in Topomer CoMFA studies provided significant correlation of steric and electrostatic fields with biological activity values.The good prediction of activity of the test compounds has shown that the models are useful and can be utilized for prediction of PI activity bearing similar kind of fragments.The new molecules can be designed by taking clue from the electrostatic and steric fields around the fragments.One of the advantages of topomer CoMFA is it represents identically the contributions of fragments that are structurally identical throughout a series, like a common core and it is much faster in calculations in comparison to CoMFA and CoMSIA.
The receptor modeling by Quasar and Raptor is based on 6D QSAR which explicitly allows for the simulation of induced fit and dual shell representation.To determine the ligand receptor interactions, the scoring function makes use of a directional force field.Ligand-binding free energies are then derived based on ligand receptor interactions, desolvation, entropy, internal strains, induce fit and H-bond flip-flop.The QSAR models gave good statistical results in terms of q 2 and r 2 values.
On comparing the models generated by Topomer CoMFA, Quasar and Raptor we can see similarity in r 2 and q 2 values.Thus, we can conclude that the models generated by these two technologies are similar to each other even though they vary highly conceptually.The power of the prediction lies with a low rate of falsepositive prediction.Even though these two technologies are conceptually different but they have shown the similarity in predictive powers.Both have certain pros and cons which have been described in detail by their developers in their manuals (available online) and in certain publications.The information obtained in this study provides a methodology for predicting the affinity of amprenavir related compounds for guiding structural design of novel yet potent PIs.

Figure 4 .
Figure 4. Mono view using Symposar (4D alignment) of the protease inhibitor series used for the study.

Figure 5 .
Figure 5. (a) Mono representation of the surrogates for the series of protease inhibitors under study.The mapped properties are colored as follows: pink, positive charge salt bridge; green, H-bond donor; bright yellow, H-bond acceptor; dark yellow, positively charged hydrophobic; dark brown, negatively charged hydrophobic; blue, neutral hydrophobic; (b) Mono representation of the surrogates in wire meshes and point form.

Figure 6 .
Figure 6.(a) Comparison of experimental and predicted binding affinities for the series of protease inhibitors: correct simulation; (b) Scramble test.andpredicted Ki value along with scramble test results.All the values of ΔG expt of ΔG cal and ΔΔG etc. are given in Table8and the comparison of cross validated r 2 with the predicted r 2 and rms etc. is shown in Figure7.To identify potential sites and functionalities allowing a further increase of the binding affinity, the individual functional groups of both training and test set were analyzed for their contributions toward the free energy of ligand binding, ΔG, which includes enthalpy (electrostatic,

Figure 7 .
Figure 7. Scatter plot of the model for the series of protease inhibitors under study.

Figure 8 .
Figure 8.Comparison of experimental and predicted inhibitory constant value for the series of protease inhibitors under study.

Figure 9 .
Figure 9. Dual-shell representation of a Raptor model for the series of protease inhibitors under study.The inner shell is depicted as wireframe and outer shell depicted as surface filled model.The color coding of points is same as in Figure 5.

Figure 10 .
Figure 10.Proposed substitution on Amprenavir molecule based on the study.