Pharmacophore Based Virtual Screening for Identification of Novel CDK2 Inhibitors ()
1. Introduction
Cyclin-dependent kinases (CDKs) are members of the serine/threonine kinase family and they are key enzymes in cell-cycle progression and transcription. These kinases are dependent on cyclin-binding for their activation in order to play their biological roles. However, they were shown to be frequently deregulated in different human tumors. Consequently, CDKs have received increasing attention as targets for anticancer drugs [1] [2] [3]. One of the members of CDKs which play a crucial role in orchestrating cellular transit through the cell cycle is CDK2. Therefore, CDK2 is considered as an important target for new therapeutics designed to recover control of the cell cycle [4] [5] [6].
Nowadays, there is no doubt that new advances in computer-aided drug designing have reduced the effective cost and time involved in drug discovery. However, one of the most important and extensively used tools in drug discovery process is Pharmacophore modeling which employs compound collection to generate structural patterns that should be present in active compounds. These patterns represent the set of properties and their arrangement in 3D space that an active compound must possess for it [7]. Moreover, pharmacophore modelling has been routinely used in combination with other molecular modelling techniques. Once a pharmacophore model is generated, it can be used for querying the 3D chemical database to search for potential ligands, which is so-called “pharmacophore-based virtual screening” (VS) [8]. Consequently, it can be a useful filter to narrow down the number of compounds to be docked [9] [10].
As an example of a typical such use, Nicklaus et al. used a known inhibitor of HIV-1 integrase to build a high correlation pharmacophore model, search a database, and prioritize 267 possible leads. Sixty of those were tested, and 19 were found to inhibit the enzyme [11].
Furthermore, in terms of the urgency of developing high correlation pharmacophore models for CDK inhibitors using computer-aided methods, two different ligand-based pharmacophore models for CDK inhibitors have been independently reported depending on different selected training sets (Figure 1). Hecker’s model was based on purine derivatives (activities against CDK1, CDK2) [12], Toba’s was based on indenopyrazole scaffold (activities against cyclin E/CDK2) [13].
In this study, we aim to develop high correlation pharmacophore models for CDK2 inhibitors by 3D QSAR pharmacophore generation module within discovery studio (Biovia, 2016, France) using ligands that belong to various classes of cyclin CDK2 inhibitors. After validation with test set, prediction method and Fischer randomization method, the best pharmacophore model was used as a filter to search Enamine database for compounds with similar pharmacophore features. Database compounds with best fitness scores were then subjected to molecular docking. All in all, this work has used different tools in a synergistic way in order to discover new potential molecules as leads against CDK2.
Figure 1. The ligand-based pharmacophore models of CDK2 inhibitors reported previously by (a) Hecker et al. (b) Toba et al.
2. Materials and Methods
2.1. Protein Structure Preparation
A large number of crystal structures are available on human active and inactive CDK2 in complex with small ligands which bind deeply within ATP site. However, protein complex with triazolopyrimidine inhibitor having the PDB code 2C6O and 2.1Å resolution was retrieved from the RCSB Protein Data Bank (PDB) to be used as a structural template [14].
Occasionally, the protein was prepared using general purpose protocol (Discovery Studio-Biovia 2016) by removing water molecules in all the system. Adding bond orders and formal charges to the hetero groups of the protein along with addition of hydrogens using charmM forcefield and finally minimizing energy using adopted basis Newton-Raphson algorithm (NR).
2.2. Pharmacophore Model Generation
2.2.1. Selection of Training Set Compounds
3D Quantitative Structure-Activity Relationship (3D-QSAR) pharmacophore generation is designed to use HypoGen in order to produce predictive pharmacophores from a set of ligands with known activity values against a given biological target [15].
The structures of 51 CDK2 inhibitors were obtained from literature [16] - [25] 16 diverse compounds (Figure 2) were selected for training data set basing on the principles of structural diversity and experimental activity values (IC50) spreading over four orders of magnitude (Table 1); Most active (IC50 ≤ 15 nM,
++++), active (15 < IC50 ≤ 200 nM, +++), moderately active (200 < IC50 ≤ 1000 nM, ++), very low activity IC50 > 1000 nM, +). Meanwhile, the remaining 35 molecules with known biological activites, served as the test set (Figure 3).
2.2.2. Generating and Validating 3d QSAR Pharmacophore Model
Pharmacophore prediction procedure according to HypoGen algorithm consists of three phases: a constructive, a subtractive, and an optimization phase. Occasionally, the idea of constructive phase is to generate pharmacophores that are common among the active molecules in the training set. In subtractive phase the program removes pharmacophores from the data structure that are not likely to be useful. The algorithm of optimization phase applies small perturbations to the pharmacophores created in the constructive and subtractive phases in an attempt to improve the score [17].
However, predictive pharmacophores are built by the HypoGen algorithm according to the selected data (training set of the sixteen selected molecules, 255 conformers for each compound were generated, only conformations within a 10 kcal/mol energy range were retained. In addition to that, Hydrogen bond acceptor (HBA), hydrogen bond donor (HBD), hydrophobic (HY), Hydrophobic aromatic (HA) and ring aromatic (RA) features were used to generate ten pharmacophore models, Uncertainty value was set to 3 and the minimum inter-feature distance was set to 1.5Å, all other parameters used in HypoGen module were kept at their default settings [26].
On the other hand, the generated pharmacophore models were then evaluated in order to choose the best one concerning; correlation coefficient, cost analysis and RMSD values. Furthermore, Fischer’s randomization method was performed on the training set compounds in order to validate the statistical robustness of optimized model through verification of the correlation between the chemical structures and the biological activity [27]. In this validation process, the experimental activities of the training set used in HypoGen were scrambled randomly with the same parameters as those used for generating the original pharmacophore [28] [29]. A set of 19 random spreadsheets was generated with a confidence level of 95%.
Finally, the selected pharmacophore model was further validated by test set method using 35 molecules [30]. The test set validation method was carried out using not only ligand pharmacophore mapping protocol which identifies ligands that map to a pharmacophore, and aligns them to the query, but also ligand profiler protocol which maps a set of ligands against a set of pharmacophores.
2.3. Virtual Screening
The advantage of pharmacophore-based virtual screening in the drug discovery process is that most of the compounds with low probability to be active can be excluded early from further studies [31]. However, there are multiple pharmacophore-based virtual screening algorithms with the same principle: As an input, a small molecule database and a pharmacophore model are given, then each of
the molecules conformations is fitted to the model. As result, a hit list, fitting compounds, are given as output [10].
Enamine Kinase-Hinge-Region-Directed-library which consists of 18,020 compounds was screened using Ligand Pharmacophore Mapping protocol and screen library protocol in order to search novel compounds which have the generated pharmacophore. Besides, retrieved compounds were filtered by applying Lipinski’s rule of five (i.e., a molecule with a molecular mass less than 500 Da, no more than 5 hydrogen bond donors, no more than 10 hydrogen bond acceptors, and an octanol-water partition coefficient log P not greater than 5), and further sorted on the basis of fit value.
2.4. Docking
The selected compounds from Enamine library were docked into the active site of the previously prepared CDK2 structure (2C60) with CDOCKER protocol implemented in Biovia Discovery Studio 2016. CDOCKER uses a CHARMm-based molecular dynamics (MD) scheme to dock ligands into a rigid receptor binding site. First, Random ligand conformations are generated using high-temperature MD. Second, the conformations are translated into the binding site. Third, Candidate poses are created using random rigid-body rotations followed by simulated annealing. A final minimization is then used to refine the ligand pose [32].
Moreover, the performance of the docking method on CDK inhibitors had been evaluated before it was carried out. The validation was achieved by measuring the RMSD between the orientation of the ligand existing in the crystal complex (2C6O) and its orientation found by our study. The RMSD value must be less than 2Å. However, CDOCKER interaction energy predicts the binding energies of the protein with retrieved hits [33]. The proximity between the receptor and hit was studied in terms of the electrostatic, hydrogen bond interactions and hydrophobic interaction.
3. Results and Discussion
3.1. Pharmacophore Modeling and Validation
A set of ten pharmacophore models was generated based on a training set containing 16 structurally diverse compounds (Table 2).
In the present work, we will discuss the first pharmacophore model (Hypothesis 1) which is basically composed of three features: a hydrogen bond acceptor, and two aromatic ring features (Figure 4).
The first pharmacophore model was considered the best model for so many reasons. For instance, in terms of hypothesis significance, what really matters is the magnitude of the difference between the cost of any returned hypothesis and the cost of the null hypothesis. By the way, cost analysis shows a null cost value of 118.527 and Hypo1 cost value of 73.896. So, the cost difference for it was 44.631. A cost difference value above 40 implies that the pharmacophore model correlates the estimated and experimental activity values between (75% - 90%).
Table 2. Generated Pharmacophore Models.
Besides, total cost value should be away from the null cost and close to the fixed cost. However, among the total cost values of ten pharmacophore models, Hypo1 scored the closest value to the fixed cost value (63.187) than other models. Therefore, Hypo1 could be considered as a good model. In addition to the cost analysis of the models, hypothesis 1 was further assessed based on many factors. First, the correlation value of the first pharmacophore model was 0.906. Second, estimated activities of all ligands, even the lowest active one, were within one order of magnitude of the actual activity (Table 3). Hence, it is considered as an acceptable prediction of activity. The predictive ability of Hypo 1 on training set compounds is shown in Figure 5.
Third, the alignment of the training set ligands to the pharmacophore was taken into consideration in terms of its ability to explain the activity of the ligands. For instance, active ligands should map more features than inactive ligands. In other words, a ligand is inactive because it does not have an important feature, or the feature is present but cannot be oriented correctly in space. Occasionally, a general guideline for evaluating 3D QSAR pharmacophores is that the two most active ligands should map to all features of the pharmacophore.
Table 3. Training set aligned to the highest ranked Pharmacophore (hypo1).
Figure 5. The correlation graph between experimental and estimated activity values based on hypo1.
Table 3 summarizes the training set aligned to the first pharmacophore model and illustrates that the three most active ligands have mapped to all features of the pharmacophore. Figure 6 illustrates the most active compound (2R3Q) mapped to the features of hypo1.
Forth, the result of Fischer’s randomization test confirmed the statistical confidence of Hypo1 and indicated that none of the randomly generated pharmacophore models obtained from this validation method scored better statistical values than Hypo1.
Finally, a test set of 35 ligands with known biological activity was used in order to determine the best models by assessing their ability to estimate the activity of a test set. Consequently, 24 compounds were mapped to the proposed pharmacophore (Table 4), the other 11 compounds which have not been mapped had activities over 1000 nm. Figure 7 represents the most active ligand of test set (2A0C) mapped to hypo1.
Thus, the hypothesis 1 is considered to be a good model to estimate the activity of new compounds. Moreover, heat map plot maps test set ligands to the set of generated pharmacophores by 3D QSAR. It represents fit values in a two-dimensional color map. As they are shown in (Figure 8), blue rectangles
Table 4. Mapped molecules of test set with hypo1.
Figure 8. Heat map plot of test set with pharmacophore models.
which simulate low fit values are less in hypo 1 than the other nine hypotheses, which indicates beside the other statistical validation indicators that virtual screening can be carried out basing on hypo 1.
3.2. Virtual Screening
There is no doubt that pharmacophore models are excellent tools for scaffold hopping and identifying structurally diverse compounds as hits by using them as a filter in virtual screening. Eventually, the validated pharmacophore model (hypothesis1) has been used as 3Dimensional structural query to screen Enamine Kinase-Hinge-Region-Directed-Library of 18,020 compounds which has been also gone under Lipiniski filtration in order to evaluate their druglikeness and determine if they have chemical and physical properties that would make it a likely orally active drug in humans.
However, when a compound from the database will be considered as a hit, two criterions are employed, it needs to match to all of the pharmacophore features and to have high fit value. In other words, since the Fit Value is based on the actual distances of the conformation from the pharmacophore feature centers, the better fit gives a higher score and directly describes how well the compound fits to the model. As a result, 69 hits were retrieved according to the Fit values over 5.8 and to the Lipiniski filtration results.
3.3. Molecular Docking
The crystal structure of CDK2 from the protein data bank (PDB ID: 2C6O) was selected and prepared for docking studies. Besides, the performance of the docking method on CDK inhibitors was evaluated by re-docking crystal ligand with less than 2 RMSD value. 69 hits with the highest fit value (over 5.8) were docked into the active site of CDK2 using CDocker. As a result, compound 5909 (Figure 9(a)), compound 701 (Figure 9(b)) and compound 8397 (Figure 9(c)) showed very good binding modes with -CDocker interaction energy of 54.519, 52.665 and 48.94, respectively.
However, compound 5909 formed many significant interactions with the key binding amino acid residues of CDK2 (Figure 10(a)). For instance, N atom
Figure 9. Hit compounds (a) compound 5909; (b) compound 701; (c) compound 8397.
(a) (b) (c)
Figure 10. Interaction of HTS hit compounds with CDK2: The result of docking study of CDK2 with (a) is hit 5909; (b) indicates hit 701; (c) shows 8397. The residues in the hinge-region ATP site are shown in blue. Schematic representation of hydrogen bonding (greenish dotted lines) and electrostatic interactions (pi-cation and pi-anion) in dotted orange lines, hydrophobic Pi-alkyl in dotted purple lines.
(which functions as a Hydrogen bond acceptor) in pyrimidine ring showed a hydrogen bond reaction with LEU83 (particularly with NH), as well as H atom (HBA) which is located between the two Nitrogens of pyrimidine ring interacted with the carbonyl group (HBD) of GLU81 by a hydrogen bond reaction. H atom in the side chain formed a hydrogen bond with ASP145 which is located in ribose region of ATP and which also interacted with the centroid of Pi ring of imidazole by a Pi-anion interaction. Besides, a Pi-cation interaction was formed between the electrons of a delocalized Pi system of imidazole in the side chain of the molecule and the positively charged Nitrogen in LYS33 side chain (in triphosphate region of ATP). Finally, pyrmidine ring formed a hydrophobic Pi-alkyl interaction bond with each of VAL18, ALA31 and ILE10. Meanwhile, Il10 also interacted with each of thiophene ring and phenyl ring by a Pi-alkyl bond.
701 is also one of the compounds which were marked as hits (Figure 10(b)). Occasionally, S atom in thiophen ring showed a hydrogen bond reaction with NH of LEU83. Another hydrogen bond was formed between the side chain and ASP145. Furthermore, there were hydrophobic Pi-alkyl interaction bonds between pyrimidine ring and each of VAL18 and ALA31, and between each of thiophene rings, and phenyl ring with ILE10.
The third hit in our study was compound 8397 (Figure 10(c)): Each of N atom (which functions as an HBD) in pyridine ring and NH (HBA) in pyrazole ring showed a hydrogen bond reaction with LEU83 (with NH and carbonyl group, respectively). Thiadiazole ring in the side chain of the compound interacted with Lys33 by an electrostatic Pi-cation bond. Besides, hydrophobic Pi-alkyl interaction bonds were formed by pyridine ring with each of VAL18, ALA31 and ILE10, pyrazole ring with IL10, and methyl group which is substituted on pyridine with PHE80. Moreover, substituted methyl groups on pyridine and pyrazole showed hydrophobic alkyl bonds with each of ALA31 and IL10, respectively.
All in All, binding modes and cdocker interaction energy values of compounds 5909, 701 and 8397 suggest them as good leads in order to design novel CDK2 inhibitors.
4. Conclusions
In the present work, a highly correlating (r = 0.906) pharmacophore model (hypo1) containing one hydrogen bond acceptor, and two aromatic ring features was developed according to a group of compounds with known activity values (training set). In addition to its correlation factor, the generated pharmacophore approved to have a strong significance due to its good statistical values like total cost, fit values, RMSD. On the other hand, the model was further validated by a test set prediction method and Fischer randomization method.
However, this validated pharmacophore model was used for database screening (Enamine kinase hinge region directed library) in order to identify compounds which can be used in developing potent CDK2 inhibitors as proposed anti-cancer drugs. Furthermore, molecular docking study which was carried out for the 69 retrieved hits from the virtual screening study showed that the CDOCKER interaction energy values of the compounds 5909, 701 and 8397 of Enamine library and their ability to form bond interactions with the key binding elements of the protein (LEU83, GLU81, ASP145, LYS 33, IL10) suggest these compounds to be good probable leads against CDK2.