Identification of Novel CDK9 Inhibitors with Better Inhibitory Activity and Higher Selectivity for Cancer Treatment by an Effective Two-Stage Virtual Screening Strategy

The aberrant overexpression of cyclin-dependent kinase 9 (CDK9) in cancer cells results in the loss of proliferative control, making it an attractive therapeutic target for various cancers. However, the highly structural similarity between CDK9 and CDK2 makes the development of novel selective CDK9 inhibitors a challenging task and thus limits their clinical applications. Here, an effective two-stage virtual screening strategy was developed to identify novel CDK9 inhibitors with better inhibitory activity and higher selectivity. The first screening stage aims to select potential compounds with better inhibitory activity than Roniciclib, one of the most effective CDK9 inhibitors, through reliable structure-based phar-macophoric virtual screening and accurate molecular docking analyses. The second stage employs a very detailed visual inspection process, in solvation energy and/or reducing polar solvation energy can significantly improve the binding affinity of these CDK9 inhibitors. Their clinical potentials to serve as anticancer drug candidates can be further evaluated through a series of in vitro/in vivo bioassays in the future. To the best of our knowledge, this is the first attempt to identify novel CDK9 inhibitors with both better inhibitory activity and higher selectivity through an effective two-stage virtual screening strategy.


INTRODUCTION
Cyclin-dependent kinases (CDKs) are serine/threonine kinases involved in several critical cellular processes, such as cell cycle or transcription, and their activities require association with specific cyclin subunits, such as cyclin A, cyclin D, and cyclin T, etc. [1]. CDKs can be divided into two major groups: CDK1, CDK2, CDK4, and CDK6 are directly involved in cell cycle regulation, while CDK8, CDK9, CDK12, and CDK13 have been implicated in the control of gene transcription [2]. However, CDK7 is both a CDK-activating kinase and a component of the general transcription factor II H (TFIIH), thus it cannot be classified into these two groups [3]. Recent studies have indicated that cancer initiation and progression are associated with dysregulated CDKs, which are frequently overexpressed in many cancer cells [4,5]. Hence, the development of CDK inhibitors has been considered as an effective approach for cancer treatment in recent years [5]. However, several studies have indicated that the function of the cell cycle control CDKs is usually replaced by another CDKs or other protein kinases, resulting in the specific CDK inhibitors unable to prevent cancer cell growth [6,7]. On the other hand, among the gene transcription controlling CDKs, CDK9 appears to be the most potential candidate for cancer treatment due to the fact that it is aberrantly expressed in various types of cancers [8], including leukemia, cervical cancer, prostate cancer, glioblastoma, breast cancer, melanoma, and lung cancer [2]. Therefore, the development of highly effective CDK9 inhibitors has driven a lot of attention in current approaches for cancer treatment.
Recently, many promising CDK9 inhibitors with strong inhibitory activity, such as Flavopiridol, Roniciclib, TG02 and CDKI-71, have been successfully developed [9,10]. Among them, Flavopiridol and Roniciclib are the most widely studied and highly effective CDK9 inhibitors used for anticancer effects to date [11,12]. Flavopiridol, the first potent CDK inhibitor used in clinical trials, exhibits great activity toward CDK9 (IC 50 = 20 nM) and other CDK family members (IC 50 = 20 -40 nM), including CDK1, CDK2, and CDK4 [13]. A recent study has shown that Flavopiridol is able to reduce survival in several tumor cell lines at nano-molar concentrations [14]. Roniciclib is another potent CDK inhibitor that has been shown to exhibit nano-molar activity against most CDKs and strongly reduce tumor growth in xenograft models [10]. Although they have shown good anticancer effects both in vivo and in vitro, none of them has received final approval as a therapeutic option after clinical trials [15]. The problem these CDK9 inhibitors have in common is that they also inhibit other cell cycle CDKs, in particular CDK2; resulting in a negative effect on normal cell cycle progression and leading to several severe side effects, including diarrhea, nausea, and tumor lysis syndrome [16]. To solve this problem, several selective CDK9 inhibitors, such as LDC000067 and CAN508, have been developed. However, their inhibitory effects are comparatively lower than that of the current CDK9 inhibitors [17]. Therefore, there is an urgent need to discover novel CDK9 inhibitors with better inhibitory activity and higher selectivity against CDK2 for cancer treatment.
Developing highly selective CDK9 inhibitors is extremely challenging due to the fact that the structure of CDK9 in the ATP binding pocket is very similar to that of CDK2 [18]. Previous study has revealed that the ATP binding pocket of CDKs can be divided into three main binding regions: the hinge region, the glycine-rich loop and the activation loop [19]. Among them, the hinge region is the most important region to interact with inhibitors, while the interaction of the glycine-rich loop is regarded as the basis for evaluating the selectivity among CDKs. Most CDK9 inhibitors are ATP-competitive inhibitors and they are able to accommodate in the ATP binding pocket by forming interactions with these three regions and thus prevent ATP from entering the binding pocket [20]. In 2012, the crystal structures of CAN508 bound to CDK9 and CDK2 were determined, revealing that the reason for CAN508 selectively bound toward CDK9 rather than CDK2 is possibly attributed to the overall flexibility of CDK9, particularly with respect to the down-moving glycine-rich loop and the more extended ATP binding pocket [21]. Other studies also observed that the large substituent of several inhibitors can be accommodated within the ATP binding pocket of CDK9; but when bound to CDK2, they often extend out from the activation loop of the CDK2 binding pocket, making them difficult to form essential interactions with the glycine-rich loop and the activation loop [22,23]. The above valuable information has made the structure-based discovery of novel CDK9 inhibitors with distinguished selectivity against CDK2 possible.
In order to discover novel CDK9 inhibitors with better inhibitory activity and higher selectivity, a rapid and effective two-stage screening strategy was established in this study. In the first stage, a highly accurate structure-based pharmacophore model was constructed based on the docked Roniciclib/CDK9/ cyclin T complex and was further validated and utilized in the subsequent virtual screening to selected compounds with the same pharmacophore features of Roniciclib from the National Cancer Institute (NCI) and Traditional Chinese Medicine (TCM) databases. Molecular docking was further performed to evaluate the inhibitory activities of the selected compounds. The compounds with better inhibitory activities than Roniciclib were then subjected to the second stage to identify compounds with higher selectivity than CAN508. In this stage, visual inspection of the docking poses was first carried out to identify compounds able to form stable interactions with the highly flexible glycine-rich loop (residues 25 -35) of CDK9 but not with that (residues 10 -17) of CDK2, which is the most important structural difference to distinguish the selectivity between CDK9 and CDK2. Furthermore, only compounds, which are able to fit perfectly into the activation loop of CDK9 but extending out of the activation loop of CDK2, were determined as potential selective CDK9 inhibitors. Then, various 50-ns molecular dynamics (MD) simulations were conducted to examine the binding stabilities of these potential selective CDK9 inhibitors. Meanwhile, binding free energies of these potential selective CDK9 inhibitors were calculated using Molecular Mechanics Poisson-Boltzmann Surface Area Mechanics (MM-PBSA) method for affinity analysis. Finally, three compounds, NCI207113 from NCI database and TCM0004 and TCM3282 from TCM database, were successfully identified as potential CDK9 inhibitors with better inhibitory activity and higher selectivity through our two-stage screening strategy and their clinical potentials for cancer treatments would be further evaluated by a series of in vitro and/or in vivo bioassays in the future.

Receptor and Ligand Preparation
The crystal structures of Flavopiridol/CDK9/cyclin T (PDB ID: 3BLR) [24] and CAN508/CDK2/cyclin A (PDB ID: 3TNW) [25] were downloaded directly from the RCSB Protein Data Bank [26]. The water molecules and non-interacting ions were removed and the incomplete residues were fixed and refined using the Prepare Protein protocol embedded in BIOVIA Discovery Studio 2017 (DS 2017).
On the other hand, a group of 24 known CDK9 inhibitors, characterized by their excellent experimental performances (IC 50 ≤ 30 nM) (Figure 1), were collected from previous literatures [9,27]. Meanwhile, a total of 489 inactive (decoy) compounds were generated from the online server Database, DUD•E [28]. A diverse set of the maximum of 255 different conformers was generated for each active and decoy compound using the Prepare Ligand program of DS 2017 for further pharmacophore hypothesis validation.

Pharmacophore Model Generation and Validation
The crystal structure of Flavopiridol/CDK9/cyclin T was first adopted to generate a total of 10 pharmacophore hypotheses with 4 -6 pharmacophore features by the Receptor-Ligand Pharmacophore Generation module of DS 2017. The Güner-Henry (GH) scoring method was used to evaluate the ability of Figure 1. Chemical structures of the 24 known CDK9 inhibitors selected from the previous literatures [9,27]. These CDK9 inhibitors were ranked according to their IC 50 values shown in parentheses from the lowest to the highest. these 10 pharmacophore hypotheses to retrieve active compounds from the dataset consisting of 24 active and 489 decoy compounds [29]. However, even Hypo_1, the best pharmacophore hypothesis with the highest GH score, did not show enough ability to distinguish the active compounds from the inactive ones. J. Biomedical Science and Engineering To conquer this problem, the docked Roniciclib/CDK9/cyclin T complex was constructed and used as an alternative for pharmacophore hypotheses generation.
First, the co-crystal ligand, Flavopiridol, was removed from and then re-docked into the binding site of Flavopiridol/CDK9/cyclin T using CDOCKER protocol. The excellent agreement of the docked complex relative to the crystal structure reflects the accuracy of the docking algorithm. Then, Roniciclib was docked into the apo CDK9/cyclin T structure and the optimum Roniciclib/CDK9/cyclin T conformation was used to construct another 10 pharmacophore hypotheses. After GH scoring analysis, Hypo_2, the best pharmacophore hypothesis with the highest GH score, was chosen as the 3D query in the following virtual screening process.

Virtual Screening
Structure-based virtual screening was performed to retrieve potential compounds with better CDK9 inhibitory activity than Roniciclib from two large databases: NCI database with cancer drug candidates and TCM database containing abundant Chinese herbal medicines and natural compounds. The overall flowchart of the two-stage screening strategy developed in this study is shown in Figure 2. These two databases were initially filtered by the Lipinski's rule of five and Veber's drug-likeness rule that sets the criteria for drug-like properties. Then, Hypo_2 was utilized as the 3D query to select potential CDK9 inhibitors from these drug-like compounds by engaging the Screen Library protocol with default parameters. The compounds that satisfied the pharmacophore features and the excluded volumes of Hypo_2 were selected and subsequently subjected to the following molecular docking experiments.

Molecular Docking
Molecular docking was carried out to evaluate the interactions and clarify the binding configurations of the selected compounds using the CDOCKER protocol of DS 2017. This docking program adopts CHARMm force field to minimize the energy of the ligands. Most of the setting was retained for scoring parameters, except for the configuration parameter, which was set to "best". The CDOCKER interaction energies obtained from the optimal configuration of each selected compound were utilized to evaluate the inhibitory activity toward CDK9. CDOCKER interaction energy signifies the interaction between protein and compound [30], hence the compounds with lower CDOCKER interaction energies toward CDK9 than that of Roniciclib were considered to exhibit higher inhibitory activity against CDK9. Then, these selected compounds were also docked into the binding site of CDK2/cyclin A to evaluate their selectivity between CDK9 and CDK2 in the second screening stage.

Visual Inspection
Visual inspection was utilized to investigate the interactions between the selected compounds and the residues within the ATP binding pockets of CDK9 and CDK2, respectively. Recent studies have indicated that several residues of CDK9 (Ile25, Phe30 and Val33) and CDK2 (Thr14 and Tyr15) on the glycine-rich loop play an important role in inhibitor binding [24,31]. In addition, the large substituents of several selective CDK9 inhibitors can be accommodated within the ATP binding pocket of CDK9; but when bound to CDK2, they often extend out from the activation loop of the CDK2 binding pocket. However, the smaller selective CDK9 inhibitor, CAN508, which does not include a relatively large functional group, is not able to show the apparent binding mode that the functional group extends out of the CDK2 binding pocket from the activation loop. Therefore, another selective CDK9 inhibitor, LDC000067, which has a large functional group, was utilized to show the position and direction of the compound extending out of the CDK2 binding pocket ( Figure 3) [22]. Therefore, the compounds that fulfill the following visual (a) (b) Figure 3. The 3D structures of the selective CDK9 inhibitor, LDC000067, in the ATP binding pocket of (a) CDK9/cyclin T and (b) CDK2/cyclin A. The ATP binding pocket is made up of the hinge region, the glycine-rich loop and the activation loop shown as blue, cyan and green ribbons, respectively. In addition, the blue and brown color shading of the surface represents the hydrophilic and hydrophobic interactions formed by LDC000067 in the binding pockets. The position where the LDC000067 extends out from the activation loop of the CDK2 binding pocket is indicated by the red circle.
inspection criteria were considered as potential selective CDK9 inhibitors: 1) able to form stable interactions with all the key residues (Ile25, Phe30, and Val33) of CDK9 but not with any of the key residues (Thr14 and Tyr15) of CDK2, and 2) able to be accommodated within the ATP binding pocket of CDK9 but extended out from the activation loop of the CDK2 binding pocket.

Molecular Dynamics (MD) Simulations
Various 50-ns MD simulations were conducted to evaluate the binding stabilities of the selected compounds using GROMACS 2016.3 with AMBER force field. Each of the complexes obtained from previous molecular docking and visual inspection processes were solvated in a 10.0 Å dodecahedron periodic water box with 0.15 M salt concentration as a physiological state. All of the systems were minimized and equilibrated in NPT and NVT steps for 200 ps. Each of the MD simulation run was performed with a time step of 2 fs. The stability of each system was assessed by computing the root mean square deviation (RMSD) over the entire simulation period. Only the compounds showing stable binding during the entire MD simulation run with convergent RMSD value and no major fluctuations [32] were further subjected to the following binding free energy calculations.

Binding Free Energy Calculations
Binding free energy (ΔG bind ) calculations were performed by the widely applicable Molecular Mechanics Poisson-Boltzmann Surface Area Mechanics (MM-PBSA) method [33]. All energy components, including van der Waals, electrostatic, polar solvation, and nonpolar solvation contributions, were calculated using the 1000 snapshots extracted from the last 10 ns of the MD trajectories. Previous studies have suggested that the binding free energy calculated by MD simulation coupled with MM-PBSA can reproduce the trend of binding affinities of the inhibitors under physiological conditions [33,34]. In this study, the compounds with lower binding free energies toward CDK9 than Roniciclib were finally confirmed as potential CDK9 inhibitors with better inhibitory activity and higher selectivity.

ADMET Prediction
Poor ADMET properties of the lead compounds usually cause failure in the drug discovery phases. In this study, admetSAR server were used to predict the ADMET properties, including Carcinogenicity, Cytochrome P450 2D6 (CYP2D6), blood-brain barrier (BBB) permeability, plasma protein binding (PPB) and human intestinal absorption (HIA) of the selected compounds. CYP2D6, a member of the cytochrome P450 mixed-function oxidase system, is responsible for the metabolism and elimination of drugs [35]. The BBB property, a semipermeable border of endothelial cells that avoids drugs in the circulating blood crossing into the human central nervous system (CNS), has become a crucial criterion in drug development [36] PPB, which indicates the extent of drug bound to plasma, is the property that must be paid attention in drug development. The extent of protein binding may impact the efficacy and toxicology of a drug since only the free drug can elicit a pharmacological response [37]. HIA refers to the process through which oral drugs are absorbed from the human digestive system into the bloodstream [38]. The above properties are all important measures for the potential safety and pharmacological activity of the selected compounds, especially for those used for cancer treatment.

RESULTS AND DISCUSSION
Ever since CDK9 has been proposed to be a promising target for developing anti-cancer drugs, a large number of CDK9 inhibitors have been successfully developed in recent years [9,10]. However, the first generation CDK9 inhibitors, such as Roniciclib and Flavopiridol, have been shown to develop several severe side effects in patients during their clinical trials due to the fact that they also inhibit other CDKs, particularly CDK2. Although some second generation CDK9 inhibitors with improved selectivity against CDK2, such as CAN508 and LDC000067, have been developed, their inhibitory activities were insufficient to be considered as potential drugs for cancer treatment [17]. Therefore, this study aims to discover novel CDK9 inhibitors with better inhibitory activity than Roniciclib and higher selectivity than CAN508 through a novel two-stage screening strategy.

Structure-Based Pharmacophore Model Generation and Validation
Structure-based pharmacophore modeling has been proven to be a successful tool in identifying similar active compounds by investigating the important interactions between ligands and receptors [39]. In this study, two pharmacophore hypotheses, Hypo_1 (Figure 4(a)) and Hypo_2 (Figure 4(b)), was constructed based on the crystal structure of Flavopirido/CDK9/cyclin T and the docked Roniciclib/CDK9/ cyclin T complex, respectively. Hypo_1 consists of four pharmacophore features: one hydrogen bond donor, one hydrogen bond acceptor, one hydrophobic feature and one positive ionizable feature ( Figure  4(a)). The hydrogen bond donors and acceptor of Hypo_1 correspond to the hydrogen bonds formed between Flavopiridol and the hinge region of CDK9; while the hydrophobic features of Hypo_1 map the hydrophobic regions in the glycine-rich loop of CDK9. In particular, the positive ionizable feature of Hypo_1 is located in the center of the ATP binding pocket. On the other hand, Hypo_2 consists of two hydrogen bond donors and two hydrophobic features (Figure 4(b)). The two hydrogen bond donors of Hypo_2 correspond to the hydrogen bonds formed between Roniciclib and the hinge region of CDK9; while the two hydrophobic features of Hypo_2 map the hydrophobic regions in the glycine-rich loop of CDK9. These pharmacophore features highlight the importance of the hinge region and the glycine-rich loop in the ATP binding pocket of CDK9 for inhibitor binding.
GH scoring method has been successfully employed for quantification of pharmacophore model quality and coverage of activity space from database investigation and for the evaluation of the accuracy of similarity search in databases containing both biological and structural activity data [40]. GH scoring contains a coefficient to penalize excessive hit list range. This coefficient is calibrated by weighting the score with respect to the yield and coverage when evaluating hit lists [29]. The GH score ranges from 0 to 1, with 0 indicating a null model and 1 a perfect model. Basically, GH score of 0.7 can be considered as a corresponding metric as it takes into account both the percent yield of actives in a database and the percent ratio  of actives in the hit list [38]. Here, the quality of Hypo_1 and Hypo_2 was assessed by GH scoring method using a dataset consisting of 24 known actives obtained from previous literatures [9,27] and 489 decoys constructed by the DUD•E server [28]. Several parameters, such as total hits (Ht), active hits (Ha) and GH score of Hypo_1 and Hypo_2 were calculated and listed in Table 1. Hypo_1 recognizes 22 compounds (Ht) from the database consisting of 513 compounds; however, only 9 active inhibitors were identified with 13 decoys. The GH score of Hypo_1 is only 0.390, indicating that Hypo_1 does not exhibit the ability in selecting the active compounds among inactive ones. On the other hand, Hypo_2 successfully screened out 15 compounds (Ht) from 513 compounds in the database, 14 of which were active inhibitors (Ha). It should be noted that the Hypo_2 exhibited a good GH score of 0.844 and mapped only one decoy, revealing that it rigorously filters the active compounds and reduce a large number of false positives in the validation process.
The main difference between Hypo_1 and Hypo_2 is the number of hydrophobic features on the glycine-rich loop and the positive ionizable feature. The two hydrophobic features of Hypo_2 correspond to the two hydrophobic interactions with the two critical residues Ile25 and Val33 on the glycine-rich loop for Roniciclib (Figure 4(d)). In contrast, Flavopiridol only forms one hydrophobic interaction with Ile25 ( Figure 4(c)), resulting in forming only one hydrophobic feature in Hypo_1. Besides, the positive ionizable feature of Hypo_1 may not be a required feature because it is located in the center of the ATP binding pocket rather than on the three important regions, the hinge region, the glycine-rich loop and the activation loop. Thus, Hypo_2 was adopted as the 3D query in the following virtual screening to identify compounds with better inhibitory activity than Roniciclib.

Virtual Screening
Virtual screening is a technique commonly used to identify active compounds from large chemical databases by applying information about the bioactive ligands (ligand-based) or receptor-ligand complex (structure-based) [41]. Comparing to ligand-based virtual screening, structure-based one considers more about the chemical features of the active site of the receptor and the relationships of the 3D structure between receptor and ligand. In this study, a total of 325,805 compounds from the NCI and TCM databases were first filtered by Lipinski's rules of five and Veber's drug similarity rules to evaluate their drug-like ability, resulting in 233,818 compounds with good physicochemical properties similar to those of oral drugs. Hypo_2 was then employed as a 3D query to screen these drug-like compounds and 10,022 compounds, which map well with all the required pharmacophore features of Hypo_2, were retrieved. The advantage of pharmacophore-based virtual screening is that it selects compounds from large databases by fitting the compound structures with the pharmacophore features without considering the real interaction between ligand and target, thereby greatly reducing the time required for initial screening [42]. Here, the well-validated pharmacophore model, Hypo_2, is able to rapidly and effectively exclude over 95% of compounds that are inactive for CDK9 from NCI and TCM databases. Finally, these compounds that fitted well with the features of Hypo_2 were further subjected to the following molecular docking experiments to evaluate their inhibitory activities toward CDK9.

Molecular Docking
Molecular docking has been intensively used to predict the interactions and binding modes of compounds in the receptor active sites. The CDOCKER algorism, a flexible and rigorous docking method, is considered to be more accurate than most of the existing docking programs and its calculated energy can accurately signify the energy of the interaction that exists between the receptor and the ligand [30]. Besides, a lower CDOCKER interaction energy value usually implies greater favorable binding between the receptor and the ligand. In this study, all of the 10,022 compounds selected from virtual screening were docked into the CDK9 binding pocket using CDOCKER algorism and a total of 1153 compounds were successfully confirmed to possess better inhibitory activity towards CDK9 due to their lower CDOCKER interaction energies compared to Roniciclib (−48.86 kcal/mol). Although several previous studies have provided various strategies for the development of many effective CDK9 inhibitors, very few selective CDK9 inhibitors have been successfully identified due to the limitation of the molecular docking methods, in which the quantification of ligand-receptor interaction is based on a simplistic scoring function, which is manifested by inaccuracies in affinity ranking for two structurally similar receptors and typically yields poor performance in the prediction of selective compounds [43]. Hence, these 1153 compounds were further submitted to the second screening stage to determine their selectivity between CDK9 and CDK2.

Visual Inspection
In order to evaluate the selectivity of the 1153 compounds retrieved from the first screening stage, visual inspection was performed to analyze their binding modes and interactions with the key residues within the ATP binding pockets of CDK9 and CDK2, respectively. In recent years, visual inspection has been applied in a vast number of drug discovery projects to evaluate the affinity and selectivity of receptor-ligand complex [44]. The most frequently assessed criteria in the visual inspection are the binding mode and interaction of ligand in the binding sites of difference receptors, which is of great importance for successful identification of selective compounds in drug discovery. Based on the structural insights describing the major differences between CDK9 and CDK2 towards ligand binding from previous studies, a set of screening criteria was listed as follows in our visual inspection to identify compounds with higher selectivity toward CDK9 than CAN508 [22,24,31]: 1) First, the selective compounds must interact with all of the key residues of CDK9 (Ile25, Phe30 and Val33) but not with any of the key residues of CDK2 (Thr14 and Tyr15). 2) Second, the selective compounds must be accommodated perfectly in the activation loop of the CDK9 binding pocket but extended out from the activation loop of the CDK2 binding pocket. After conducting visual inspection carefully based on the above criteria, only three compounds, NCI207113 from NCI database and TCM0004 and TCM3282 from TCM database, were finally identified as potential CDK9 inhibitors with higher selectivity than CAN508.
The interactions of these three compounds in the binding pockets of CDK9 and CDK2 are shown in Figure 5. These three compounds form stable complexes with CDK9 through several interactions with the three major regions in the binding pocket, such as Cys106 of the hinge region, Ile25, Phe30 and Val33 of the glycine-rich loop, and Ala166 and Asp167 of the activation loop. In contrast, these three compounds Figure 5. The best binding modes of (a) NCI207113, (b) TCM0004, (c) TCM3282, (d) CAN508 within the binding pocket of CDK9 and CDK2 shown in 3D interaction diagrams. The crucial regions, including the hinge region, the glycine-rich loop and the activation loop are shown as blue, cyan and green ribbons, respectively. The interacting residues are also labeled. Hydrogen bond was depicted as the green dash line and hydrophobic interaction was depicted as the pink dash line. The unfavorable bump is shown in red dash line.
are poorly bound to CDK2 since they only form interactions with the hinge region and the activation loop. These compounds did not form any interaction in the glycine-rich loop, which may cause them to escape from the CDK2 binding pocket through the activation loop. As can be seen from Figures 5(a)-(c), NCI207113, TCM0004 and TCM3282 are moving towards the activation loop from the glycine-rich loop when bound to CDK2, making them extend out from the activation loop of the CDK2 binding pocket more easily. CAN508 also exhibits a very similar binding mode in CDK2, which can explain its great selectivity toward CDK9 than CDK2 even without the substitution of any large functional group. To gain insights into the selectivity of the three identified compounds toward CDK9 than CDK2, their binding modes in the well-defined hydrophobic binding pocket were investigated through their 3D structures. As shown in Figure 6, NCI207113, TCM0004 and TCM3282 are able to fit perfectly into the binding pocket of CDK9, while they cannot be completely accommodated in the binding pocket of CDK2. In addition, the functional groups of these compounds extending out of the CDK2 binding pockets are all large hydrophobic groups, which make them not able to form interactions with the glycine-rich loop since the CDK2 binding pocket is more rigid and relatively smaller than the CDK9 binding pocket. The binding modes of these three compounds demonstrate the molecular basis for their potent selectivity toward CDK9 rather than CDK2. Besides, compared to CAN508, these three compounds all made additional hydrophobic interaction with the key residue Phe30 of CDK9 and formed unfavorable and limited interaction with the activation loop of CDK2, suggesting that they all exhibit better selectivity toward CDK9 than CDK2 (Table 2 and Figure 5).

Molecular Dynamics (MD) Simulations
Molecular dynamics (MD) simulations are able to investigate the physical motions of the protein-ligand complex in particular environments to validate the stability of the docking results and to calculate the   [32]. To determine and compare the binding stabilities of Roniciclib and the three identified compounds, their docking complexes were subjected to 50 ns MD simulations in an explicit hydration environment. Their MD simulation trajectories were explored by their backbone root-mean-square deviation (RMSD) values. As shown in Figure 7, the average RMSD values of Roniciclib, NCI207113, TCM0004 and TCM3282 were all within 3 Å, revealing that they all retained excellent equilibrium during the entire MD simulation courses. Generally, the protein-compound complex showing convergence RMSD trajectories with average RMSD value within 3 Å in the entire MD simulation is considered as a stable system [45]. According to the above results, NCI207113, TCM0004 and TCM3282 all exhibited excellent binding stability with CDK9, suggesting that their trajectories could be used for further binding free energy calculations.

Binding free Energy Calculations
Binding free energy calculated by the mechanics/Poisson-Boltzmann surface area (MM/PBSA) method has been proven to have a high correlation with experimental activity [46]. In this study, the binding free energies of Roniciclib and the three identified compounds were calculated using the MM/PBSA method from the last 10 ns of their MD trajectories. As shown in Table 3, NCI207113 (−49.98 kcal/mol), TCM0004 (−51.02 kcal/mol) and TCM3282 (−62.18 kcal/mol) all showed lower binding free energies than Roniciclib (−43.00 kcal/mol), indicating that they all exhibit higher binding affinities than Roniciclib. To determine which energy term contributes to a greater extend on the binding affinity, their binding free energies were decomposed into four individual energy components (ΔG vdw , ΔG ele , ΔG polar and ΔG nonpolar ). As shown in Table 3, the most favorable energy contributors within CDK9-compound complexes were van der Waals interactions (ΔG vdw ) and nonpolar solvation energies (ΔG nonpolar ). The formation of the strong favorable van der Waals interactions and nonpolar solvation energies with CDK9 towards inhibitor binding process is well correlated with the fact that the key residues of the glycine-rich loop of CDK9 are all hydrophobic [47]. It is worthy of mentioning that these three compounds also display very high unfavorable polar solvation energy terms (ΔG polar ), indicating that chemical modifications to reduce the numbers of functional polar group of the CDK9 inhibitors allow them to fit perfectly in the CDK9 binding pocket with highly hydrophobic environment. Thus, increasing the favorable van der Waals interactions and nonpolar solvation energy and/or reducing the unfavorable polar solvation energy terms through chemical modifications would be a promising strategy for designing novel CDK9 inhibitors with better binding affinity.

ADMET Prediction
Considering that most of the previously reported CDK9 inhibitors have failed in their clinical trials due to several serious side effects, the assessment of the ADMET properties in the early stage of the drug development is necessary to avoid failure. In this study, the ADMET properties of the three identified compounds (NCI207113, TCM0004 and TCM3282) were predicted using admetSAR server and the results are presented in Table 4. For absorption, "Human intestinal absorption" and "Human oral bioavailability" data displayed that only NCI207113 has the potential to become an oral drug. For Distribution, these three identified compounds all have strong plasma protein binding (PPB) property, suggesting that they are difficult to be displaced by other drugs from plasma proteins into blood. Besides, these three identified The value of Hepatotoxicity is 0 to 1.000 with higher value indicating less toxicity.
compounds can penetrate the blood-brain barrier (BBB), which is necessary for treating cancer in brain metastases. For metabolism, there is considerable variation in the efficiency and amount of CYP2D6 enzyme produced between individuals [48]. As shown in Table 4, these three identified compounds are not regulated by CYP2D6, indicating they may have consistent metabolic rates in the human body. For toxicity, NCI207113, TCM0004 and TCM3282 are non-carcinogenic and display low hepatotoxicity, suggesting that they are safety for humans. Overall, NCI207113, TCM0004 and TCM3282 all showed good ADMET properties, indicating their great potential to be served as anticancer drug candidates.

CONCLUSION
In order to discover novel selective CDK9 inhibitors with better inhibitory activity and higher selectivity, a rigorous two-stage screening strategy was developed in this study. The advantage of our two-stage screening strategy is that the first screening stage was designed for identifying the potential CDK9 inhibitors with better inhibitory activity and the second screening stage was aimed to identify the compounds with higher selectivity. In the first stage, a structure-based virtual screening method can successfully exclude a large number of inactive molecules from NCI and TCM databases to simplify the screening process in the second stage, where visual inspection was used to investigate the interaction and binding mode of the compound while bound with CDK9 and CDK2 in detail, which provides extraordinary significance to distinguish the selectivity for each potential compound. Using the well-designed two stage screening strategy, three compounds, NCI207113, TCM0004 and TCM3282, which exhibit better inhibitory activity than Roniciclib and higher selectivity than CAN508, were successfully identified as novel potential CDK9 inhibitors. Although our two-stage screening strategy consists of several steps and may increase the com-putation time comparing to the previous drug screening approaches, it does offer a promising way to effectively and accurately identify inhibitors with excellent inhibitory activity and higher selectivity. In addition, the results from our binding free energy calculations further suggest that enhancing van der Waals interaction and nonpolar solvation energy and/or reducing polar solvation energy can significantly improve the binding affinity of CDK9 inhibitors, indicating that a prospective strategy for enhancing the inhibitory activity of the CDK9 inhibitors through chemical modification and synthesis can be achieved. In summary, NCI207113, TCM0004 and TCM3282 all exhibit great potential to serve as novel CDK9 inhibitors with excellent inhibitory activity and higher selectivity and their clinical potentials to be used as potential anticancer drug candidates can be further evaluated through series of in vitro/in vivo bioassays in the future.