
In this work, we conducted a QSAR study on 19 molecules using descriptors derived from Density Functional Theory (DFT) in order to predict the inhibitory activity of hydroxamic acids on histone deacetylase 7 (HDAC7). The study uses principal component analysis (PCA), ascending hierarchical classification (AHC), multiple linear regression (MLR) and nonlinear multiple regression (NMR). DFT calculations were performed to obtain structural and property information on the series of hydroxamic acid compounds studied. Multivariate statistical analysis yielded two quantitative models (MLR and NMR) built on three quantum descriptors: the electron affinity (AE), the vibration frequency of the OH bond (*ν*(OH)) and that of the NH bond (*ν*(NH)). The MLR model gives statistically significant results and shows good predictive ability: R^{2} = 0.9659, S = 0.488, F = 85 and p-value < 0.0001. Electron affinity is the priority descriptor in predicting the activity of HDAC7 inhibitors in this study. The results suggest that DFT-derived descriptors could be useful for predicting the activity of histone deacetylase 7 inhibitors. These models were evaluated according to the criteria of Tropsha *et al*.

Histone deacetylases (HDACs) are essential transcriptional corepressors in a variety of physiological systems. To date, 18 human HDACs have been identified and grouped into four classes: Class I (HDAC1, 2, 3 and 8), Class II (HDAC4, 5, 6, 7, 9 and 10), Class III, also called sirtuins (SIRT1, 2, 3, 4, 5, 6 and 7), and Class IV (HDAC11). Class II HDACs are subdivided into class IIa (HDAC4, 5, 7 and 9) and class IIb (HDAC6 and 10).

In order to establish a descriptive and predictive theory of the anti-HDAC7 (anticancer) activity of hydroxamic acids, theoretical chemistry methods were applied at the B3LYP/6-311G(d,p) level. Gradient-corrected functionals and hybrid functionals such as B3LYP give better energies and agree with high-level ab initio methods.

Several physicochemical descriptors were used to develop the QSAR models, in particular the electron affinity (AE) and two vibrational descriptors: the vibration frequency ν(O-H) and the vibration frequency ν(N-H). These two vibration descriptors are shown in

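It should be noted that the descriptors related to the frontier molecular orbitals were calculated within the Koopmans approximation, in which the electron affinity is obtained from the energy of the lowest unoccupied molecular orbital (LUMO):

$$\mathrm{AE} = -E_{\mathrm{LUMO}} \tag{1}$$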

Set | Compound codes |
---|---|
Training set | 1, 2, 5, 7, 9, 10, 11, 12, 14, 15, 17, 18, 19 |
Validation set | 3, 4, 6, 8, 13, 16 |

Several studies have shown that geometric descriptors, as well as global reactivity descriptors, provide better models.

The structures of the 19 hydroxamic acid compounds were studied by statistical methods based on Principal Component Analysis (PCA).

The Multiple Linear Regression (MLR) statistical technique is used to study the relationship between a dependent variable (the biological activity) and several independent variables (the descriptors). This statistical method minimizes the squared differences between the actual and predicted values.

It was also used to select the descriptors that serve as input parameters for the nonlinear multiple regression (NMR).
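To make the least-squares idea concrete, here is a minimal sketch that fits an MLR model with NumPy to the thirteen training compounds listed in the descriptor table of this paper. It assumes that ordinary least squares with an intercept is the fitting procedure; the function name `fit_mlr` is hypothetical and this is not the XLSTAT implementation.

```python
import numpy as np

# Training-set values copied from the descriptor table of this paper.
# Columns: AE (eV), nu(O-H) (cm^-1), nu(N-H) (cm^-1)
X = np.array([
    [1.305, 3562.210, 3631.280], [0.511, 3560.910, 3630.910],
    [0.768, 3561.380, 3630.410], [0.630, 3560.950, 3629.760],
    [0.502, 3561.920, 3631.460], [1.123, 3562.720, 3631.190],
    [0.470, 3561.710, 3631.710], [1.022, 3562.170, 3632.030],
    [0.585, 3562.160, 3632.450], [0.848, 3559.450, 3619.450],
    [1.138, 3560.340, 3630.790], [1.122, 3560.730, 3632.510],
    [1.157, 3560.480, 3632.420],
])
y = np.array([5.535, 6.069, 6.202, 6.018, 6.507, 6.114, 6.444,
              5.836, 6.276, 6.312, 5.348, 5.098, 5.115])  # pIC50


def fit_mlr(X, y):
    """Ordinary least squares with intercept (assumed procedure).

    Returns (coefficients, R^2), coefficients ordered as
    [intercept, b_AE, b_OH, b_NH].
    """
    # Centering the descriptors leaves the slopes unchanged and
    # improves numerical conditioning for values around 3560 cm^-1.
    x_mean = X.mean(axis=0)
    A = np.column_stack([np.ones(len(y)), X - x_mean])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean())**2)
    # Convert the intercept back to the uncentered descriptor scale.
    intercept = beta[0] - beta[1:] @ x_mean
    return np.concatenate([[intercept], beta[1:]]), r2


coef, r2 = fit_mlr(X, y)
print("intercept, b_AE, b_OH, b_NH:", coef)
print("R^2:", round(r2, 4))
```

The sign of each fitted slope can then be read directly against the interpretation given for the MLR equation.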

Nonlinear multiple regression (NMR) analysis is a technique that refines the structure-activity relationship in order to evaluate the biological activity quantitatively. It considers several parameters and is the most common tool for studying multidimensional data. It is based on the following preprogrammed XLSTAT function:

$$y = a + \left(b x_{1} + c x_{2} + d x_{3} + e x_{4}\right) + \left(f x_{1}^{2} + g x_{2}^{2} + h x_{3}^{2} + i x_{4}^{2}\right) \tag{2}$$

where $a, b, c, d, \ldots$ are the parameters and $x_{1}, x_{2}, x_{3}, x_{4}, \ldots$ are the variables.
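Equation (2) is linear in its parameters, so it can be fitted by least squares once the squared columns are added to the design matrix. The sketch below illustrates this on synthetic data (not data from this paper); the function names are hypothetical and the fitting procedure is an assumption, not a description of XLSTAT internals.

```python
import numpy as np


def quadratic_design(X):
    """Design matrix with columns 1, x_1..x_m, x_1^2..x_m^2 (no cross terms)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X, X**2])


def fit_quadratic(X, y):
    """Least-squares estimate of the parameters of Eq. (2)."""
    params, *_ = np.linalg.lstsq(quadratic_design(X), y, rcond=None)
    return params


def predict_quadratic(params, X):
    """Evaluate the fitted quadratic model on new descriptor rows."""
    return quadratic_design(X) @ params


# Synthetic illustration: y = 1 + 2*x1 - x2 + 0.5*x1^2 (noise-free)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(30, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1] + 0.5 * X_demo[:, 0] ** 2

params = fit_quadratic(X_demo, y_demo)
print(np.round(params, 6))  # order: a, b, c, f, g
```

For descriptors such as vibration frequencies, whose values are large and vary over a narrow range, a variable and its square are almost collinear, so centering the descriptors before squaring is advisable.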

The MLR and NMR models were generated using XLSTAT version 2014. The models were evaluated using the coefficient of determination (R^{2}), the standard error (S), the Fisher test (F) and the cross-validated correlation coefficient (Q_{CV}^{2}).

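The inhibitory potency (pIC_{50}) is calculated from the half-maximal inhibitory concentration IC_{50}, expressed in μM, according to the following expression:

$$\mathrm{pIC}_{50} = -\log_{10}\left(\mathrm{IC}_{50} \times 10^{-6}\right) \tag{3}$$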
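As a quick check of Eq. (3), the following sketch converts the two extreme IC_{50} values of the series into pIC_{50}. The function name is hypothetical, and a base-10 logarithm is assumed.

```python
import math


def pic50_from_ic50_um(ic50_um: float) -> float:
    """pIC50 = -log10(IC50 in mol/L), with IC50 supplied in micromolar."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)


most_potent = pic50_from_ic50_um(0.311)   # lowest IC50 of the series
least_potent = pic50_from_ic50_um(38.9)   # highest IC50 of the series
print(round(most_potent, 3), round(least_potent, 3))  # 6.507 4.41
```

These two values match the largest and smallest pIC_{50} entries of the descriptor table.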

The IC_{50} values of the studied compounds against histone deacetylase 7 range from 0.311 to 38.9 μM. This range of concentrations makes it possible to define a quantitative relationship between the anticancer activity and the theoretical descriptors of these molecules. The quality of a model is assessed with several statistical criteria, including the coefficient of determination R^{2}, the standard deviation S, the cross-validated correlation coefficient Q_{CV}^{2} and the Fisher statistic F. R^{2}, S and F relate to the fit between calculated and experimental values. They describe the predictive ability within the limits of the model and make it possible to estimate the accuracy of the values calculated on the test set. The coefficient of determination R^{2} gives an evaluation of the dispersion of the theoretical values around the experimental values. The quality of the modeling is better when the points are close to the regression line:

$$R^{2} = 1 - \frac{\sum \left(y_{i,\mathrm{exp}} - y_{i,\mathrm{theo}}\right)^{2}}{\sum \left(y_{i,\mathrm{exp}} - \bar{y}_{i,\mathrm{exp}}\right)^{2}} \tag{4}$$

where:

$y_{i,\mathrm{exp}}$: experimental value of the anticancer activity,

$y_{i,\mathrm{theo}}$: theoretical value of the anticancer activity, and

$\bar{y}_{i,\mathrm{exp}}$: mean of the experimental values of the anticancer activity.

The closer R^{2} is to 1, the better the theoretical and experimental values are correlated.

Moreover, the variance σ^{2} is determined by relation (5):

$$\sigma^{2} = s^{2} = \frac{\sum \left(y_{i,\mathrm{exp}} - y_{i,\mathrm{theo}}\right)^{2}}{n - k - 1} \tag{5}$$

where k is the number of independent variables (descriptors), n is the number of molecules in the test or training set, and n − k − 1 is the number of degrees of freedom.

The standard deviation S is another statistical indicator. It is used to evaluate the reliability and the precision of a model:

$$s = \sqrt{\frac{\sum \left(y_{i,\mathrm{exp}} - y_{i,\mathrm{theo}}\right)^{2}}{n - k - 1}} \tag{6}$$

The Fisher F test is also used to measure the level of statistical significance of the model, that is, the quality of the choice of descriptors constituting the model:

$$F = \frac{\sum \left(y_{i,\mathrm{theo}} - \bar{y}_{i,\mathrm{exp}}\right)^{2}}{\sum \left(y_{i,\mathrm{exp}} - y_{i,\mathrm{theo}}\right)^{2}} \times \frac{n - k - 1}{k} \tag{7}$$
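The three fit indicators above can be computed from any pair of experimental and predicted activity vectors. The sketch below applies them to the experimental and MLR-predicted pIC_{50} values of the thirteen training compounds, as listed in the residual table of this paper, with k = 3 descriptors. The function name is hypothetical, and the F statistic is taken as the ratio of explained to residual sums of squares scaled by the degrees of freedom.

```python
import numpy as np


def regression_indicators(y_exp, y_theo, k):
    """Return (R^2, s, F) for n observations and k descriptors."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_theo = np.asarray(y_theo, dtype=float)
    dof = y_exp.size - k - 1                       # degrees of freedom
    rss = np.sum((y_exp - y_theo) ** 2)            # residual sum of squares
    tss = np.sum((y_exp - y_exp.mean()) ** 2)      # total sum of squares
    ess = np.sum((y_theo - y_exp.mean()) ** 2)     # explained sum of squares
    r2 = 1.0 - rss / tss
    s = np.sqrt(rss / dof)
    f = (ess / rss) * dof / k
    return r2, s, f


# Experimental and predicted pIC50 of the training compounds
# (copied from the residual table of this paper).
y_exp = [5.535, 6.069, 6.202, 6.018, 6.507, 6.114, 6.444,
         5.836, 6.276, 6.312, 5.348, 5.098, 5.115]
y_pred = [5.633, 6.136, 6.050, 6.123, 6.451, 6.034, 6.390,
          5.877, 6.343, 6.329, 5.214, 5.205, 5.085]

r2, s, f = regression_indicators(y_exp, y_pred, k=3)
print(f"R^2 = {r2:.4f}, s = {s:.3f}, F = {f:.1f}")
```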

The cross-validated coefficient of determination Q_{CV}^{2} makes it possible to evaluate the accuracy of the prediction on the test set. It is calculated using the following relation:

$$Q_{CV}^{2} = \frac{\sum \left(y_{i,\mathrm{theo}} - \bar{y}_{i,\mathrm{exp}}\right)^{2} - \sum \left(y_{i,\mathrm{theo}} - y_{i,\mathrm{exp}}\right)^{2}}{\sum \left(y_{i,\mathrm{theo}} - \bar{y}_{i,\mathrm{exp}}\right)^{2}} \tag{8}$$
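The text does not state which cross-validation scheme is used, so the following sketch assumes leave-one-out: each compound is predicted by a linear model fitted on all the others, and the resulting predictions are inserted into Eq. (8) as written. The function names and the synthetic data are illustrative only.

```python
import numpy as np


def loo_predictions(X, y):
    """Leave-one-out predictions of a linear model with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                  # drop observation i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ beta                    # predict the held-out point
    return preds


def q2_cv(y_exp, y_theo):
    """Cross-validated Q^2 following Eq. (8) of the text."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_theo = np.asarray(y_theo, dtype=float)
    ess = np.sum((y_theo - y_exp.mean()) ** 2)
    rss = np.sum((y_theo - y_exp) ** 2)
    return (ess - rss) / ess


# Synthetic, noise-free linear data: every held-out point is predicted exactly.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(12, 2))
y_demo = 3.0 + 1.5 * X_demo[:, 0] - 0.7 * X_demo[:, 1]

q2 = q2_cv(y_demo, loo_predictions(X_demo, y_demo))
print(round(q2, 6))
```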

According to Eriksson et al. [

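According to Tropsha et al., a model is considered acceptable for the external validation set when the following criteria are met:

1) $R^{2}_{Test} > 0.7$, 2) $Q^{2}_{CV,Test} > 0.6$, 3) $\left| R^{2}_{Test} - R_{0}^{2} \right| \le 0.3$,

4) $\dfrac{\left| R^{2}_{Test} - R_{0}^{2} \right|}{R^{2}_{Test}} < 0.1$ and $0.85 \le k \le 1.15$, 5) $\dfrac{\left| R^{2}_{Test} - R'^{2}_{0} \right|}{R^{2}_{Test}} < 0.1$ and $0.85 \le k' \le 1.15$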
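The five criteria can be checked mechanically once the indicators are known. The sketch below only encodes the thresholds listed above; it does not compute R_{0}^{2}, R'^{2}_{0}, k or k' from data. The function names and the example values are hypothetical.

```python
def tropsha_checks(r2_test, q2_cv_test, r0_sq, r0_prime_sq, k, k_prime):
    """Evaluate each criterion listed in the text; returns a dict of booleans."""
    return {
        "1": r2_test > 0.7,
        "2": q2_cv_test > 0.6,
        "3": abs(r2_test - r0_sq) <= 0.3,
        "4": abs(r2_test - r0_sq) / r2_test < 0.1 and 0.85 <= k <= 1.15,
        "5": abs(r2_test - r0_prime_sq) / r2_test < 0.1
             and 0.85 <= k_prime <= 1.15,
    }


def passes_tropsha(**indicators):
    """True only if all five criteria are satisfied."""
    return all(tropsha_checks(**indicators).values())


# Hypothetical indicator values, for illustration only.
good = dict(r2_test=0.95, q2_cv_test=0.90, r0_sq=0.94,
            r0_prime_sq=0.93, k=1.00, k_prime=0.99)
bad = dict(good, k=1.30)  # slope k outside [0.85, 1.15]

print(passes_tropsha(**good), passes_tropsha(**bad))
```

Returning the per-criterion dictionary rather than a single flag makes it easy to see which condition a rejected model fails.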

The applicability domain principle helps modelers to specify the scope of proposed models, thereby defining the model's limitations with respect to its structural domain and chemical space. If an external compound falls outside the defined scope of a model, it is outside the applicability domain of that model and cannot be associated with a reliable prediction. There are several methods for determining the applicability domain of a QSAR model, among which the leverage method is the most widely used. If a compound has a residual and a leverage that exceed the threshold h^{*} = 3p/n (where p is the number of descriptors plus 1 and n the number of observations), this compound is considered outside the applicability domain of the model. The applicability domain is discussed using the Williams plot, which represents the standardized prediction residuals as a function of the leverage values (h_{i}). The value of h_{i} is calculated by the following relation:

$$h_{i} = X_{i}\left(X^{T}X\right)^{-1}X_{i}^{T} \tag{9}$$

where $i = 1, \ldots, n$, X_{i} is the row vector of the descriptors of compound i, X (n × k − 1) is the model matrix built from the descriptor values of the training set, and the superscript T denotes the matrix transpose. The critical leverage value (h^{*}) is set as:

$$h^{*} = \frac{3(k + 1)}{n} \tag{10}$$

where n is the number of training compounds used and k is the number of descriptors of the model.
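The leverages are the diagonal of the hat matrix, and they can be obtained without forming $(X^{T}X)^{-1}$ explicitly. The sketch below does this through a QR factorization for the thirteen training compounds of the descriptor table; it assumes that the model matrix includes an intercept column (so that p = k + 1), and the function name is hypothetical.

```python
import numpy as np

# Training-set descriptors copied from the descriptor table of this paper.
# Columns: AE (eV), nu(O-H) (cm^-1), nu(N-H) (cm^-1)
X = np.array([
    [1.305, 3562.210, 3631.280], [0.511, 3560.910, 3630.910],
    [0.768, 3561.380, 3630.410], [0.630, 3560.950, 3629.760],
    [0.502, 3561.920, 3631.460], [1.123, 3562.720, 3631.190],
    [0.470, 3561.710, 3631.710], [1.022, 3562.170, 3632.030],
    [0.585, 3562.160, 3632.450], [0.848, 3559.450, 3619.450],
    [1.138, 3560.340, 3630.790], [1.122, 3560.730, 3632.510],
    [1.157, 3560.480, 3632.420],
])


def leverages(X):
    """Diagonal of the hat matrix for a model with intercept."""
    # Centering does not change the column space once an intercept is
    # present, so the hat matrix is the same but better conditioned.
    A = np.column_stack([np.ones(X.shape[0]), X - X.mean(axis=0)])
    q, _ = np.linalg.qr(A)           # A = QR with orthonormal columns in Q
    return np.sum(q**2, axis=1)      # h_i = squared norm of row i of Q


n, k = X.shape
h = leverages(X)
h_star = 3 * (k + 1) / n             # critical leverage, 3(k + 1)/n

print("h* =", round(h_star, 3))
print("highest leverage: compound", int(np.argmax(h)) + 1)
```

A useful sanity check is that the leverages sum to the number of fitted parameters, here k + 1 = 4.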

If h_{i} < h^{*}, the probability of agreement between the measured and predicted values of compound i is as high as that of the compounds in the database. Compounds with h_{i} > h^{*} reinforce the model when they belong to the training set but will otherwise have dubious predicted values, without necessarily being outliers, since their residuals may be low.

The descriptor values of the thirteen (13) hydroxamic acid molecules of the training set and the six (6) molecules of the validation set are presented in the table below.

All three descriptors for the 19 hydroxamic acid compounds were subjected to PCA. The first two principal axes are sufficient to describe the information provided by the data matrix. The correlations between the descriptors are presented in the correlation matrix that follows the descriptor table.

Compounds | AE | ν(O-H) | ν(N-H) | pIC_{50} | |
---|---|---|---|---|---|

Training set | |||||

1 | 1.305 | 3562.210 | 3631.280 | 5.535 | |

2 | 0.511 | 3560.910 | 3630.910 | 6.069 | |

3 | 0.768 | 3561.380 | 3630.410 | 6.202 | |

4 | 0.630 | 3560.950 | 3629.760 | 6.018 | |

5 | 0.502 | 3561.920 | 3631.460 | 6.507 | |

6 | 1.123 | 3562.720 | 3631.190 | 6.114 | |

7 | 0.470 | 3561.710 | 3631.710 | 6.444 | |

8 | 1.022 | 3562.170 | 3632.030 | 5.836 | |

9 | 0.585 | 3562.160 | 3632.450 | 6.276 | |

10 | 0.848 | 3559.450 | 3619.450 | 6.312 | |

11 | 1.138 | 3560.340 | 3630.790 | 5.348 | |

12 | 1.122 | 3560.730 | 3632.510 | 5.098 | |

13 | 1.157 | 3560.480 | 3632.420 | 5.115 | |

Validation set | |||||

14 | 0.586 | 3561.610 | 3630.480 | 6.450 | |

15 | 1.082 | 3557.640 | 3631.620 | 4.410 | |

16 | 1.074 | 3560.81 | 3632.76 | 5.100 | |

17 | 1.409 | 3561.980 | 3631.790 | 5.252 | |

18 | 1.397 | 3560.790 | 3631.690 | 4.897 | |

19 | 1.424 | 3561.520 | 3631.710 | 5.357 | |

AE in electronvolts (eV); ν(O-H) and ν(N-H) in cm^{−1}; IC_{50} in μM.

Variables | AE | ν(O-H) | ν(N-H) | pIC_{50} |
---|---|---|---|---|

AE | 1 | |||

ν(O-H) | 0.0917 | 1 | ||

ν(N-H) | −0.1845 | −0.6225 | 1 | |

pIC_{50} | −0.9264 | −0.1242 | −0.0209 | 1 |

Values in bold are significantly different from 0 at p < 0.05; the correlation is very significant for p < 0.01 and highly significant for p < 0.001.

The first two principal axes are sufficient to characterize the descriptors: they account for 50.51% (F1) and 33.49% (F2) of the variance, that is, 84.00% of the total information.
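A PCA on standardized variables is equivalent to an eigen-decomposition of their Pearson correlation matrix. The sketch below applies this to the three descriptor-descriptor coefficients reported in the correlation table of this paper. It is an illustration under that assumption, not a reproduction of the XLSTAT output, and the percentages it prints need not coincide with those quoted above, since the exact data set and options used for the published PCA are not specified here.

```python
import numpy as np

labels = ["AE", "nu(O-H)", "nu(N-H)"]

# Pearson coefficients between the three descriptors,
# copied from the correlation table of this paper.
R = np.array([
    [1.0000,  0.0917, -0.1845],
    [0.0917,  1.0000, -0.6225],
    [-0.1845, -0.6225, 1.0000],
])

# eigh is the appropriate routine for a symmetric matrix; it returns
# eigenvalues in ascending order, so reorder them to descending.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()        # share of variance per axis
loadings = eigvecs * np.sqrt(eigvals)      # coordinates on the correlation circle

for axis, share in enumerate(explained, start=1):
    print(f"F{axis}: {100 * share:.2f}% of the variance")
for name, row in zip(labels, loadings):
    print(name, np.round(row[:2], 3))      # coordinates on (F1, F2)
```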

The correlation matrix indicates whether the variables are negatively or positively correlated. The Pearson correlation coefficients are summarized in the correlation table above.

The correlation circle was made to detect the connection between the different descriptors. The analysis of the principal components from the correlation circle (

The AHC divides the compounds into two classes: C_{1} (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 14, 17) and C_{2} (11, 12, 13, 15, 16, 18, 19).

The equations of the QSAR models obtained for the anticancer activity by MLR and NMR, together with their statistical indicators, are given in the two tables below.

The equations of the models are built on three descriptors (AE, ν(O-H) and ν(N-H)) determined from the optimized molecules. The sign of a descriptor coefficient reflects how the biological activity varies with that descriptor in the MLR equation: a negative sign indicates that the activity decreases when the value of the descriptor increases, while a positive sign reflects the opposite effect. For the MLR model, the negative coefficients of AE and ν(N-H) indicate that the HDAC7 inhibitory activity improves for low values of these descriptors, whereas the positive coefficient of ν(O-H) indicates that the activity improves for high values of this descriptor. The significance of the models is assessed through the statistical indicators and the acceptance criteria of Eriksson et al. and Tropsha et al. The values of the statistical indicators determined for each model are reported in the table of statistical indicators below.

In the MLR and NMR models, the three descriptors (AE, ν(O-H) and ν(N-H)) explain 96.59% and 97.62% of the variance of the activity, respectively, with standard deviations S of 0.488 and 0.663. The significance of these models is given by the Fisher F test, 85.00 and 172.79 for MLR and NMR, respectively. The cross-validated correlation coefficient Q_{CV}^{2} is 0.918 and 0.938 for the NMR and MLR models, respectively. These values reflect excellent models according to Eriksson et al.

Model | Regression equation |
---|---|
MLR | $\mathrm{pIC}_{50}^{\mathrm{pred}} = -902.89167 - 1.16813\,\mathrm{AE} + 0.35349\,\nu(\text{O-H}) - 0.09615\,\nu(\text{N-H})$ |
NMR | $\mathrm{pIC}_{50}^{\mathrm{pred}} = 186539 - 1.27488\,\mathrm{AE} - 121.60150\,\nu(\text{O-H}) + 16.28351\,\nu(\text{N-H}) + 0.03043\,\mathrm{AE}^{2} + 0.01712\,\nu(\text{O-H})^{2} - 0.00226\,\nu(\text{N-H})^{2}$ |
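As a usage illustration, the MLR equation can be evaluated directly for a compound of the series. The sketch below uses the coefficients exactly as printed in the regression table and the descriptors of compound 1; the function name is hypothetical. Because the printed coefficients are rounded and multiply frequencies of about 3600 cm^{−1}, the result can differ from the tabulated prediction in the second decimal.

```python
def predict_pic50_mlr(ae_ev: float, nu_oh: float, nu_nh: float) -> float:
    """Evaluate the MLR equation with the coefficients printed in the paper."""
    return (-902.89167
            - 1.16813 * ae_ev      # electron affinity, eV
            + 0.35349 * nu_oh      # nu(O-H), cm^-1
            - 0.09615 * nu_nh)     # nu(N-H), cm^-1


# Compound 1 of the descriptor table: AE = 1.305 eV,
# nu(O-H) = 3562.210 cm^-1, nu(N-H) = 3631.280 cm^-1
pred_1 = predict_pic50_mlr(1.305, 3562.210, 3631.280)
print(round(pred_1, 2))  # the residual table lists 5.633 for this compound
```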

Statistical indicators | MLR | NMR |
---|---|---|
Number of compounds (n) | 19 | 19 |
Coefficient of determination R^{2} | 0.9659 | 0.9762 |
Standard deviation S | 0.488 | 0.663 |
Fisher's test F | 85.00 | 172.79 |
Cross-validated correlation coefficient Q_{CV}^{2} | 0.95 | 0.9759 |
R^{2} − Q_{CV}^{2} | 0.0159 | 0.000 |
Activity range IC_{50} exp (µM) | 0.311 to 38.9 | |
Confidence level α | 95% | |

The regression line between the experimental and theoretical anticancer potentials (pIC_{50}) of the training set and the validation set is illustrated in

The low standard errors of 0.488 and 0.663 for the MLR and NMR models, respectively, attest to the good agreement between the predicted and experimental values of the HDAC7 activity, despite some recorded differences.

Verification of Tropsha Criteria

The statistical indicators of the five (5) Tropsha criteria for the two models (MLR and NMR) on the validation set are given in the table below and are compared with the following thresholds:

$R^{2}_{Test} > 0.7$, $Q^{2}_{CV,Test} > 0.6$, $\left| R^{2}_{Test} - R_{0}^{2} \right| \le 0.3$,

$\dfrac{\left| R^{2}_{Test} - R_{0}^{2} \right|}{R^{2}_{Test}} < 0.1$ and $0.85 \le k \le 1.15$; $\dfrac{\left| R^{2}_{Test} - R'^{2}_{0} \right|}{R^{2}_{Test}} < 0.1$ and $0.85 \le k' \le 1.15$

All values meet the Tropsha criteria, so these models are acceptable for predicting HDAC7 anticancer activity.

However, since both models depend on three theoretical descriptors, it is essential to determine the contribution of each descriptor to the prediction of the anti-HDAC7 activity of the hydroxamic acids studied. Knowing this contribution makes it possible to rank the descriptors by priority and to choose the parameters to optimize in order to achieve better HDAC7 inhibitory activity.

HDAC7 | $R^{2}_{Test}$ | $Q^{2}_{CV,Test}$ | $\lvert R^{2}_{Test} - R_{0}^{2} \rvert$ | $\lvert R^{2}_{Test} - R_{0}^{2} \rvert / R^{2}_{Test}$ | $\lvert R^{2}_{Test} - R'^{2}_{0} \rvert / R^{2}_{Test}$ | k | k' |
---|---|---|---|---|---|---|---|
MLR | 0.9809 | 0.9504 | 0.0301 | 0.000 | 0.000 | 1.001 | 0.998 |
NMR | 0.9807 | 0.9511 | 0.000 | 0.000 | 0.001 | 0.9974 | 1.002 |

The contribution of the three descriptors of this model to the prediction of the anti-HDAC7 activity of the hydroxamic acids was determined with XLSTAT version 2014.

In this study, the electron affinity (AE) has nearly the same weight as the vibration frequencies ν(O-H) and ν(N-H). Omitting any one of these descriptors could destabilize the model. Overall, these quantum descriptors make a substantial contribution to the prediction of the anti-HDAC7 activity.

The leverage values and standardized residuals of the observations used to establish the applicability domain of this model are shown in the table below.

Analysis of the data in

In this work, the anti-HDAC7 (anticancer) activity of nineteen (19) hydroxamic acid compounds was correlated with theoretical descriptors calculated by DFT methods. The electron affinity (AE) and the vibration frequencies ν(N-H) and ν(O-H) can explain and predict this activity. Statistical methods such as Principal Component Analysis (PCA), Ascending Hierarchical Classification (AHC), and multiple linear and nonlinear regression were used.

Compounds | pIC_{50} exp | pIC_{50} pred | Residual | Std. residual | h_{ii} |
---|---|---|---|---|---|

1 | 5.535 | 5.633 | −0.098 | −0.942 | 0.38 |

2 | 6.069 | 6.136 | −0.068 | −0.653 | 0.24 |

3 | 6.202 | 6.050 | 0.152 | 1.460 | 0.09 |

4 | 6.018 | 6.123 | −0.105 | −1.008 | 0.14 |

5 | 6.507 | 6.451 | 0.056 | 0.540 | 0.23 |

6 | 6.114 | 6.034 | 0.079 | 0.760 | 0.42 |

7 | 6.444 | 6.390 | 0.053 | 0.514 | 0.24 |

8 | 5.836 | 5.877 | −0.042 | −0.399 | 0.18 |

9 | 6.276 | 6.343 | −0.068 | −0.651 | 0.21 |

10 | 6.312 | 6.329 | −0.017 | −0.159 | 0.92 |

11 | 5.348 | 5.214 | 0.134 | 1.285 | 0.29 |

12 | 5.098 | 5.205 | −0.107 | −1.032 | 0.28 |

13 | 5.115 | 5.085 | 0.030 | 0.285 | 0.36 |

Two QSAR models (MLR and NMR) showed that the descriptors used can predict, at an acceptable level of confidence, the inhibitory activity of hydroxamic acids. Two different methods were used in order to show, on the one hand, that these descriptors allow the inhibitory activity of hydroxamic acids on HDAC7 to be predicted in more than one way and, on the other hand, that these descriptors are relevant. The MLR model (R^{2} = 0.9659, S = 0.488, F = 85 and p-value < 0.0001) is an effective tool for predicting anti-HDAC7 activity. Moreover, the study of the contribution of the descriptors showed that they are almost equivalent in predicting the inhibitory activity of the HDAC7 inhibitors studied. For this model, future compounds must have a standardized residual between −1.5 and +1.5 and a leverage below the threshold h^{*} = 0.923. A study of the applicability domain of these models is envisaged. Based on the applicability domain and the quantum descriptors developed in this work, we plan to propose new molecules with improved activities.

The authors declare no conflicts of interest regarding the publication of this paper.

Soro, D., Ekou, L., Ouattara, B., Kone, M.G.-R., Ekou, T. and Ziao, N. (2019) DFT Study, Linear and Nonlinear Multiple Regression in the Prediction of HDAC7 Inhibitory Activities on a Series of Hydroxamic Acids. Computational Molecular Bioscience, 9, 63-80. https://doi.org/10.4236/cmb.2019.93006