Nematocidal Activities of Protonated Benzimidazolyl Chalcone Using Quantitative Structure-Activity Relationship
1. Introduction
Figure 1. Molecular structures of the training-set and test-set Benzimidazolyl-Chalcones used for the QSAR models.
Nematodes are a large group of worms found in all living environments. They live as parasites of plants, animals and humans. They are the cause of most parasitic diseases in humans (elephantiasis, filariasis) [1] [2], as well as of gastrointestinal infections and loss of productivity in animals [3] [4]. For more than two decades, however, the continued emergence of new, resistant strains of nematodes has been of increasing concern to the agricultural, medical and health communities. To contain the problems of resistance, there is a need for an anthelmintic that offers a broad spectrum of action, a high degree of efficacy, a good safety margin and flexibility of use. Several hundred compounds derived from benzimidazoles have been synthesised, of which a few have been selected primarily for their broad-spectrum anthelmintic activity. Benzimidazolyl-chalcone derivatives are of considerable pharmacological interest because of their therapeutic properties in many diseases. Several studies have shown that benzimidazolyl derivatives possess antihistaminic [5], antifungal [6], anti-allergic [7], antibacterial [8]-[10] and antiviral [11] properties. As these therapeutic properties are linked to the conformation of the molecules and to the interactions they can establish with each other, a QSAR study is used to build models. Owing to the growth in computing resources, such models are increasingly used to explain and/or predict molecular properties, in order to limit the number of experiments, which are sometimes long and expensive, and to reduce the cost of drug production by pharmaceutical companies [12] [13]. In the present QSAR study, ten Benzimidazolyl-Chalcone (BZC) derivatives were used to build the models and four others from the same series were used for the external validation test (Figure 1). These molecules were synthesised by Ouattara et al. [14].
The nematocidal activity of the 14 BZC derivatives was modelled using several statistical tools, in particular principal component analysis (PCA), multiple linear regression (MLR) and multiple non-linear regression (MNLR). The general objective of this work is a descriptive and predictive study of the nematocidal activity of BZCs based on multivariate statistical analyses.
2. Materials and Methods
2.1. Level of Calculation
In order to establish a descriptive and predictive theory of the biological activities of protonated BZCs, theoretical chemistry methods are employed at the mPW1PW91/6-311+G(d,p) level. The modified Perdew-Wang 1 (mPW1) functionals [15] [16], such as mPW1PW91, are hybrid HF-DFT models that provide good results for both covalent and non-covalent interactions [17]. In this work, the quantum chemistry software Gaussian 03 [18] was used to assess the quantitative structure-activity relationship between the nematocidal activity LC100 (μg/mL) and the descriptors of the protonated BZCs. Split-valence, triple-zeta basis sets that are sufficiently large and include diffuse and polarisation functions are important when dealing with intermolecular interactions. Taking these diffuse and polarisation functions into account gives a quantitative character to the results obtained. The modelling was done using the multilinear regression method implemented in Excel spreadsheets [19] and XLSTAT version 2014 [20].
2.2. Quantum Descriptors Used
The descriptors were calculated for the molecules protonated on the sp2 nitrogen, the preferential protonation site in BZCs. For the development of QSAR models, some theoretical descriptors related to conceptual DFT have been determined, such as the energy of the Lowest Unoccupied Molecular Orbital (LUMO), the energy of the Highest Occupied Molecular Orbital (HOMO), the energy gap (ΔE), the dipole moment (µ) and the average valence angle. It should be noted that the descriptors related to the frontier molecular orbitals have been calculated in a very simple way within the Koopmans approximation [21]. The LUMO energy characterises the sensitivity of the molecule to nucleophilic attack, and the HOMO energy characterises the susceptibility of a molecule to electrophilic attack. The energy difference ΔE between the LUMO and HOMO is an important parameter that gauges the overall reactivity towards an electron acceptor and the stability of a molecule. The dipole moment (μ) indicates the stability of a molecule in water. Thus, a high dipole moment will reflect low solubility in organic solvents and high solubility in water. The energy gap is calculated from Equation (1):
$$\Delta E = E_{LUMO} - E_{HOMO} \qquad (1)$$
The measured valence angle θNsp2 is the average of the angles τ1 and τ2 about the sp2 nitrogen (Figure 2).
Figure 2. The average valence angle (θNsp) measured on the sp2 nitrogen, the preferential site of protonation.
The descriptors of the protonated molecules were characterised by Kone et al. [22].
2.3. Estimating the Predictive Capacity of a QSAR Model
The BZCs have nematocidal concentrations ranging from 0.002 to 424.5 µg/mL. This range of concentrations allows a quantitative relationship to be defined between the nematocidal activity and the theoretical descriptors of these molecules. Biological data are generally expressed as the negative of the base-ten logarithm of the activity ($-\log_{10} C$), so that biologically more effective structures have higher numerical values [23] [24]. The nematocidal activity is expressed by the nematocidal potential pLC100, which is defined by Equation (2):
$$pLC_{100} = -\log_{10}\left(\frac{LC_{100}}{M} \times 10^{-3}\right) \qquad (2)$$
where M is the molecular weight (g/mol) and LC100 is the lethal concentration 100 (µg/mL), i.e. the concentration of substance required to kill 100% of a worm population under the conditions of the experiment.
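As a minimal sketch of this conversion, the Python function below turns a lethal concentration in µg/mL into a potential. It assumes that Equation (2) has the form pLC100 = −log10(LC100/M × 10⁻³), i.e. that LC100 is first converted to mol/L; the function name `p_lc100` and the molar mass of 300 g/mol are illustrative assumptions, not values taken from the study.

```python
import math


def p_lc100(lc100_ug_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Return pLC100 = -log10(LC100 / M * 1e-3).

    LC100 / M is in umol/mL (= mmol/L); the factor 1e-3 converts it to mol/L.
    This form of Equation (2) is an assumption of this sketch.
    """
    if lc100_ug_per_ml <= 0 or molar_mass_g_per_mol <= 0:
        raise ValueError("LC100 and M must be strictly positive")
    molar_concentration = lc100_ug_per_ml / molar_mass_g_per_mol * 1e-3
    return -math.log10(molar_concentration)


# Hypothetical compound (M = 300 g/mol) at the two ends of the reported range.
for lc100 in (0.002, 424.5):
    print(f"LC100 = {lc100} ug/mL -> pLC100 = {p_lc100(lc100, 300.0):.2f}")
```

With a molar mass of this order, the reported concentration range maps onto potentials of roughly 3 to 8, which is the span of the pLC100exp values in Table 1.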
The quality of a model is determined on the basis of different statistical analysis criteria, including the coefficient of determination $R^2$, the standard deviation S, the cross-validation correlation coefficient $Q^2_{CV}$ and the Fisher F. $R^2$, S and F relate to the fit of calculated and experimental values. They describe the predictive ability within the limits of the model and allow the accuracy of the calculated values on the test set to be estimated [25] [26]. As for the cross-validation coefficient $Q^2_{CV}$, it provides information on the predictive power of the model. This predictive power is said to be “internal” because it is calculated from the structures used to build the model. The coefficient of determination $R^2$ gives an evaluation of the dispersion of the theoretical values around the experimental values. The quality of the modelling is better when the points are close to the adjustment line [27]. The fit of the points to this line can be assessed by the coefficient of determination:
$$R^2 = 1 - \frac{\sum_i (y_{i,exp} - y_{i,theo})^2}{\sum_i (y_{i,exp} - \bar{y}_{exp})^2} \qquad (3)$$
Where:
$y_{i,exp}$: Experimental value of nematocidal activity
$y_{i,theo}$: Theoretical value of nematocidal activity and
$\bar{y}_{exp}$: Average value of experimental values for nematocidal activity.
The closer the $R^2$ value is to 1, the more the theoretical and experimental values are correlated.
Furthermore, the variance σ2 is determined by Equation (4):
$$\sigma^2 = \frac{\sum_i (y_{i,exp} - y_{i,theo})^2}{n - k - 1} \qquad (4)$$
Where k is the number of independent variables (descriptors), n is the number of molecules in the test or training set and n − k − 1 is the number of degrees of freedom.
Another statistical indicator used is the standard deviation S. It is used to assess the reliability and accuracy of a model:
$$S = \sqrt{\frac{\sum_i (y_{i,exp} - y_{i,theo})^2}{n - k - 1}} \qquad (5)$$
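The Fisher F test is also used to measure the statistical significance of the model, i.e. the quality of the choice of descriptors making up the model.
$$F = \frac{\sum_i (y_{i,theo} - \bar{y}_{exp})^2}{\sum_i (y_{i,exp} - y_{i,theo})^2} \times \frac{n - k - 1}{k} \qquad (6)$$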
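The coefficient of determination of the cross-validation, $Q^2_{CV}$, is used to evaluate the accuracy of the prediction on the test set. It is calculated using the following relationship:
$$Q^2_{CV} = \frac{\sum_i (y_{i,exp} - \bar{y}_{exp})^2 - \sum_i (y_{i,exp} - y_{i,theo})^2}{\sum_i (y_{i,exp} - \bar{y}_{exp})^2} \qquad (7)$$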
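The statistics named in this section can be computed in a few lines. The sketch below assumes their usual textbook definitions (R² from the residual and total sums of squares, S and F with n − k − 1 degrees of freedom) and uses leave-one-out refitting of a linear model for the cross-validated coefficient; the leave-one-out scheme and the function names `fit_statistics` and `q2_leave_one_out` are assumptions, since the text does not state how XLSTAT performs the cross-validation.

```python
import numpy as np


def fit_statistics(y_exp, y_theo, k):
    """Return (R2, S, F) for experimental vs. calculated values and k descriptors."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_theo = np.asarray(y_theo, dtype=float)
    n = y_exp.size
    dof = n - k - 1                                   # degrees of freedom
    sse = np.sum((y_exp - y_theo) ** 2)               # residual sum of squares
    sst = np.sum((y_exp - y_exp.mean()) ** 2)         # total sum of squares
    ssr = np.sum((y_theo - y_exp.mean()) ** 2)        # explained sum of squares
    r2 = 1.0 - sse / sst
    s = np.sqrt(sse / dof)
    f = (ssr / k) / (sse / dof)
    return r2, s, f


def q2_leave_one_out(X, y):
    """Leave-one-out Q2 for a linear model with intercept (assumed CV scheme)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    A = np.column_stack([np.ones(n), X])              # design matrix with intercept
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                      # drop molecule i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2            # error on the left-out molecule
    return 1.0 - press / np.sum((y - y.mean()) ** 2)
```

For example, `fit_statistics(pLC100_exp, pLC100_calc, k=3)` would return the three fit indicators for a three-descriptor model, where `pLC100_exp` and `pLC100_calc` are placeholder arrays of experimental and calculated potentials.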
2.4. Statistical Analysis
2.4.1. Principal Component Analysis (PCA)
The structures of 14 BZC compounds were studied by statistical methods based on Principal Component Analysis (PCA) [28] using the XLSTAT software version 2014 [20]. PCA is a useful statistical technique for summarising all the information encoded in the structures of compounds. It is also very useful for understanding the distribution of compounds [29]. This is an essentially descriptive statistical method which aims to present, in graphic form, the maximum information contained in the data.
2.4.2. Multiple Linear and Non-Linear Regressions (MLR and MNLR)
The statistical technique of Multiple Linear Regression (MLR) is used to study the relationship between a dependent variable (biological activity) and several independent variables (descriptors). This statistical method minimises the differences between the actual and predicted values. It was also used to select the descriptors used as input parameters in the multiple non-linear regression (MNLR). Multiple Non-Linear Regression (MNLR) analysis is a technique for improving the structure-activity relationship in order to quantitatively assess biological activity. It takes several parameters into account and is the most common tool for studying multidimensional data. It is based on the following pre-programmed XLSTAT function:
$$y = a + (b x_1 + c x_2 + d x_3 + \dots) + (e x_1^2 + f x_2^2 + g x_3^2 + \dots) \qquad (8)$$
Where
$a, b, c, d, \dots$: represent the parameters and
$x_1, x_2, x_3, \dots$: represent the variables.
The MLR and MNLR models were generated using the XLSTAT software version 2014 [20] to predict the LC100 nematocidal activity. The equations of the different models were evaluated by the coefficient of determination ($R^2$), the root mean square error (S), the Fisher test (F) and the cross-validation correlation coefficient ($Q^2_{CV}$) [30].
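A minimal sketch of how the two regressions could be fitted outside XLSTAT is given below. It assumes that the MNLR function is the polynomial of Equation (8), with an intercept, linear terms and squared terms; such a function is linear in its parameters, so ordinary least squares is sufficient. The helper names and the randomly generated demonstration data are hypothetical and are not the data of this study.

```python
import numpy as np


def design_mlr(X):
    """Design matrix [1, x1, x2, ...] for multiple linear regression."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])


def design_mnlr(X):
    """Design matrix [1, x1, x2, ..., x1^2, x2^2, ...] (assumed form of Eq. 8)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X, X ** 2])


def fit_least_squares(A, y):
    """Least-squares coefficients minimising ||A @ coef - y||^2."""
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef


# Made-up demonstration data: 12 "molecules", 3 "descriptors".
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(12, 3))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0, 0.5]) + 0.3 * X_demo[:, 0] ** 2

coef_mlr = fit_least_squares(design_mlr(X_demo), y_demo)    # a, b, c, d
coef_mnlr = fit_least_squares(design_mnlr(X_demo), y_demo)  # a, b, c, d, e, f, g
print("MLR coefficients: ", np.round(coef_mlr, 3))
print("MNLR coefficients:", np.round(coef_mnlr, 3))
```

Because the MNLR design matrix contains every column of the MLR one, its residual sum of squares on the training data can never be larger; a better fit on the training set is therefore expected and must be confirmed on the validation set.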
All descriptor values for the ten (10) BZC molecules in the training set and the other four (4) molecules in the external validation (test) set are presented in Table 1.
Table 1. Quantum descriptors and experimental potentials of the training and test sets.
Compounds | ΔE (eV) | µ (D) | θNsp (°) | pLC100exp
Training set
BZC-1 | −3.150 | 4.293 | 125.13 | 3.095
BZC-2 | −2.645 | 4.695 | 125.13 | 2.837
BZC-3 | −3.255 | 9.512 | 125.13 | 4.372
BZC-4 | −3.422 | 7.943 | 125.13 | 4.346
BZC-5 | −3.712 | 15.726 | 125.14 | 7.503
BZC-6 | −3.250 | 4.781 | 125.14 | 4.919
BZC-7 | −3.642 | 8.259 | 124.84 | 8.096
BZC-8 | −3.089 | 14.292 | 125.14 | 6.311
BZC-9 | −2.811 | 7.724 | 125.14 | 2.817
BZC-10 | −3.377 | 5.991 | 125.14 | 4.34
Test set
BZC-11 | −3.498 | 5.273 | 125.13 | 8.094
BZC-12 | −3.314 | 5.812 | 125.13 | 6.215
BZC-13 | −3.054 | 5.028 | 125.13 | 5.612
BZC-14 | −3.133 | 12.953 | 125.14 | 2.887
2.5. Acceptance Criteria for a QSAR Model
According to Eriksson et al. [31], the performance of a mathematical model is characterised by a value of $Q^2_{CV} > 0.5$ for a satisfactory model and $Q^2_{CV} > 0.9$ for an excellent model. According to them, given a test set, a model will perform well if the acceptance criterion $R^2 - Q^2_{CV} < 0.3$ is respected.
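According to Tropsha et al. [32] [33], the predictive power of a model on the external validation set can be assessed from five criteria. These criteria are: 1) $R^2_{Test} > 0.7$; 2) $Q^2_{CV,Test} > 0.6$; 3) $|R^2_{Test} - R^2_0| \le 0.3$; 4) $|R^2_{Test} - R^2_0| / R^2_{Test} < 0.1$ and $0.85 \le k \le 1.15$; 5) $|R^2_{Test} - R'^2_0| / R^2_{Test} < 0.1$ and $0.85 \le k' \le 1.15$.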
In addition, Roy and Roy [34] have further refined the assessment of the predictive capacity of a QSAR model. They developed the quantities $\overline{r_m^2}$ and $\Delta r_m^2$, called metric values; $r_m^2$ determines how close the observed activity is to the prediction. The metric values $r_m^2$ and $r'^2_m$ are calculated from the observed and predicted activities. These two variants, $r_m^2$ and $r'^2_m$, can be calculated for the training set (internal validation) or for the test set (external validation). A QSAR model is acceptable to these authors if both criteria, $\overline{r_m^2} \ge 0.5$ and $\Delta r_m^2 < 0.2$, are met, where $\overline{r_m^2} = (r_m^2 + r'^2_m)/2$ and $\Delta r_m^2 = |r_m^2 - r'^2_m|$.
2.6. Domain of Applicability
The domain of applicability of a QSAR model is the physico-chemical, structural or biological space in which the model equation is applicable to make predictions for new compounds [35]. It corresponds to the region of the chemical space that includes the compounds of the training set and similar compounds, which are close in the same space [36]. Indeed, a model built on a limited number of compounds, with relevant descriptors chosen among many others, cannot be a universal tool to predict the activity of any other molecule with confidence. It therefore appears necessary, even mandatory, to determine the applicability domain of any QSAR model. This is recommended by the Organisation for Economic Co-operation and Development (OECD) in the development of a QSAR model [37]. There are several methods for determining the domain of applicability of a model [36]. Among these, the approach used in this work is the leverage approach. This method is based on the variation of the standardised residuals of the dependent variable with the distance between the values of the descriptors and their mean, called leverage [38]. The leverages hii are the diagonal elements of a matrix H called the hat matrix. H is the projection matrix of the experimental values of the explained variable Yexp into the space of the predicted values of the explained variable Ypred, such that:
$$Y_{pred} = H \, Y_{exp} \qquad (9)$$
H is defined by expression (10), where X is the matrix of descriptor values of the training set:
$$H = X (X^{T} X)^{-1} X^{T} \qquad (10)$$
The domain of applicability is delimited by a threshold value of the leverage, noted h*. In general, it is set at $h^* = 3(p + 1)/n$, where n is the number of compounds in the training set and p is the number of descriptors in the model [39] [40]. For standardised residuals, the two limit values generally used are ±3σ, σ being the standard deviation of the experimental values of the quantity to be explained [41]: this is the “three sigma rule” [42].
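The leverage approach can be sketched as follows. The code assumes that the hat matrix is built from the descriptor matrix augmented with an intercept column (consistent with a threshold that counts p + 1 parameters) and that the threshold is h* = 3(p + 1)/n; the function names and the random demonstration matrix are hypothetical.

```python
import numpy as np


def leverages(X):
    """Diagonal elements h_ii of the hat matrix H = A (A^T A)^-1 A^T.

    A is X with an added intercept column (an assumption of this sketch).
    A @ pinv(A) is the same orthogonal projector, computed in a stable way.
    """
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(X.shape[0]), X])
    H = A @ np.linalg.pinv(A)
    return np.diag(H).copy()


def leverage_threshold(n_compounds, n_descriptors):
    """Threshold leverage h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_compounds


def in_domain(h, standardised_residuals, h_star, limit=3.0):
    """True where a compound has h_ii < h* and |standardised residual| <= limit."""
    h = np.asarray(h, dtype=float)
    r = np.asarray(standardised_residuals, dtype=float)
    return (h < h_star) & (np.abs(r) <= limit)


# Made-up descriptor matrix with the same shape as the training set (10 x 3).
X_demo = np.random.default_rng(1).normal(size=(10, 3))
h_demo = leverages(X_demo)
print("h* =", leverage_threshold(10, 3))
print("leverages:", np.round(h_demo, 3))
```

Each hii lies between 0 and 1 and the leverages sum to p + 1, so with 10 compounds and 3 descriptors a threshold of 1.2 cannot be exceeded by any training compound; in that situation the residual limits do the actual screening.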
3. Results and Discussion
3.1. Principal Component Analysis (PCA)
The three descriptors for the 14 compounds were subjected to principal component analysis (PCA) [29] in order to identify the relationships between them. The first two principal axes are sufficient to describe the information provided by the data matrix: the percentages of variance are 53.73% and 25.49% for the F1 and F2 axes, respectively, so that together they account for 79.23% of the total information. The correlations between the descriptors are presented as a Pearson correlation matrix in Table 2 and as a correlation circle in Figure 3. The matrix provides information on the negative or positive correlation between the variables.
Table 2. Correlation matrix (Pearson(n)) between the different descriptors.
Variables | ΔE (eV) | µ (D) | θNsp (°) | pLC100exp
ΔE (eV) | 1 | −0.2821 | 0.3893 | −0.7386
µ (D) | −0.2821 | 1 | 0.0165 | 0.2172
θNsp (°) | 0.3893 | 0.0165 | 1 | −0.4514
pLC100exp | −0.7386 | 0.2172 | −0.4514 | 1
Bold values are different from 0 at a significant level for p < 0.05. Very significant at p < 0.01. Very significant at p < 0.001.
The energy gap (ΔE) is significantly and negatively correlated with the nematocidal potential pLC100 (r = −0.7386, p < 0.05).
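The correlation matrix and the share of variance carried by each principal axis can be recomputed from the values of Table 1, as sketched below. Using all 14 compounds and running the PCA on the correlation matrix of the three descriptors only are assumptions of this sketch (the text does not state exactly which variables entered the XLSTAT analysis), so the percentages obtained need not coincide with those reported for F1 and F2.

```python
import numpy as np

# Columns: dE (eV), mu (D), theta_Nsp (deg), pLC100exp; rows BZC-1 ... BZC-14 (Table 1).
table1 = np.array([
    [-3.150,  4.293, 125.13, 3.095],
    [-2.645,  4.695, 125.13, 2.837],
    [-3.255,  9.512, 125.13, 4.372],
    [-3.422,  7.943, 125.13, 4.346],
    [-3.712, 15.726, 125.14, 7.503],
    [-3.250,  4.781, 125.14, 4.919],
    [-3.642,  8.259, 124.84, 8.096],
    [-3.089, 14.292, 125.14, 6.311],
    [-2.811,  7.724, 125.14, 2.817],
    [-3.377,  5.991, 125.14, 4.340],
    [-3.498,  5.273, 125.13, 8.094],
    [-3.314,  5.812, 125.13, 6.215],
    [-3.054,  5.028, 125.13, 5.612],
    [-3.133, 12.953, 125.14, 2.887],
])

# Pearson correlation matrix between the four columns (variables in columns).
corr = np.corrcoef(table1, rowvar=False)

# PCA on the correlation matrix of the three descriptors (assumed variable set):
# eigenvalues in descending order, each divided by their sum.
eigenvalues = np.linalg.eigvalsh(corr[:3, :3])[::-1]
explained = eigenvalues / eigenvalues.sum()

print(np.round(corr, 4))
print("share of variance per axis (%):", np.round(100 * explained, 2))
```

The coefficient between ΔE and pLC100exp obtained in this way should come out close to the −0.7386 reported in Table 2.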
Figure 3. Circle of correlation.
The correlation circle was performed to detect the connection between the different descriptors. Principal component analysis from the correlation circle (Figure 3) revealed that the F1 axis (53.73% of the variance) appears to represent the mean valence angle (θNsp) and the energy gap (ΔE), and the F2 axis (25.49% of the variance) appears to represent the dipole moment (µ).
Figure 4. Cartesian diagram according to F1 and F2: correlation between the descriptors used and the BZCs.
The Cartesian diagram in Figure 4 linking the 14 BZCs to the three descriptors studied, shows that there is a connection between the compounds BZC-5 and BZC-8 and the dipole moment (µ).
Figure 5. Cartesian diagram according to F1 and F2: Separation between group 1 (pLC100 < 6.00) and group 2 (pLC100 > 6.00).
Figure 5 shows a distribution of the BZCs into two groups: group 1 containing compounds with pLC100 < 6.00 and group 2 containing compounds with pLC100 > 6.00.
3.2. Multiple Linear Regression (MLR)
The QSAR model obtained by MLR relates the nematocidal activity to the theoretical descriptors of the protonated molecules. The sign of a descriptor's coefficient in the regression equation indicates the direction in which the biological activity changes with that descriptor: a negative sign indicates that the activity decreases when the value of the descriptor increases, and a positive sign indicates the opposite effect. Figure 6 shows the correlation between the experimental and theoretical nematocidal potentials of the training set (blue dots) and the validation set (red dots). The resulting equation, with its statistical data, is presented below:
(11)
,
,
,
,
The negative signs of the coefficients of the energy gap (ΔE) and the mean valence angle (θNsp) indicate that nematocidal activity is enhanced for low values of these descriptors, while the positive sign of the coefficient of the dipole moment (µ) indicates that it is enhanced for high values of the dipole moment. The significance of the model is reflected by the Fisher coefficient F = 69.21 and by the correlation coefficient of the cross-validation
. This model is acceptable with
. The regression line between the experimental and theoretical nematocidal potentials of the training set and the validation set is shown in Figure 6.
Figure 6. The regression line of the MLR model.
Verification of Tropsha’s Criteria
,
,
and
;
and
.
As all values meet the Tropsha criteria, the model is acceptable for predicting nematocidal activity.
3.3. Multiple Non-Linear Regression (MNLR)
Figure 7. The regression line of the MNLR model.
The non-linear regression statistical method was used to improve the prediction of the LC100 nematocidal activity in a quantitative way. It takes into account the three selected descriptors (ΔE, µ, θNsp). This statistical method is applied to the data in Table 1, which contains 14 molecules associated with the three descriptors. The resulting equation is:
(12)
,
,
,
,
The significance of the model is expressed by the Fisher coefficient F = 170.228 and by the correlation coefficient of the cross-validation
. This model is acceptable with
. The regression line between the experimental and theoretical nematocidal potentials of the training set (blue points) and the validation set (red points) is shown in Figure 7.
Checking Roy’s Criteria
All values meet Tropsha’s criteria, so the model is acceptable for predicting nematocidal activity.
Of the two models, the model obtained by the statistical method MNLR has a significantly better predictive ability than the MLR approach.
The standard deviations are low, 0.64 and 0.422 for models 1 and 2, respectively, showing that the predicted and experimental values are very similar (Figure 8). The curves show a similar evolution of these values in both models for the BZC series, despite some recorded differences.
Figure 8. Similarity curve of experimental and predicted values for models 1 and 2.
However, as this model is a function of three theoretical descriptors, it is essential to determine the contribution of each one in the prediction of the nematocidal activity for this series of molecules. Indeed, the knowledge of this contribution makes it possible to establish the order of priority of the various descriptors and to define the choice of the parameters to be optimised for the achievement of a better activity.
3.4. Analysis of the Contribution of the Descriptors
The contribution of the three descriptors of this model in predicting the nematocidal activity of Benzimidazolyl-Chalcones was determined. The various contributions are illustrated in Figure 9.
Figure 9. Contribution of descriptors in the models.
The dipole moment (µ) shows a large proportion followed by the mean valence angle (θNsp) and finally the energy gap (ΔE). Thus it is noted that to improve nematocidal activity, a high value of dipole moment (µ) is required. This claim is evidenced by Figure 4 where compounds BZC-5 and BZC-8 which have high values of dipole moment (µ) correlate well with pLC100 activity.
3.5. Domain of Applicability Analysis
The domain of applicability of the MLR and MNLR models was determined by the leverage method. The leverages hii and the standardised residuals of the molecules were used to plot the standardised residuals against the leverages in Figure 10.
Figure 10. Graph of standardised residuals against the leverages for the MLR and MNLR models.
For the 10 molecules of the training set and the 3 descriptors of the model, the threshold value of the leverage is h* = 3 × (3 + 1)/10 = 1.2. The extreme values of the standardised residuals are ±3 according to the “three sigma rule” [42]. These values delimit the domain of applicability [43] of the model, as shown on the graph in Figure 10. This figure shows that all molecules have leverages below the threshold (h* = 1.2) and standardised residuals between −3 and +3. This result means that all molecules belong to the applicability domain.
4. Conclusion
This study has highlighted relationships between the nematocidal activity LC100 (μg/mL), a quantity that results from molecular interaction, and the descriptors of the molecules interacting by protonation. From a chemical point of view, these descriptors can guide the experimenter in the synthesis of new, more active molecules. The descriptors of the protonated molecules (ΔE, µ, θNsp) can explain and predict the nematocidal activity of BZCs because there are strong correlations between the calculated and experimental values of nematocidal activity. Statistical methods such as principal component analysis (PCA), multiple linear regression and multiple non-linear regression were used. The robustness study of the two models (MLR and MNLR) shows good stability and excellent predictive power. Compared to the MLR model, the MNLR model (R2 = 0.955, S = 0.422, F = 170.228) is better and constitutes an efficient tool to predict the nematocidal activity of the best studied BZC analogues, called “leads”. The study of the contribution of the descriptors showed that the dipole moment (µ) is the priority descriptor in predicting the nematocidal activity of the studied protonated BZCs. Finally, the positive sign of the dipole moment coefficient in the MLR model equation indicates that high values of µ could enhance the nematocidal activity of BZCs.
Acknowledgements
Our thanks go to the entire team of the Institute of Analytical Sciences and Physico-Chemistry for the Environment and Materials (IPREM) of the University of Pau and the Pays de l’Adour, and particularly to Dr. Panaghiotis Karamanis, CNRS-IPREM-UPPA/ECP Technopole Helioparc Research Officer, who facilitated the collaboration with the French computing center.
Disclaimer (Artificial Intelligence)
Author(s) hereby declare that NO generative AI technologies such as Large Language Models (ChatGPT, COPILOT, etc.) and text-to-image generators have been used during writing or editing of this manuscript.