
This work aimed to develop a predictive QSPR (Quantitative Structure-Property Relationship) model of the first reduction potential for a series of forty tetracyanoquinodimethane (TCNQ) molecules, which constituted our database. Thirty molecules were used for the training set and ten for the test set. For the calculation of the descriptors, all molecules were first optimized, with a frequency calculation, at the B3LYP/6-31G(d,p) level of theory. Using statistical analysis methods, a predictive QSPR model of the first reduction potential depending only on electronic affinity (EA) was developed. The statistical and validation parameters derived from this model were determined and found to be satisfactory. These parameters and the statistical tests performed show that the model is suitable for predicting, at the 95% confidence level, the first reduction potential of future TCNQ molecules of the same family that fall within its applicability domain.

Conjugated simple organic molecules carrying both electron donors and acceptors have recently attracted a lot of attention because of their various and interesting properties. Non-linear optical properties [

Intramolecular electron transfer processes are one of the main topics of current interest in physical organic chemistry [

The objective of this work is to develop, from a series of TCNQ molecules and using quantum descriptors, a predictive QSPR model able to explain and predict the first reduction potential of future TCNQ molecules of the same family that fall within its applicability domain.

In the development of the predictive QSPR model of the first reduction potential, we considered a series of forty Tetracyanoquinodimethane derivatives codified TCNQ [

Code | Molecule | $E_{exp}^{1}$ (V) | Reference |
---|---|---|---|
Training set | | | |
TCNQ_1 | | +0.170 | [ |
TCNQ_2 | | +0.110 | [ |
TCNQ_3 | | +0.120 | [ |
TCNQ_4 | | +0.012 | [ |
TCNQ_5 | | −0.180 | [ |
TCNQ_6 | | +0.130 | [ |
TCNQ_7 | | +0.130 | [ |
TCNQ_8 | | −0.470 | [ |
TCNQ_9 | | −0.090 | [ |
TCNQ_10 | | +0.068 | [ |
TCNQ_11 | | +0.03 | [ |
TCNQ_12 | | −0.05 | [ |
TCNQ_13 | | +0.058 | [ |
TCNQ_14 | | +0.048 | [ |
TCNQ_15 | | +0.320 | [ |
TCNQ_16 | | +0.200 | [ |
TCNQ_17 | | −0.360 | [ |
TCNQ_18 | | −0.370 | [ |
TCNQ_19 | | −0.340 | [ |
TCNQ_20 | | +0.290 | [ |
TCNQ_21 | | +0.300 | [ |
TCNQ_22 | | −0.010 | [ |
TCNQ_23 | | +0.530 | [ |
TCNQ_24 | | −0.010 | [ |
TCNQ_25 | | +0.080 | [ |
TCNQ_26 | | +0.210 | [ |
TCNQ_27 | | −0.040 | [ |
TCNQ_28 | | −0.570 | [ |
TCNQ_29 | | −0.140 | [ |
TCNQ_30 | | −0.026 | [ |
Test set | | | |
TCNQ_31 | | +0.260 | [ |
TCNQ_32 | | +0.070 | [ |
TCNQ_33 | | +0.410 | [ |
TCNQ_34 | | +0.650 | [ |
TCNQ_35 | | +0.120 | [ |
TCNQ_36 | | +0.130 | [ |
TCNQ_37 | | −0.440 | [ |
TCNQ_38 | | +0.030 | [ |
TCNQ_39 | | +0.260 | [ |
TCNQ_40 | | −0.020 | [ |

The GaussView 5.0 [

To develop a QSPR model, a data analysis method is required to quantify the relationship between the studied property and the molecular structure (descriptors). Several methods exist for building a model and analysing its statistical data; the one used in this study is Simple Linear Regression (SLR), which involves a single explanatory variable. The general form of the simple regression equation is:

$$Y = a_0 + a_1 X \tag{1}$$

where $Y$ is the studied property, $X$ is the explanatory variable correlated with the studied property, and $a_0$ and $a_1$ are the regression constants of the model.
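As an illustration of relationship (1), the following minimal sketch fits $a_0$ and $a_1$ by ordinary least squares using only the Python standard library. The function name `fit_slr` and the toy data are hypothetical and are not taken from the paper; the authors' actual statistical software is not specified here.

```python
from statistics import fmean


def fit_slr(x, y):
    """Ordinary least-squares fit of y = a0 + a1 * x; returns (a0, a1)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a1 = sxy / sxx            # slope
    a0 = y_bar - a1 * x_bar   # intercept
    return a0, a1


# Hypothetical toy data lying exactly on y = 1 + 2x (not from the paper)
x_demo = [0.0, 1.0, 2.0, 3.0]
y_demo = [1.0, 3.0, 5.0, 7.0]
a0, a1 = fit_slr(x_demo, y_demo)
print(a0, a1)
```

In a QSPR setting, `x` would hold the descriptor values of the training set and `y` the experimental property values.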

The selection of descriptors is a crucial step in QSPR modeling. In this study, the selection of descriptors was based on two criteria described as follows:

• Criterion 1

There must be a linear dependence relationship between the first reduction potential and the descriptors. Under these conditions we shall have | R | ≥ 0.50 [

• Criterion 2

The descriptors must be independent from one another. To do this, the partial correlation coefficient a i j between the descriptors i and j must be less than 0.70 ( a i j < 0.70 ) [

$$R = \frac{\mathrm{COV}(X, Y)}{S_X \cdot S_Y} \tag{2}$$

and

$$a_{ij} = \frac{\mathrm{COV}(X_i, X_j)}{\mathrm{Var}(X_i)} \tag{3}$$
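The two selection criteria can be sketched as a small filter. This is only an illustration under stated assumptions: the helper names `pearson_r` and `select_descriptors` are hypothetical, the toy descriptors are invented, and the absolute Pearson correlation between two descriptors is used here as a stand-in for the coefficient $a_{ij}$.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient R = COV(X, Y) / (S_X * S_Y)."""
    x_bar, y_bar = fmean(x), fmean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sx = sqrt(sum((a - x_bar) ** 2 for a in x))
    sy = sqrt(sum((b - y_bar) ** 2 for b in y))
    return cov / (sx * sy)  # the 1/(n-1) factors cancel out


def select_descriptors(descriptors, y, r_min=0.50, a_max=0.70):
    """Criterion 1: keep descriptors with |R| >= r_min against the property.
    Criterion 2 (assumption: Pearson correlation used as the inter-descriptor
    coefficient): drop a descriptor correlated at >= a_max with a kept one."""
    ranked = sorted(descriptors,
                    key=lambda name: abs(pearson_r(descriptors[name], y)),
                    reverse=True)
    kept = []
    for name in ranked:
        if abs(pearson_r(descriptors[name], y)) < r_min:
            continue
        if all(abs(pearson_r(descriptors[name], descriptors[k])) < a_max
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data (not from the paper)
y_toy = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.5],    # strongly correlated with y
    "d2": [1.0, 2.0, 3.0, 4.0, 9.0],    # correlated with y, but also with d1
    "d3": [1.0, -1.0, 1.0, -1.0, 1.0],  # uncorrelated with y
}
kept = select_descriptors(toy_descriptors, y_toy)
print(kept)
```

Ranking by $|R|$ before filtering means that, among mutually correlated descriptors, the one most correlated with the property is retained.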

Relationships (4) to (7) were used to calculate the statistical and validation parameters:

$$ESS = \sum \left( Y_{i,cal} - \bar{Y}_{exp} \right)^2 \tag{4}$$

$$TSS = \sum \left( Y_{i,exp} - \bar{Y}_{exp} \right)^2 \tag{5}$$

$$RSS = \sum \left( Y_{i,exp} - Y_{i,cal} \right)^2 \tag{6}$$

$$TSS = ESS + RSS \tag{7}$$

where TSS is the total sum of squares, ESS is the explained sum of squares and RSS is the residual sum of squares.
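The decomposition in relationship (7) holds for a least-squares fit with an intercept. The sketch below computes the three sums for hypothetical toy data (not from the paper) so that the identity can be checked numerically; the function name `sums_of_squares` is an illustrative choice.

```python
from statistics import fmean


def sums_of_squares(y_exp, y_cal):
    """Return (ESS, TSS, RSS) as defined in relationships (4) to (6)."""
    y_bar = fmean(y_exp)
    ess = sum((yc - y_bar) ** 2 for yc in y_cal)
    tss = sum((ye - y_bar) ** 2 for ye in y_exp)
    rss = sum((ye - yc) ** 2 for ye, yc in zip(y_exp, y_cal))
    return ess, tss, rss


# Hypothetical toy data (not from the paper)
x = [1.0, 2.0, 3.0, 4.0]
y_exp = [1.1, 1.9, 3.2, 3.8]

# Least-squares line through the toy data
x_bar, y_bar = fmean(x), fmean(y_exp)
a1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y_exp))
      / sum((xi - x_bar) ** 2 for xi in x))
a0 = y_bar - a1 * x_bar
y_cal = [a0 + a1 * xi for xi in x]

ess, tss, rss = sums_of_squares(y_exp, y_cal)
print(ess, tss, rss, ess + rss)
```

If `y_cal` came from a model that was not fitted by least squares with an intercept, `ess + rss` would in general differ from `tss`.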

• Determination coefficient (R^{2}) [

The determination coefficient is given by the following relationship:

$$R^2 = 1 - \frac{\sum \left( Y_{i,exp} - Y_{i,cal} \right)^2}{\sum \left( Y_{i,exp} - \bar{Y}_{exp} \right)^2} = 1 - \frac{RSS}{TSS} \tag{8}$$

with

$$R = \sqrt{\frac{\sum \left( Y_{i,cal} - \bar{Y}_{exp} \right)^2}{\sum \left( Y_{i,exp} - \bar{Y}_{exp} \right)^2}} = \sqrt{\frac{ESS}{TSS}} \tag{9}$$

• Standard deviation (s) [

It is an indicator of dispersion: it describes how the data are spread around the mean. The closer its value is to 0, the better the fit and the more reliable the prediction.

$$s = \sqrt{\frac{\sum \left( Y_{i,exp} - Y_{i,cal} \right)^2}{n - p - 1}} = \sqrt{\frac{RSS}{n - p - 1}} \tag{10}$$

• Adjusted determination coefficient ($R_{adjusted}^{2}$) [

Unlike $R^2$, it measures the robustness of a model. This coefficient is used in multiple regression because it takes into account the number of descriptors (parameters) of the model.

$$R_{adjusted}^{2} = 1 - \frac{n - \mathrm{Intercept}}{n - p - 1} \cdot \frac{RSS}{TSS} = 1 - \frac{n - \mathrm{Intercept}}{n - p - 1} \cdot \left( 1 - R^2 \right) \tag{11}$$

• Fisher-Snedecor coefficient (F) [

It tests the global significance of the linear regression. A globally significant regression equation contains at least one explanatory variable that is relevant for explaining the dependent variable. The Fisher-Snedecor coefficient is related to the determination coefficient by the following relationship:

$$F = \frac{n - p - 1}{p} \cdot \frac{ESS}{RSS} = \frac{n - p - 1}{p} \cdot \frac{R^2}{1 - R^2} \tag{12}$$

• Kubinyi Criterion (FIT) [

It measures the size or robustness of the model. The smaller the FIT, the more robust the model is, meaning that the model has more variables.

$$FIT = \frac{n - p - 1}{(n + p)^2} \cdot \frac{R^2}{1 - R^2} \tag{13}$$
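As a numerical check of relationships (8) to (13), the sketch below recomputes the fit statistics from the TSS and ESS values reported for the model in this paper (TSS = 1.7407, ESS = 1.6058, n = 30, p = 1). The function name `fit_statistics` is hypothetical, `Intercept` is assumed to equal 1, and square roots are taken in the expressions for $R$ and $s$, an assumption that is consistent with the reported values.

```python
from math import sqrt


def fit_statistics(tss, ess, n, p, intercept=1):
    """Statistics of relationships (8) to (13) from the sums of squares."""
    rss = tss - ess                      # relationship (7)
    r2 = 1.0 - rss / tss                 # relationship (8)
    dof = n - p - 1                      # residual degrees of freedom
    return {
        "R2": r2,
        "R": sqrt(r2),
        "s": sqrt(rss / dof),
        "R2_adjusted": 1.0 - (n - intercept) / dof * (1.0 - r2),
        "F": dof / p * r2 / (1.0 - r2),
        "FIT": dof / (n + p) ** 2 * r2 / (1.0 - r2),
    }


# Values reported for the model in this paper
reported = fit_statistics(tss=1.7407, ess=1.6058, n=30, p=1)
for name, value in reported.items():
    print(f"{name} = {value:.4f}")
```

Because the inputs are rounded to four decimals, the recomputed values agree with the reported ones (R² = 0.9225, s = 0.0694, adjusted R² = 0.9197, F = 333.3279, FIT = 0.3469) only to within rounding.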

• Cross-validation coefficient ($Q_{LOO}^{2}$) [

It measures the accuracy of the prediction on the data of the training set.

$$Q_{LOO}^{2} = 1 - \frac{\sum \left( y_{i,exp} - y_{i,pred} \right)^2}{\sum \left( y_{i,exp} - \bar{y}_{exp} \right)^2} = 1 - \frac{PRESS}{TSS} \tag{14}$$

• Cross-validation criteria (PRESS) [

As the sum of the squared prediction errors, PRESS (Prediction Sum of Squares) is defined by the relationship:

$$PRESS = \sum \left( y_{i,exp} - y_{i,pred} \right)^2 \tag{15}$$

This criterion is used to select models with good predictive power (the smallest PRESS is always sought). A Standard Deviation of Error of Prediction (SDEP) is calculated from PRESS:

$$SDEP = \sqrt{\frac{\sum \left( y_{i,exp} - y_{i,pred} \right)^2}{n}} = \sqrt{\frac{PRESS}{n}} \tag{16}$$

In these expressions, $n$ is the number of molecules in the training set and $p$ is the number of explanatory variables. $y_{i,exp}$ and $y_{i,pred}$ are respectively the experimental and predicted values of the property for molecule $i$, and $\bar{y}_{exp}$ is the average value of the property for the training set.
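The Leave-One-Out quantities of relationships (14) to (16) can be sketched as follows for a one-descriptor linear model: each molecule is predicted by a model fitted on the remaining n − 1 molecules. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import fmean


def fit_line(x, y):
    """Least-squares intercept and slope of y = a0 + a1 * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    return y_bar - a1 * x_bar, a1


def leave_one_out(x, y):
    """PRESS, Q2_LOO and SDEP for a simple linear regression."""
    x, y = list(x), list(y)
    n = len(x)
    press = 0.0
    for i in range(n):
        # Refit the model without molecule i, then predict molecule i
        a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a0 + a1 * x[i])) ** 2
    y_bar = fmean(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return {"PRESS": press, "Q2_LOO": 1.0 - press / tss, "SDEP": sqrt(press / n)}


# Hypothetical toy data (not from the paper)
toy = leave_one_out([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.1])
print(toy)
```

With the values reported for the model in this paper (PRESS = 0.1504, TSS = 1.7407, n = 30), relationships (14) and (16) give approximately 0.9136 and 0.0708, in agreement with the reported $Q_{LOO}^{2}$ and SDEP.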

• Todeschini’s parameter (${}^{c}R_{P}^{2}$) [

${}^{c}R_{P}^{2}$ is the corrected form of P.P. Roy’s parameter noted $R_{P}^{2}$ [

$${}^{c}R_{P}^{2} = R \sqrt{R^2 - \overline{R_{r}^{2}}} \tag{17}$$

with $\overline{R_{r}^{2}}$ the average value of the $R_{r_i}^{2}$ of the models obtained with the randomized property.
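A Y-randomization test can be sketched as below: the property values are shuffled, the model is refitted each time, and the average determination coefficient of the randomized models is compared with that of the real model. This is a minimal illustration, assuming relationship (17) has the form $R\sqrt{R^2 - \overline{R_r^2}}$ (which is consistent with the reported value of 0.8920); the function names, the number of trials and the toy data are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean


def r_squared(x, y):
    """Determination coefficient of the least-squares line of y on x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def corrected_rp2(r, r2, mean_r2_random):
    """Relationship (17), assumed form: R * sqrt(R^2 - mean randomized R^2)."""
    return r * sqrt(max(r2 - mean_r2_random, 0.0))


def y_randomization(x, y, n_trials=100, seed=0):
    """Return (R^2, mean randomized R^2, corrected parameter)."""
    rng = random.Random(seed)      # fixed seed for reproducibility
    r2 = r_squared(x, y)
    shuffled = list(y)
    r2_random = []
    for _ in range(n_trials):
        rng.shuffle(shuffled)      # break the structure-property link
        r2_random.append(r_squared(x, shuffled))
    mean_r2_random = fmean(r2_random)
    return r2, mean_r2_random, corrected_rp2(sqrt(r2), r2, mean_r2_random)


# Hypothetical toy data lying on y = 2x + 1 (not from the paper)
x_toy = [float(i) for i in range(1, 21)]
y_toy = [2.0 * xi + 1.0 for xi in x_toy]
r2, mean_r2_random, crp2 = y_randomization(x_toy, y_toy)
print(r2, mean_r2_random, crp2)

# Values reported in this paper: R = 0.9605, R^2 = 0.9225, mean R_r^2 = 0.0600
print(round(corrected_rp2(0.9605, 0.9225, 0.0600), 4))
```

A mean randomized $R^2$ far below the real $R^2$ suggests that the original correlation is unlikely to be due to chance.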

• External validation coefficient ($Q_{ext}^{2}$) [

It measures the accuracy of the prediction on the test set data.

$$Q_{ext}^{2} = 1 - \frac{n}{n_{ext}} \cdot \frac{PRESS(\text{test})}{TSS} \tag{18}$$

Here, $n_{ext}$ is the number of test set compounds and $n$ is the number of training set compounds.

• Parameter (RMSEP) [

The external predictive ability of a QSPR model may further be assessed by the Root Mean Square Error in Prediction, given by:

$$RMSEP = \sqrt{\frac{\sum \left( y_{exp(\text{test})} - y_{pred(\text{test})} \right)^2}{n_{ext}}} \tag{19}$$
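The two external validation measures can be computed together, as in the sketch below. It assumes that relationship (18) scales PRESS(test) by $n / n_{ext}$ relative to the training set TSS and that RMSEP is the square root of the mean squared test error; the function name `external_validation` and the toy numbers are hypothetical.

```python
from math import sqrt


def external_validation(y_exp_test, y_pred_test, tss_train, n_train):
    """Return (Q2_ext, RMSEP) following relationships (18) and (19)."""
    n_ext = len(y_exp_test)
    press_test = sum((ye - yp) ** 2 for ye, yp in zip(y_exp_test, y_pred_test))
    q2_ext = 1.0 - (n_train / n_ext) * press_test / tss_train
    rmsep = sqrt(press_test / n_ext)
    return q2_ext, rmsep


# Hypothetical toy numbers (not from the paper)
q2_ext, rmsep = external_validation(
    y_exp_test=[1.0, 2.0],
    y_pred_test=[1.1, 1.9],
    tss_train=4.0,
    n_train=8,
)
print(q2_ext, rmsep)
```

With the values reported in this paper (RMSEP = 0.0536, $n_{ext}$ = 10, n = 30, TSS = 1.7407), PRESS(test) is about 0.0287 and relationship (18) gives about 0.950, in line with the reported $Q_{ext}^{2}$ = 0.9504.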

• Roy K. et al. parameters ($\overline{r_{m}^{2}}$ and $\Delta r_{m}^{2}$) [

For an acceptable prediction, the value of $\Delta r_{m}^{2}$ should preferably be lower than 0.20 when the value of $\overline{r_{m}^{2}}$ is more than 0.50.

$$\overline{r_{m}^{2}} = \frac{r_{m}^{2} + r_{m}^{\prime 2}}{2} \tag{20}$$

$$\Delta r_{m}^{2} = \left| r_{m}^{2} - r_{m}^{\prime 2} \right| \tag{21}$$

where

$$r_{m}^{2} = r^2 \left( 1 - \sqrt{r^2 - r_{0}^{2}} \right) \tag{22}$$

and

$$r_{m}^{\prime 2} = r^2 \left( 1 - \sqrt{r^2 - r_{0}^{\prime 2}} \right) \tag{23}$$

The parameters $r^2$ and $r_{0}^{2}$ are the determination coefficients between the observed and predicted values of the compounds (training set or test set) with and without intercept, respectively. The parameter $r_{0}^{\prime 2}$ has the same meaning but uses the reversed axes.
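Relationships (20) to (23) translate directly into a short function. This sketch assumes a square root in relationships (22) and (23) and takes an absolute value under the root as a numerical guard; the function name `rm2_metrics` and the toy inputs are hypothetical.

```python
from math import sqrt


def rm2_metrics(r2, r02, r02_prime):
    """Return (mean r_m^2, delta r_m^2) from relationships (20) to (23)."""
    # abs() guards against a tiny negative difference caused by rounding
    rm2 = r2 * (1.0 - sqrt(abs(r2 - r02)))
    rm2_prime = r2 * (1.0 - sqrt(abs(r2 - r02_prime)))
    return (rm2 + rm2_prime) / 2.0, abs(rm2 - rm2_prime)


# Hypothetical toy inputs (not from the paper)
mean_rm2, delta_rm2 = rm2_metrics(r2=0.90, r02=0.89, r02_prime=0.86)
print(mean_rm2, delta_rm2)

# Test set values reported in this paper: r^2 = 0.9617, r0^2 = 0.9613, r'0^2 = 0.9617
mean_rm2_test, _ = rm2_metrics(0.9617, 0.9613, 0.9617)
print(round(mean_rm2_test, 4))
```

For the acceptance rule stated above, one would check that the first returned value exceeds 0.50 and the second is below 0.20.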

• External validation criteria or “Tropsha’s criteria” [

There are five such criteria:

Criterion 1: $R_{ext}^{2} > 0.70$

Criterion 2: $Q_{ext}^{2} > 0.60$

Criterion 3: $\dfrac{\left| R_{ext}^{2} - R_{0}^{2} \right|}{R_{ext}^{2}} < 0.1$ and $0.85 < k < 1.15$

Criterion 4: $\dfrac{\left| R_{ext}^{2} - R_{0}^{\prime 2} \right|}{R_{ext}^{2}} < 0.1$ and $0.85 < k' < 1.15$

Criterion 5: $\left| R_{ext}^{2} - R_{0}^{2} \right| < 0.3$

where $R_{ext}^{2}$ is the determination coefficient for the molecules of the test set; $R_{0}^{2}$ is the determination coefficient of the regression of predicted versus experimental values for the test set without intercept; $R_{0}^{\prime 2}$ is the determination coefficient of the regression of experimental versus predicted values for the test set without intercept; $k$ is the slope of the correlation line of predicted versus experimental values with intercept = 0; and $k'$ is the slope of the correlation line of experimental versus predicted values with intercept = 0. Ouanlo Ouattara et al. [

• Leverage ($h_{ii}$) [

The leverage is a kind of distance from the barycentre of the points in the space of the explanatory variables. It identifies observations that are abnormally far from the others. For observation $i$:

$$h_{ii} = x_i \left( X^{T} X \right)^{-1} x_i^{T} \quad (i = 1, \cdots, n) \tag{24}$$

where $x_i$ is the row vector of the descriptors of compound $i$ and $X$ is the model matrix built from the descriptor values of the training set. The index $T$ denotes the transposed matrix or vector. The critical leverage value $h^{*}$ is, in general, set to $3(p + 1)/n$ [

Here $n$ is the number of compounds in the training set and $p$ is the number of model descriptors. If a compound has both a residual and a leverage that exceed their critical values, it is considered to lie outside the applicability domain of the developed model.
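For a model with a single descriptor, the leverage (lever) of relationship (24) can be evaluated with a 2 × 2 matrix inverse. The sketch below assumes that each row of $X$ is $[1, x]$, that is, that the intercept column is included, which is an assumption consistent with the $p + 1$ term in the critical value; the function names and the toy data are hypothetical.

```python
def leverages(x_train, x_query):
    """h = x (X^T X)^-1 x^T for a one-descriptor model whose rows are [1, x]."""
    n = float(len(x_train))
    sx = sum(x_train)
    sxx = sum(v * v for v in x_train)
    det = n * sxx - sx * sx
    # Inverse of X^T X = [[n, sx], [sx, sxx]]
    inv00, inv01, inv11 = sxx / det, -sx / det, n / det
    return [inv00 + 2.0 * inv01 * v + inv11 * v * v for v in x_query]


def critical_leverage(n, p):
    """Critical value h* = 3 (p + 1) / n."""
    return 3.0 * (p + 1) / n


# Hypothetical toy descriptor values (not from the paper)
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
h_train = leverages(x_train, x_train)
h_far = leverages(x_train, [9.0])[0]   # a query far from the training data
print(h_train, h_far)

# For n = 30 training compounds and p = 1 descriptor, as in this paper
print(critical_leverage(30, 1))
```

The training set leverages sum to $p + 1$, and a query compound far from the training descriptor range receives a large leverage, which is what the applicability domain check detects.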

The descriptor considered in this work is electronic affinity (EA). This descriptor has been calculated according to Koopmans [

$$EA = -E_{LUMO} \tag{25}$$

where $E_{LUMO}$ is the energy of the Lowest Unoccupied Molecular Orbital (LUMO).

The calculated descriptor (electronic affinity) is subject only to selection criterion 1 because it is the sole descriptor considered (

The regression equation of the predictive QSPR model of the first reduction potential as a function of electronic affinity (EA) is given below:

$$E_{theo}^{1} = -2.5314 + 0.5708 \times EA$$
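As a worked example, the regression equation can be applied to a single compound. The sketch below uses the coefficients of the equation above and the EA value listed for TCNQ_1 (4.7196); the function name is a hypothetical choice.

```python
def predict_first_reduction_potential(ea):
    """First reduction potential (V) from the model E = -2.5314 + 0.5708 * EA."""
    return -2.5314 + 0.5708 * ea


# EA value listed for TCNQ_1 in this paper
e_pred = predict_first_reduction_potential(4.7196)
print(f"{e_pred:.4f} V")
```

The predicted value is about +0.163 V, to be compared with the experimental value of +0.170 V listed for TCNQ_1.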

CODE | EA | CODE | EA | CODE | EA |
---|---|---|---|---|---|
Training set | | Training set | | Test set | |
TCNQ_1 | 4.7196 | TCNQ_16 | 4.7253 | TCNQ_31 | 4.7153 |
TCNQ_2 | 4.6219 | TCNQ_17 | 3.8607 | TCNQ_32 | 4.5776 |
TCNQ_3 | 4.5302 | TCNQ_18 | 3.8297 | TCNQ_33 | 5.0778 |
TCNQ_4 | 4.5376 | TCNQ_19 | 3.8085 | TCNQ_34 | 5.5402 |
TCNQ_5 | 4.2251 | TCNQ_20 | 4.9545 | TCNQ_35 | 4.7109 |
TCNQ_6 | 4.3299 | TCNQ_21 | 5.0348 | TCNQ_36 | 4.8048 |
TCNQ_7 | 4.4298 | TCNQ_22 | 4.3152 | TCNQ_37 | 3.5692 |
TCNQ_8 | 3.708 | TCNQ_23 | 5.2544 | TCNQ_38 | 4.5944 |
TCNQ_9 | 4.438 | TCNQ_24 | 4.4573 | TCNQ_39 | 4.9333 |
TCNQ_10 | 4.6023 | TCNQ_25 | 4.8206 | TCNQ_40 | 4.4605 |
TCNQ_11 | 4.5517 | TCNQ_26 | 4.8021 | | |
TCNQ_12 | 4.4094 | TCNQ_27 | 4.5060 | | |
TCNQ_13 | 4.5582 | TCNQ_28 | 3.3972 | | |
TCNQ_14 | 4.5253 | TCNQ_29 | 4.0640 | | |
TCNQ_15 | 4.8269 | TCNQ_30 | 4.4714 | | |

Equation | Correlation coefficient $\lvert R \rvert$ | Descriptor is selected if $\lvert R \rvert \geq 0.50$ |
---|---|---|
$E_{exp}^{1} = f(EA)$ | 0.9605 | selected |

$n = 30$; $R = 0.9605$; $R^2 = 0.9225$; $R_{adjusted}^{2} = 0.9197$; $s = 0.0694$; $F = 333.3279$; $FIT = 0.3469$; p-value < 0.0001; $TSS = 1.7407$; $ESS = 1.6058$; $\alpha = 95\%$

The positive sign of the EA coefficient in the regression equation shows that the first reduction potential increases with electronic affinity; the explanatory variable and the studied property are therefore directly correlated. The correlation coefficient is very high ($R = 0.9605$), indicating a strong correlation between the first reduction potential and the selected descriptor. The determination coefficient $R^2 = 0.9225$ shows that 92.25% of the experimental variance of the first reduction potential is explained by the model's descriptor alone. In addition, the standard deviation ($s = 0.0694$) is close to 0, indicating a good fit and high reliability of the prediction. The p-value is less than 0.0001, well below the risk level $1 - \alpha = 0.05$ (5%), so the regression equation of the model is highly significant for predicting the first reduction potential of the studied series of molecules. This global significance is confirmed by the very high Fisher value ($F = 333.3279$). Under these conditions, the single explanatory variable (electronic affinity) of the regression equation is highly relevant for explaining the studied property (first reduction potential). Moreover, the total sum of squares is TSS = 1.7407, of which ESS = 1.6058 is explained by the model. This dependence of the first reduction potential on electronic affinity has been corroborated by the work of Peter W. Kenny [

For internal validation, the Leave-One-Out (LOO) procedure and the Y-randomization test of the property were used.

• Leave-One-Out procedure

• Y-randomization test

The average values of the Y-randomization parameters are shown in

The external validation only concerns the molecules of the test set.

$n$ | $Q_{LOO}^{2}$ | $\overline{r_{m}^{2}}_{(LOO)}$ | $\Delta r_{m(LOO)}^{2}$ | PRESS | SDEP |
---|---|---|---|---|---|
30 | 0.9136 | 0.9136 | 0.0000 | 0.1504 | 0.0708 |

Randomized parameter | $R_{r}^{2}$ | $s_{r}$ | $F_{r}$ | ${}^{c}R_{P}^{2}$ |
---|---|---|---|---|
Average value | 0.0600 | 0.2415 | 1.9987 | 0.8920 |

$n_{ext}$ | $R_{ext}^{2}$ | $Q_{ext}^{2}$ | $R_{0}^{2}$ | $R_{0}^{\prime 2}$ | $\overline{r_{m}^{2}}_{(test)}$ | $\Delta r_{m(test)}^{2}$ | RMSEP |
---|---|---|---|---|---|---|---|
10 | 0.9617 | 0.9504 | 0.9613 | 0.9617 | 0.9521 | 0.0096 | 0.0536 |

From the analysis of the data in

Verification of Tropsha’s criteria

Criterion 1: $R_{ext}^{2} = 0.9617 > 0.70$

Criterion 2: $Q_{ext}^{2} = 0.9504 > 0.60$

Criterion 3: $\dfrac{\left| R_{ext}^{2} - R_{0}^{2} \right|}{R_{ext}^{2}} = 0.0004 < 0.1$ and $k = 0.9905$ with $0.85 < k < 1.15$

Criterion 4: $\dfrac{\left| R_{ext}^{2} - R_{0}^{\prime 2} \right|}{R_{ext}^{2}} = 0.0000 < 0.1$ and $k' = 0.9797$ with $0.85 < k' < 1.15$

Criterion 5: $\left| R_{ext}^{2} - R_{0}^{2} \right| = 0.0004 < 0.3$

All five Tropsha criteria are therefore satisfied. As a result, the developed model performs very well in predicting the first reduction potential of the studied series of molecules.
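The verification above can be reproduced with a few comparisons. The sketch below encodes the five thresholds and applies them to the test set values reported in this paper; the function name `tropsha_check` and the dictionary keys are hypothetical.

```python
def tropsha_check(r2_ext, q2_ext, r02, r02_prime, k, k_prime):
    """Evaluate the five external validation criteria; returns a dict of booleans."""
    return {
        "criterion_1": r2_ext > 0.70,
        "criterion_2": q2_ext > 0.60,
        "criterion_3": abs(r2_ext - r02) / r2_ext < 0.1 and 0.85 < k < 1.15,
        "criterion_4": abs(r2_ext - r02_prime) / r2_ext < 0.1 and 0.85 < k_prime < 1.15,
        "criterion_5": abs(r2_ext - r02) < 0.3,
    }


# Test set values reported in this paper
results = tropsha_check(
    r2_ext=0.9617, q2_ext=0.9504,
    r02=0.9613, r02_prime=0.9617,
    k=0.9905, k_prime=0.9797,
)
print(results)
print("all criteria satisfied:", all(results.values()))
```

Returning one boolean per criterion, rather than a single flag, makes it easy to see which criterion fails for a less successful model.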

In

• Shapiro-Wilk’s test [

The data in

• Durbin-Watson’s test [

The values in

Shapiro-Wilk’s parameter (W) | p-value | $1 - \alpha$ |
---|---|---|
0.9539 | 0.1036 | 0.05 |

Durbin-Watson’s parameter (d) | p-value | $1 - \alpha$ |
---|---|---|
1.8705 | 0.3402 | 0.05 |

The Applicability Domain (AD) has been determined by analyzing Williams’s diagram of

Examination of the Williams diagram shows that, for both the training and test sets, all observations have standardized residuals within ±3 standard deviation units (±3σ) [. The critical leverage value is $h^{*} = 0.2000$. In the test set, observation TCNQ_34 has a leverage above this critical value. However, a leverage above the critical value does not always indicate an outlier for the developed model. Training set compounds with leverages above the threshold value but with small residuals stabilize the model and increase its accuracy; they are called “good influential points”. On the other hand, compounds with $h_{ii}$ greater than the critical value $h^{*}$ and with large residuals are called “bad influential points” [

The objective of this study was to develop a predictive QSPR (Quantitative Structure-Property Relationship) model linking the first reduction potential of a series of tetracyanoquinodimethane analogues to quantum descriptors from conceptual density functional theory. A predictive QSPR model depending on electronic affinity has been developed. The determination coefficient $R^2 = 0.9225$ of this model shows that 92.25% of the experimental variance of the first reduction potential is explained by the model’s descriptor alone. The Fisher coefficient of this model is very high ($F = 333.3279$), indicating that the regression equation is highly significant. The standard deviation ($s = 0.0694$) is well below 0.50, indicating a good fit and high reliability of the prediction. The internal and external validation parameters show that the model is validated and can be expected to predict the first reduction potential efficiently. The cross-validation coefficient $Q_{LOO}^{2} = 0.9136$ indicates that the model accounts for 91.36% of the variance of the first reduction potential of the training set under leave-one-out prediction. Likewise, the external validation coefficient $Q_{ext}^{2} = 0.9504$ indicates that the model accounts for 95.04% of the variance when predicting the first reduction potentials of the test set molecules. Thus, to search for new tetracyanoquinodimethane (TCNQ) acceptors of this same family with the desired first reduction potentials, one can tune the electronic affinity.

The authors declare no conflicts of interest regarding the publication of this paper.

Diarrassouba, F., Koné, M., Bamba, K., Traoré, Y., Koné, M.G.-R. and Assanvo, E.F. (2019) Development of Predictive QSPR Model of the First Reduction Potential from a Series of Tetracyanoquinodimethane (TCNQ) Molecules by the DFT (Density Functional Theory) Method. Computational Chemistry, 7, 121-142. https://doi.org/10.4236/cc.2019.74009