Development of Predictive QSPR Model of the First Reduction Potential from a Series of Tetracyanoquinodimethane (TCNQ) Molecules by the DFT (Density Functional Theory) Method

This work develops a predictive QSPR (Quantitative Structure-Property Relationship) model of the first reduction potential for a series of forty tetracyanoquinodimethane (TCNQ) molecules, which constitute our database. Thirty molecules were used for the training set and ten for the test set. To calculate the descriptors, all molecules were first optimized, with a frequency calculation, at the B3LYP/6-31G(d,p) level of theory. Using statistical analysis methods, a predictive QSPR model of the first reduction potential depending only on the electron affinity (EA) was developed. The statistical and validation parameters derived from this model were determined and found to be satisfactory. These parameters, together with the statistical tests performed, show that the model is suitable for predicting, at a 95% confidence level, the first reduction potential of future TCNQ molecules of the same family that fall within its applicability domain.

Intramolecular electron transfer processes are among the main topics of current interest in physical organic chemistry [5], particularly regarding tetracyanoquinodimethane (TCNQ)-based charge transfer complexes. TCNQ is an organic electron acceptor with a high electron affinity [6] [7] [8]. It can react with electron donors through an oxidation-reduction process to form charge transfer complexes that display electrical properties and have various applications. It has been used to synthesize a large number of charge transfer compounds that have been widely explored as building blocks for molecular electronics [9] [10], non-linear optics [11] and organic semiconductors [12] [13]. Existing TCNQ molecules have generally exhibited exemplary redox properties. Improving these properties and finding molecules with even better ones is therefore a challenge for scientific research. However, in synthesizing these complexes, organic chemists aim to obtain thermodynamically stable radical species, which is not an easy task. In addition, the two molecules constituting the charge transfer complex must have moderate donor and acceptor strengths [14]. Under these conditions, methods that offer an alternative to experimentation become essential. Among these, QSPR (Quantitative Structure-Property Relationship) methods are of great interest and are even recommended by new regulations [15] [16]. They make it possible to develop mathematical models linking physico-chemical properties to molecular structure. Such models either explain the origin of these properties or predict them for molecules whose experimental data are not available. Quantum chemistry, through its different methods, provides access to a large number of descriptors.
The objective of this work is to develop a predictive QSPR model of the first reduction potential from a series of TCNQ molecules using quantum descriptors, in order to explain and predict the first reduction potential of future TCNQ molecules of the same family that fall within its applicability domain.

Training Set and Test Set
To develop the predictive QSPR model of the first reduction potential, we considered a series of forty tetracyanoquinodimethane derivatives, coded TCNQ [17]-[23]. These molecules were chosen because their experimental first reduction potentials are available. All of these potentials were determined by cyclic voltammetry in acetonitrile. These molecules constitute our database: thirty of them (75% of the database) were used for the training set and ten (25% of the database) for the test set. Table 1 presents these molecules with their corresponding experimental first reduction potentials, expressed in volts (V).
TCNQ_15: +0.320 [17]; TCNQ_22: −0.010 [19]

Level of Theory and Software
The GaussView 5.0 [24] software was used to represent the 3D structure and visualize the studied molecules. Then, the Gaussian 09 software [25]

Statistical Analysis
To develop a QSPR model, a data analysis method is required. This method quantifies the relationship between the studied property and the molecular structure (descriptors). Several methods exist for building a model and analyzing its statistical data; the one used in this study is Simple Linear Regression (SLR), which involves a single explanatory variable. In general, the simple regression equation has the form $Y = a_0 + a_1 X$, where $Y$ is the studied property, $X$ is the explanatory variable correlated with the studied property, and $a_0$ and $a_1$ are the regression constants of the model.
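As an illustration of the simple linear regression described above, the following minimal sketch estimates $a_0$ and $a_1$ by ordinary least squares. The function names and the numerical values are hypothetical and are not taken from Table 1 or Table 2; the statistical software actually used by the authors is not assumed here.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (a0, a1) minimizing sum((y_i - a0 - a1 * x_i) ** 2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a1 = sxy / sxx            # slope
    a0 = y_bar - a1 * x_bar   # intercept
    return a0, a1


def predict(a0, a1, x):
    """Apply Y = a0 + a1 * X to each value of x."""
    return [a0 + a1 * xi for xi in x]


# Hypothetical illustration values (NOT the data of this study):
# explanatory variable X (e.g. EA) and studied property Y (e.g. potential in V).
ea_values = [4.0, 4.5, 5.0, 5.5]
potentials = [-0.30, -0.05, 0.20, 0.45]
a0, a1 = fit_simple_linear_regression(ea_values, potentials)
```

The closed-form expressions for the slope and the intercept are used instead of a matrix solver because only one explanatory variable is involved.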
The selection of descriptors is a crucial step in QSPR modeling. In this study, the selection of descriptors was based on two criteria.
Criterion 2: The descriptors must be independent of one another. To this end, the partial correlation coefficient $a_{ij}$ between descriptors $i$ and $j$ must be less than 0.70 [30]. For a multilinear regression, the coefficients $R$ and $a_{ij}$ are expressed as follows:
The relationships (4), (5), (6) and (7) were used to calculate many statistical and validation parameters. They involve the sums of squares $TSS = \sum_i (y_i - \bar{y})^2$, $ESS = \sum_i (\hat{y}_i - \bar{y})^2$ and $RSS = \sum_i (y_i - \hat{y}_i)^2$, with $TSS = ESS + RSS$, where $y_i$ and $\hat{y}_i$ are the experimental and predicted values of the property for compound $i$, $\bar{y}$ is the mean of the experimental values, TSS is the total sum of squares, ESS is the explained sum of squares and RSS is the residual sum of squares.
The determination coefficient is given by the following relationship: $R^2 = \dfrac{ESS}{TSS} = 1 - \dfrac{RSS}{TSS}$
• Standard deviation (s) [32]: It is an indicator of dispersion. It describes how the data are distributed around the mean. The closer its value is to 0, the better the fit and the more reliable the prediction.
• Adjusted determination coefficient ($R^2_{adj}$): Unlike $R^2$, it measures the robustness of a model. This coefficient is used in multiple regressions because it takes into account the number of descriptors in the model.
• Fisher-Snedecor coefficient (F): It tests the global significance of the linear regression. A globally significant regression equation contains at least one explanatory variable that is relevant for explaining the dependent variable. The Fisher-Snedecor coefficient is related to the determination coefficient by the following relationship: $F = \dfrac{R^2}{1 - R^2} \cdot \dfrac{n - p - 1}{p}$, where $n$ is the number of training set compounds and $p$ is the number of descriptors.
• FIT: It measures the size or robustness of the model. The smaller the FIT, the more robust the model is, meaning that the model has more variables.
It measures the accuracy of the prediction on the data of the training set.
• Cross-validation criterion (PRESS) [36]
• $^{c}R_p^2$ is the corrected form of P.P. Roy's parameter, noted $R_p^2$ [38]. It indicates whether or not the model is due to chance correlations. If this parameter is greater than 0.50, the model is not due to chance correlations. It is defined as:
Here, $n_{ext}$ refers to the number of test set compounds.
• Parameter (RMSEP) [39]: The external predictive ability of a QSPR model may further be assessed by the Root Mean Square Error in Prediction, given by: $RMSEP = \sqrt{\dfrac{1}{n_{ext}} \sum_{i=1}^{n_{ext}} (y_{i,exp} - y_{i,pred})^2}$, where $y_{i,exp}$ and $y_{i,pred}$ are the experimental and predicted values for test set compound $i$ and $n_{ext}$ is the number of test set compounds.
The parameters $r^2$ and $r_0^2$ are the determination coefficients between the observed and predicted values of the compounds (training set or test set) with and without intercept, respectively. The parameter $r_0'^2$ has the same meaning but uses the reversed axes.
• External validation criteria or "Tropsha's criteria" [36] [41]: There are five such criteria:
R′ is the determination coefficient of the regression between experimental and predicted values for the test set without intercept; $k$ is the slope of the correlation line (predicted values as a function of the experimental values, with intercept = 0) and $k'$ is the slope of the correlation line (experimental values as a function of the predicted values, with intercept = 0). Ouanlo Ouattara et al. [42] reported that if at least three of the five Tropsha's criteria are satisfied, the QSPR model is considered successful in predicting the studied property.
• Lever ($h_{ii}$) [43]: The lever (leverage) is a kind of distance from the barycentre of the points in the space of the explanatory variables. It identifies observations that are abnormally far from the others. For observation i
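The fit statistics listed in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the expressions used for $s$, $R^2_{adj}$ and $F$ are the usual textbook forms with $n$ compounds and $p$ descriptors, and they are assumed (not confirmed) to match the relationships used by the authors. All function names are hypothetical.

```python
import math
from statistics import mean


def sums_of_squares(y_exp, y_pred):
    """Return (TSS, ESS, RSS). TSS = ESS + RSS holds for a least-squares
    fit with intercept, not for arbitrary predictions."""
    y_bar = mean(y_exp)
    tss = sum((y - y_bar) ** 2 for y in y_exp)
    ess = sum((yp - y_bar) ** 2 for yp in y_pred)
    rss = sum((y - yp) ** 2 for y, yp in zip(y_exp, y_pred))
    return tss, ess, rss


def r_squared(y_exp, y_pred):
    """Determination coefficient R^2 = 1 - RSS / TSS."""
    tss, _, rss = sums_of_squares(y_exp, y_pred)
    return 1.0 - rss / tss


def standard_deviation(y_exp, y_pred, p=1):
    """Assumed form: s = sqrt(RSS / (n - p - 1))."""
    _, _, rss = sums_of_squares(y_exp, y_pred)
    return math.sqrt(rss / (len(y_exp) - p - 1))


def adjusted_r_squared(r2, n, p=1):
    """Assumed form: R^2_adj = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def fisher_f(r2, n, p=1):
    """Assumed form: F = (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# R^2 and n as stated in the text (30 training compounds, one descriptor).
r2_adj = adjusted_r_squared(0.9225, n=30, p=1)
f_value = fisher_f(0.9225, n=30, p=1)
```

Values computed this way are only indicative; they may differ in the last digits from those of the statistical software used in the study.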

Calculation of Molecular Descriptor
The descriptor considered in this work is the electron affinity (EA). It was calculated according to Koopmans' approach [45]: the electron affinity is the negative of the LUMO energy.
$EA = -E_{LUMO}$ (25)
where LUMO is the Lowest Unoccupied Molecular Orbital. Table 2 reports the values of this descriptor for both the training set and the test set.
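Relationship (25) can be illustrated with the short sketch below. It assumes that the LUMO energy is read in hartree from the quantum chemistry output and that EA is wanted in electronvolts; the text does not state the unit used in Table 2, so the conversion step, the function name and the example energy are assumptions.

```python
# Conversion factor from hartree to electronvolt (rounded CODATA value).
HARTREE_TO_EV = 27.211386


def electron_affinity_ev(e_lumo_hartree):
    """Koopmans-type estimate: EA = -E_LUMO, converted to eV."""
    return -e_lumo_hartree * HARTREE_TO_EV


# Hypothetical LUMO energy (NOT a value from Table 2).
ea = electron_affinity_ev(-0.18)
```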

Submission of the Descriptor to the Selection Criterion 1
The calculated descriptor (electron affinity) is subjected only to selection criterion 1 because it is the sole descriptor considered, so criterion 2 (independence between descriptors) does not apply (Table 3).

QSPR Model
The regression equation of the predictive QSPR (Quantitative Structure-Property Relationship) model of the first reduction potential as a function of the electron affinity (EA) is given below:
There is therefore a direct correlation between the explanatory variable and the studied property. Examination of the above parameters shows that the correlation coefficient is high. This high value indicates a strong correlation between the first reduction potential and the selected descriptor.

Internal Validation of the Model
For internal validation, the Leave-One-Out (LOO) procedure and the Y-randomization test of the property were used.
• Leave-One-Out procedure: For the prediction of the redox potential, the model is acceptable. Moreover, to ensure that the model is not due to chance correlations, a Y-randomization test of the property was carried out, using circular permutations of the property (29 iterations).
• Y-randomization test: The average values of the Y-randomization parameters are shown in Table 5. Table 5 shows that the average value of $R_r^2$ tends to 0 ($R_r^2 = 0.0600$). This confirms that the established model is not due to chance correlations.
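The two internal validation procedures above can be sketched as follows. This is a minimal illustration, not the authors' implementation: $Q^2_{LOO}$ is taken as $1 - PRESS/TSS$, the Y-randomization uses every non-trivial circular shift of the property vector (29 shifts for 30 training compounds), and the corrected Roy parameter is taken in its usual form $R\sqrt{R^2 - R_r^2}$. Function names and example data are hypothetical.

```python
import math
from statistics import mean


def fit(x, y):
    """Ordinary least-squares intercept and slope for y = a0 + a1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - a1 * x_bar, a1


def r2(x, y):
    """Determination coefficient of the least-squares fit of y on x."""
    a0, a1 = fit(x, y)
    y_bar = mean(y)
    rss = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - rss / tss


def q2_loo(x, y):
    """Leave-One-Out: refit without compound i, then predict compound i."""
    x, y = list(x), list(y)
    press = 0.0
    for i in range(len(x)):
        a0, a1 = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a0 + a1 * x[i])) ** 2
    y_bar = mean(y)
    return 1.0 - press / sum((yi - y_bar) ** 2 for yi in y)


def y_randomization_circular(x, y):
    """R^2 of the model refitted on each circular shift of y (n - 1 shifts)."""
    x, y = list(x), list(y)
    return [r2(x, y[s:] + y[:s]) for s in range(1, len(y))]


def c_r2_p(r2_model, r2_random_mean):
    """Assumed form of Roy's corrected parameter: R * sqrt(R^2 - R_r^2)."""
    return math.sqrt(r2_model) * math.sqrt(max(r2_model - r2_random_mean, 0.0))


# Hypothetical data (NOT the training set of this study).
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_demo = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
q2 = q2_loo(x_demo, y_demo)
r2_random = y_randomization_circular(x_demo, y_demo)
```

Circular shifts are deterministic, which makes the test reproducible; random shuffles of the property are a common alternative.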

External Validation of the Model
The external validation only concerns the molecules of the test set. Table 6 reports the statistical parameters of the external validation of the model.
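For illustration, the external validation quantities defined in the Statistical Analysis section (RMSEP and the slopes $k$ and $k'$ of the regression lines through the origin) can be computed on the test set as sketched below. The function names and example values are hypothetical, and the thresholds of Tropsha's criteria are deliberately not encoded here.

```python
import math


def rmsep(y_exp, y_pred):
    """Root Mean Square Error in Prediction over the n_ext test compounds."""
    n_ext = len(y_exp)
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(y_exp, y_pred)) / n_ext)


def slope_through_origin(x, y):
    """Least-squares slope of y = slope * x (intercept fixed at 0)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical test-set values in volts (NOT the data of Table 6).
y_exp_test = [0.10, 0.20, 0.35, -0.05]
y_pred_test = [0.12, 0.18, 0.33, -0.02]

error = rmsep(y_exp_test, y_pred_test)
k = slope_through_origin(y_exp_test, y_pred_test)        # predicted vs experimental
k_prime = slope_through_origin(y_pred_test, y_exp_test)  # experimental vs predicted
```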

Correlation between the Predicted Values by the Model and the Experimental Values
In Figure 1, all points lie close to the regression line. This figure therefore shows a strong linear correlation between the values of the first reduction potential predicted by the model and the experimental values. Figure 2 shows that the predicted and experimental values follow a similar trend, particularly for the test set. These graphs thus support the validity of the model and its efficiency in predicting the redox potential, which reflects the adequacy of the level of theory used to develop it.

Model Normality Tests
• Shapiro-Wilk's test [48]: The data in Table 7 show that the calculated p-value is greater than the significance level $\alpha = 0.05$ (5% threshold). The hypothesis of normality is therefore not rejected: the theoretical values of the first reduction potential obtained from the model can be considered to follow a normal distribution. This is supported by the alignment of the point cloud along the first bisector in Figure 3.
• Durbin-Watson's test [49]: The values in Table 8 show that the calculated p-value is greater than the significance level $\alpha = 0.05$ (5% threshold). The hypothesis that the residuals are not autocorrelated is therefore not rejected. Under these conditions, the residuals do not contain information that can influence the model's prediction of the first reduction potential. This interpretation is supported by the random distribution of the point cloud in Figure 4.
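As a complement to the Durbin-Watson test above, the sketch below computes the Durbin-Watson statistic $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$ from a list of residuals. It returns only the statistic, not the p-value reported in Table 8; values of $d$ close to 2 are usually read as an absence of first-order autocorrelation. The function name and the residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic of residuals taken in their given order."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals must not all be zero")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    return numerator / denominator


# Hypothetical residuals in volts (NOT the residuals of this study).
d = durbin_watson([0.02, -0.01, 0.03, -0.02, 0.01, -0.03])
```

The statistic lies between 0 (strong positive autocorrelation) and 4 (strong negative autocorrelation).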

Applicability Domain (AD) of the Model
The Applicability Domain (AD) was determined by analyzing the Williams diagram of Figure 5.
Examination of the Williams diagram shows that, for both the training and the test set, all observations have standardized residuals within ±3 standard deviation units (±3σ) [50]. This indicates the absence of outliers. The limit of three standard deviation units was chosen because our data follow a normal distribution: for standardized residuals, a value of 3 is commonly used as the cut-off for accepting predictions because the points within ±3 standard deviation units cover more than 99% of normally distributed data [51]. Regarding the levers of the training set, all observations except TCNQ_28 have levers below the threshold value ($h^* = 0.2000$). In the test set, observation TCNQ_34 has a lever above the critical value. However, a lever above the critical value does not always indicate an outlier for the developed model. Training set compounds with levers above the threshold value but with low residuals stabilize the model and increase its accuracy; they are called "good influential points". Conversely, compounds with $h_{ii}$ greater than the critical value $h^*$ and with large residuals are called "bad influential points" [51]. Consequently, our QSPR (Quantitative Structure-Property Relationship) model shows no evidence of aberrant observations in either set, and the molecule TCNQ_28 is a "good influential point". The results of the external validation showed that the model is suitable for predicting the redox potentials of future TCNQ molecules of the same family that fall within its applicability domain.
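The leverage analysis above can be sketched as follows for a model with a single descriptor. The sketch assumes the usual expressions $h_{ii} = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$ and $h^* = 3(p+1)/n$, which gives $h^* = 0.2$ for $n = 30$ training compounds and $p = 1$ descriptor, in agreement with the threshold quoted in the text. Treating a "large residual" as a standardized residual beyond ±3 is an assumption, and all names and data are hypothetical.

```python
from statistics import mean


def leverages(x_train, x_query):
    """Leverage of each query point relative to the training set
    (simple linear regression with intercept)."""
    n = len(x_train)
    x_bar = mean(x_train)
    sxx = sum((xi - x_bar) ** 2 for xi in x_train)
    return [1.0 / n + (xq - x_bar) ** 2 / sxx for xq in x_query]


def leverage_threshold(n, p=1):
    """Critical leverage h* = 3 * (p + 1) / n."""
    return 3.0 * (p + 1) / n


def classify(h, std_residual, h_star, residual_limit=3.0):
    """Label one observation of a Williams diagram."""
    outlier = abs(std_residual) > residual_limit   # assumed meaning of "large"
    high_leverage = h > h_star
    if outlier and high_leverage:
        return "bad influential point"
    if outlier:
        return "outlier"
    if high_leverage:
        return "good influential point"
    return "inside applicability domain"


h_star = leverage_threshold(n=30, p=1)

# Hypothetical descriptor values (NOT the data of Table 2).
x_demo = [4.0, 4.2, 4.5, 4.7, 5.0, 6.5]
h_demo = leverages(x_demo, x_demo)
```

For test set compounds, `leverages` is called with the training descriptors as `x_train` and the test descriptors as `x_query`, so that the distance is measured from the barycentre of the training set.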

Conclusion
The objective of this study was to develop a predictive QSPR (Quantitative Structure-Property Relationship) model linking the first reduction potential of a series of tetracyanoquinodimethane analogues to quantum descriptors from conceptual density functional theory. A predictive QSPR model depending on the electron affinity was developed. The determination coefficient of this model, $R^2 = 0.9225$, shows that 92.25% of the experimental variance of the first reduction potential is explained by the model's descriptor