
Reliable prediction of the lipophilicity of organic compounds requires the determination of molecular descriptors. In this work, the lipophilicity of a set of twenty-three molecules was determined using up to eleven quantum descriptors calculated by quantum chemistry methods. Following the Quantitative Structure-Property Relationship (QSPR) methodology, a first set of fourteen molecules was used as the training set and a second set of nine molecules as the test set. Calculations at the AM1 and HF/6-311++G levels of theory led to a QSPR relation able to predict molecular lipophilicity with over 95% confidence.

The information contained in a molecular structure can be accessed and described by means of various physicochemical quantities called descriptors. For decades, many studies have been conducted to determine these descriptors empirically or computationally, and it is well known that they can describe molecular structures [

Both the training and test sets are drawn from a sample of twenty-three aromatic compounds with known experimental values [ logP_{exp}, where P_{exp} is the experimental value of the octanol-water partition coefficient. The training set comprises fourteen molecules and the test set nine molecules (

All molecules have been fully optimized using GAUSSIAN 03 [

A QSPR study requires statistical analysis throughout the validation process. In this work, we used multiple linear regression analysis [

The determination coefficient $R^2$ is given by

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$

TSS: Total Sum of Squares; ESS: Explained Sum of Squares; RSS: Residual Sum of Squares. The significance of a linear regression equation is assessed from Fisher's coefficient (F) [

Set | Code | logP_{Exp} |
---|---|---|
Training set | CA1 | 2.13 ± 0.10 |
Training set | CA2 | 3.12 ± 0.20 |
Training set | CA3 | 3.15 ± 0.20 |
Training set | CA4 | 3.69 ± 0.15 |
Training set | CA5 | 3.63 ± 0.15 |
Training set | CA6 | 3.53 ± 0.30 |
Training set | CA7 | 4.00 ± 0.20 |
Training set | CA8 | 4.10 ± 0.20 |
Training set | CA9 | 4.00 ± 0.20 |
Training set | CA10 | 3.22 ± 0.20 |
Training set | CA11 | 2.27 ± 0.20 |
Training set | CA12 | 2.73 ± 0.10 |
Training set | CA13 | 3.35 ± 0.10 |
Training set | CA14 | 3.87 ± 0.20 |
Test set | CA15 | 3.98 ± 0.10 |
Test set | CA16 | 3.66 ± 0.20 |
Test set | CA17 | 3.60 ± 0.20 |
Test set | CA18 | 3.63 ± 0.40 |
Test set | CA19 | 3.05 ± 0.30 |
Test set | CA20 | 3.20 ± 0.20 |
Test set | CA21 | 4.10 ± 0.10 |
Test set | CA22 | 3.15 ± 0.20 |
Test set | CA23 | 4.10 ± 0.20 |

n: number of molecules; p: number of explanatory variables.
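As a rough illustration of how these statistics fit together, the following Python sketch computes the determination coefficient, its adjusted form, Fisher's F and the standard error s from observed and predicted values. It assumes an ordinary least-squares fit with an intercept (so that ESS = TSS − RSS) and the usual textbook forms of F and s with n − p − 1 degrees of freedom. The function name `regression_stats` and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def regression_stats(y_obs, y_pred, p):
    """Return R2, adjusted R2, Fisher's F and standard error s.

    y_obs  : observed values (e.g. experimental logP)
    y_pred : values predicted by the fitted regression equation
    p      : number of explanatory variables (descriptors)
    """
    n = len(y_obs)
    mean_y = sum(y_obs) / n
    tss = sum((y - mean_y) ** 2 for y in y_obs)                 # total sum of squares
    rss = sum((y - yh) ** 2 for y, yh in zip(y_obs, y_pred))    # residual sum of squares
    ess = tss - rss  # explained sum of squares (valid for a least-squares fit with intercept)
    dof = n - p - 1  # residual degrees of freedom
    return {
        "R2": ess / tss,
        "R2_adj": 1 - (rss / dof) / (tss / (n - 1)),
        "F": (ess / p) / (rss / dof),
        "s": math.sqrt(rss / dof),
    }


# Toy data for illustration only (not from the paper)
toy_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
print(regression_stats(toy_obs, toy_pred, p=1))
```

A larger F for the same n and p indicates a more significant regression equation, which is how competing descriptor groups can be compared.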

The predictive power of a model can be assessed using Tropsha's five criteria [

Criterion 1:

Criterion 4:

Thousands of molecular descriptors are available from the literature and from quantum chemical calculations. For our study, we considered eleven quantum descriptors (

According to

Quantum descriptors | Notation | Expression |
---|---|---|
Dipole moment | | |
Energy of the HOMO | | |
Energy of the LUMO | | |
Acidity by hydrogen bonding [ | | |
Basicity by hydrogen bonding [ | | |
Chemical electronegativity [ | | |
Chemical hardness [ | | |
Chemical softness [ | | |
Smallest negative charge of the molecule | | |
Largest positive charge of the hydrogen atoms of the molecule | | |
Sum of absolute values of the net Mulliken charges | | |
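Three of these descriptors follow directly from the frontier orbital energies. The Python sketch below uses the expressions that reproduce the values tabulated in this paper at the AM1 level (for example CA1): electronegativity as (E_HOMO + E_LUMO)/2, hardness as (E_LUMO − E_HOMO)/2, and softness as the inverse of hardness. These forms are inferred from the tabulated numbers rather than quoted from the Expression column, so they should be read as an assumption; note that many texts define electronegativity with the opposite sign. The function name `global_descriptors` is hypothetical.

```python
def global_descriptors(e_homo, e_lumo):
    """Return (electronegativity, hardness, softness) from frontier orbital energies.

    The sign convention for electronegativity is the one that matches the
    negative values tabulated in the paper (an inferred assumption).
    """
    chi = (e_homo + e_lumo) / 2   # electronegativity as tabulated
    eta = (e_lumo - e_homo) / 2   # chemical hardness
    softness = 1 / eta            # chemical softness
    return chi, eta, softness


# CA1 at the AM1 level: E_HOMO = -0.3547, E_LUMO = 0.0204 (values from the paper)
chi, eta, softness = global_descriptors(-0.3547, 0.0204)
print(round(chi, 4), round(eta, 4), round(softness, 3))
```

For CA1 this gives values close to the tabulated −0.1672, 0.1876 and 5.3319.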

Quantum descriptors of the training set, AM1 level

CODE | Dipole moment | Energy of the HOMO | Energy of the LUMO | Acidity by hydrogen bonding | Basicity by hydrogen bonding | Chemical electronegativity | Chemical hardness | Chemical softness | Smallest negative charge | Largest positive charge of the hydrogen atoms | Sum of absolute values of the net Mulliken charges |
---|---|---|---|---|---|---|---|---|---|---|---|
CA1 | 0.0009 | −0.3547 | 0.0204 | 0.0048 | 0.0052 | −0.1672 | 0.1876 | 5.3319 | −0.1301 | 0.1301 | 1.5614 |
CA2 | 0.4692 | −0.3375 | 0.0192 | 0.0048 | 0.0050 | −0.1592 | 0.1784 | 5.6070 | −0.1775 | 0.1296 | 2.0264 |
CA3 | 0.2453 | −0.3444 | 0.0194 | 0.0048 | 0.0051 | −0.1625 | 0.1819 | 5.4975 | −0.2072 | 0.1304 | 2.0942 |
CA4 | 0.2589 | −0.3441 | 0.0192 | 0.0048 | 0.0051 | −0.1625 | 0.1817 | 5.5051 | −0.2119 | 0.1306 | 2.4142 |
CA5 | 0.2977 | −0.3298 | 0.0186 | 0.0048 | 0.0049 | −0.1556 | 0.1742 | 5.7405 | −0.1777 | 0.1296 | 2.2557 |
CA6 | 0.4228 | −0.3382 | 0.0195 | 0.0048 | 0.0050 | −0.1594 | 0.1789 | 5.5913 | −0.2067 | 0.1297 | 2.3258 |
CA7 | 0.4717 | −0.3277 | 0.0199 | 0.0048 | 0.0049 | −0.1539 | 0.1738 | 5.7537 | −0.1796 | 0.1289 | 2.4884 |
CA8 | 0.0000 | −0.3246 | 0.0182 | 0.0048 | 0.0049 | −0.1532 | 0.1714 | 5.8343 | −0.1760 | 0.1292 | 2.4782 |
CA9 | 0.3123 | −0.3173 | −0.0086 | 0.0045 | 0.0048 | −0.1630 | 0.1544 | 6.4788 | −0.1795 | 0.1321 | 2.3463 |
CA10 | 1.5121 | −0.2948 | −0.0319 | 0.0043 | 0.0046 | −0.1634 | 0.1315 | 7.6075 | −0.1880 | 0.1410 | 2.0976 |
CA11 | 1.5754 | −0.3508 | 0.0060 | 0.0046 | 0.0051 | −0.1724 | 0.1784 | 5.6054 | −0.1657 | 0.1479 | 1.5913 |
CA12 | 0.2652 | −0.3429 | 0.0191 | 0.0048 | 0.0051 | −0.1619 | 0.1810 | 5.5249 | −0.1792 | 0.1301 | 1.7976 |
CA13 | 0.0003 | −0.3201 | −0.0098 | 0.0045 | 0.0048 | −0.1650 | 0.1552 | 6.4454 | −0.1278 | 0.1321 | 2.1093 |
CA14 | 0.2741 | −0.3155 | −0.0098 | 0.0045 | 0.0048 | −0.1627 | 0.1529 | 6.5424 | −0.1811 | 0.1325 | 2.3500 |

Quantum descriptors of the training set, HF/6-311++G level

CODE | Dipole moment | Energy of the HOMO | Energy of the LUMO | Acidity by hydrogen bonding | Basicity by hydrogen bonding | Chemical electronegativity | Chemical hardness | Chemical softness | Smallest negative charge | Largest positive charge of the hydrogen atoms | Sum of absolute values of the net Mulliken charges |
---|---|---|---|---|---|---|---|---|---|---|---|
CA1 | 0.0000 | −0.3409 | 0.0424 | 0.0055 | 0.0038 | −0.1493 | 0.1917 | 5.2178 | −0.3387 | 0.3387 | 4.0650 |
CA2 | 0.6870 | −0.3202 | 0.0394 | 0.0055 | 0.0036 | −0.1404 | 0.1798 | 5.5617 | −1.6167 | 0.3402 | 13.3364 |
CA3 | 0.4144 | −0.3273 | 0.0387 | 0.0055 | 0.0037 | −0.1443 | 0.1830 | 5.4645 | −1.1851 | 0.3616 | 7.9439 |
CA4 | 0.4319 | −0.3266 | 0.0391 | 0.0055 | 0.0037 | −0.1438 | 0.1829 | 5.4690 | −1.1932 | 0.3730 | 9.4211 |
CA5 | 0.4248 | −0.3104 | 0.0397 | 0.0055 | 0.0035 | −0.1354 | 0.1751 | 5.7127 | −1.8709 | 0.3375 | 18.3793 |
CA6 | 0.7839 | −0.3187 | 0.0395 | 0.0055 | 0.0036 | −0.1396 | 0.1791 | 5.5835 | −1.6439 | 0.3713 | 15.3392 |
CA7 | 0.6708 | −0.3076 | 0.0396 | 0.0055 | 0.0035 | −0.1340 | 0.1736 | 5.7604 | −1.8444 | 0.3554 | 21.3145 |
CA8 | 0.0000 | −0.3026 | 0.0401 | 0.0055 | 0.0034 | −0.1313 | 0.1714 | 5.8360 | −2.8671 | 0.3183 | 24.3437 |
CA9 | 0.5046 | −0.2909 | 0.0392 | 0.0055 | 0.0033 | −0.1259 | 0.1651 | 6.0588 | −1.5820 | 0.3619 | 10.2537 |
CA10 | 1.7852 | −0.2624 | 0.0366 | 0.0055 | 0.0030 | −0.1129 | 0.1495 | 6.6890 | −0.5776 | 0.3718 | 6.2841 |
CA11 | 2.5200 | −0.3536 | 0.0383 | 0.0055 | 0.0040 | −0.1577 | 0.1960 | 5.1033 | −0.5641 | 0.3546 | 3.6156 |
CA12 | 0.4218 | −0.3274 | 0.0397 | 0.0055 | 0.0037 | −0.1439 | 0.1836 | 5.4481 | −1.3335 | 0.3521 | 8.7112 |
CA13 | 0.0000 | −0.2948 | 0.0387 | 0.0055 | 0.0034 | −0.1281 | 0.1668 | 5.9970 | −0.4731 | 0.3394 | 5.4112 |
CA14 | 0.3839 | −0.2894 | 0.0384 | 0.0055 | 0.0033 | −0.1255 | 0.1639 | 6.1013 | −1.7341 | 0.3853 | 10.2685 |

descriptors are Energy of the HOMO (

According to

Descriptor | Correlation coefficient (AM1 level) | Status (AM1 level) | Correlation coefficient (HF/6-311++G level) | Status (HF/6-311++G level) |
---|---|---|---|---|
Dipole moment | 0.3173 | Rejected | 0.3551 | Rejected |
Energy of the HOMO | 0.5727 | Selected | 0.6186 | Selected |
Energy of the LUMO | 0.1127 | Rejected | 0.2126 | Rejected |
Acidity by hydrogen bonding | 0.0600 | Rejected | 0.2126 | Rejected |
Basicity by hydrogen bonding | 0.5641 | Selected | 0.6186 | Selected |
Chemical electronegativity | 0.7228 | Selected | 0.6241 | Selected |
Chemical hardness | 0.3572 | Rejected | 0.6122 | Selected |
Chemical softness | 0.2980 | Rejected | 0.5522 | Selected |
Smallest negative charge | 0.4134 | Rejected | 0.7340 | Selected |
Largest positive charge of the hydrogen atoms | 0.4414 | Rejected | 0.1300 | Rejected |
Sum of absolute values of the net Mulliken charges | 0.9818 | Selected | 0.7060 | Selected |
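The selection step behind this table can be sketched in Python as follows: each descriptor is correlated with the experimental logP of the training set, and it is kept when the absolute correlation coefficient reaches a cut-off. The cut-off of 0.50 used here is an assumption; it is merely consistent with the table, where the largest rejected value is 0.4414 and the smallest selected value is 0.5522. The names `pearson_r` and `is_selected` are hypothetical, and the data are the AM1 training-set values listed in this paper.

```python
import math

THRESHOLD = 0.50  # assumed cut-off, not stated explicitly in the text


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_selected(descriptor, logp, threshold=THRESHOLD):
    """Keep a descriptor when |R| with logP reaches the threshold."""
    return abs(pearson_r(descriptor, logp)) >= threshold


# Training set CA1 to CA14: experimental logP and two AM1 descriptors
logp_exp = [2.13, 3.12, 3.15, 3.69, 3.63, 3.53, 4.00,
            4.10, 4.00, 3.22, 2.27, 2.73, 3.35, 3.87]
e_homo_am1 = [-0.3547, -0.3375, -0.3444, -0.3441, -0.3298, -0.3382, -0.3277,
              -0.3246, -0.3173, -0.2948, -0.3508, -0.3429, -0.3201, -0.3155]
dipole_am1 = [0.0009, 0.4692, 0.2453, 0.2589, 0.2977, 0.4228, 0.4717,
              0.0000, 0.3123, 1.5121, 1.5754, 0.2652, 0.0003, 0.2741]

for name, values in [("E_HOMO", e_homo_am1), ("dipole moment", dipole_am1)]:
    r = pearson_r(values, logp_exp)
    print(name, round(abs(r), 3), "Selected" if is_selected(values, logp_exp) else "Rejected")
```

On these data the absolute correlation is roughly 0.57 for the HOMO energy and 0.32 for the dipole moment, in line with the tabulated 0.5727 and 0.3173.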

Correlation between descriptors, AM1 level

Coefficient | Criterion 2: Independent descriptors if |
---|---|
−97.3800 | Independent |
0.8450 | Dependent |
0.0269 | Independent |
−0.0078 | Independent |
−0.0003 | Independent |
0.0124 | Independent |

settled two groups. For the first group, the three descriptors selected are the energy of the HOMO (

Correlation between descriptors, HF/6-311++G level

Coefficient | Criterion 2 |
---|---|
−100.00 | Independent |
2.0533 | Dependent |
−1.9416 | Independent |
0.0572 | Independent |
−0.0065 | Independent |
0.0007 | Independent |
−0.0205 | Independent |
0.0194 | Independent |
−0.0006 | Independent |
0.00006 | Independent |
−0.000007 | Independent |
−0.9416 | Independent |
0.0277 | Independent |
−0.0034 | Independent |
0.0004 | Independent |
−0.0295 | Independent |
0.0031 | Independent |
−0.0003 | Independent |
−0.0676 | Independent |
0.0078 | Independent |
−0.0985 | Independent |

been impossible with Excel to plot on the same graph

The quantum descriptors of group 2 were used to establish Model 1 because they give a more significant regression equation, in the sense of Fisher's coefficient, than those of group 1.

Model 1:

According to the t-test, the quantum descriptors in Model 1 rank in the following descending order of importance:

Verification of Tropsha criteria for Model 1.

1)

4)

All values satisfy Tropsha's criteria. Model 1 is retained as a predictive model of molecular lipophilicity. The statistical parameters are gathered in

Model 2:

According to the t-test, the quantum descriptors in Model 2 rank in the following descending order of importance:

Model 1 parameters | Internal validation LOO (training set) | External validation (test set) |
---|---|---|
14 | 14 | 9 |
0.9729 (97.29%) | PRESS = 0.3716 | 0.9900 (99%) |
0.9647 | 0.9265 (92.65%) | PRESS = 0.1429 |
119.4556 | | 0.9560 (95.60%) |
0.1171 | 0.1928 | 0.1691 |

predict the unavailable lipophilicities of aromatic compounds.
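The leave-one-out (LOO) internal validation reported in the table above can be sketched in Python as follows. Each molecule is removed in turn, a multiple linear regression is refitted on the remaining ones, and the squared prediction errors are summed into PRESS; the cross-validated coefficient is then taken as 1 − PRESS/TSS. This is a minimal pure-Python sketch under stated assumptions: the regression is solved through the normal equations with a small Gaussian elimination, and all function names are hypothetical. It is not the authors' actual workflow.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mlr(rows, y):
    """Least-squares coefficients [b0, b1, ..., bp] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


def q2_from_press(press, y):
    """Cross-validated coefficient, taken here as 1 - PRESS/TSS."""
    mean_y = sum(y) / len(y)
    tss = sum((v - mean_y) ** 2 for v in y)
    return 1 - press / tss


def loo_press(rows, y):
    """Sum of squared leave-one-out prediction errors."""
    press = 0.0
    for i in range(len(y)):
        coeffs = fit_mlr(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coeffs, rows[i])) ** 2
    return press


# Experimental logP of the training set (CA1 to CA14, from the paper)
logp_train = [2.13, 3.12, 3.15, 3.69, 3.63, 3.53, 4.00,
              4.10, 4.00, 3.22, 2.27, 2.73, 3.35, 3.87]
print(round(q2_from_press(0.3716, logp_train), 4))
```

With the tabulated PRESS of 0.3716 and the fourteen training logP values, `q2_from_press` returns a value close to the 0.9265 reported for Model 1.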

Verification of Tropsha criteria for Model 2.

1)

4)

None of the Tropsha criteria, except criterion 4, is satisfied. Model 2, established at the HF/6-311++G level, is therefore not validated, since

Model 2 parameters | Internal validation LOO (training set) | External validation (test set) |
---|---|---|
14 | 14 | 9 |
0.8724 (87.24%) | PRESS = 2.5848 | 0.4006 (40.06%) |
0.6677 | 0.4884 (48.84%) | PRESS = 1.3086 |
10.9402 | | 0.5971 (59.71%) |
0.2839 | 0.5684 | 0.6605 |

QSPR methodology and quantum chemical methods were used to establish predictive models of molecular lipophilicity. In this work, we identified four groups of quantum descriptors according to the basic criteria usually applied to descriptor selection. The results showed that many descriptors correlate strongly with lipophilicity. From these descriptors, we established two lipophilicity prediction models. The statistical analysis led us to retain only the semi-empirical (AM1) model; the ab initio (HF/6-311++G) model was rejected because of its low predictive power. Furthermore, according to the selected model, the main descriptors that strongly influence lipophilicity are the basicity by hydrogen bonding (

Ouattara, O. and Ziao, N. (2017) Quantum Chemistry Prediction of Molecular Lipophilicity Using Semi-Empirical AM1 and Ab Initio HF/6-311++G Levels. Computational Chemistry, 5, 38-50. http://dx.doi.org/10.4236/cc.2017.51004