A Revisited Definition of the Three Solute Descriptors Related to the Van der Waals Forces in Solutions

It is currently accepted that the intermolecular forces involved in Gas-Liquid Chromatography (GLC) can be expressed as products of parameters (or descriptors) of solutes and parameters of solvents. The present study is limited to solute parameters, and more specifically to the three involved in Van der Waals forces; the two involved in hydrogen bonding are left aside at this stage. These three parameters, which we call δ, ω and ε, reflect respectively the three types of Van der Waals forces: dispersion, orientation (polarity strictly speaking), and induction-polarizability. They were obtained experimentally in previous studies for 121 Volatile Organic Compounds (VOC) via an original Multiplicative Matrix Analysis (MMA) applied to a superabundant and accurate GLC data set. Also in previous studies, attempts were made to predict these parameters with a Simplified Molecular Topology procedure (SMT). Because those published results were somewhat disappointing, a promising new prediction strategy is developed and detailed in the present article.


Introduction
The present study continues a series of six papers begun in 2005 by our group, in which we focused on better understanding the intermolecular forces involved in GLC (Gas-Liquid Chromatography) between solutes (VOC, or Volatile Organic Compounds) and stationary phases [1]-[6]. Although this is an oversimplified view from a theoretical point of view, two types of forces are considered in an experimental approach:
• Van der Waals, subdivided into London, Keesom and Debye types,
• hydrogen bonding, subdivided into proton donor and proton acceptor types.
According to Rohrschneider [7], it has been accepted since 1966 that Kováts retention indices (RI) in GLC can be expressed as a linear sum of terms, each term being the product of a solute parameter and a solvent parameter. There is currently general agreement that five terms are necessary and sufficient. In other words, the following equation can be written:

$$RI_{ij} = \delta_i D_j + \omega_i W_j + \varepsilon_i E_j + \alpha_i A_j + \beta_i B_j \qquad (1)$$

where the subscripts i and j refer to the solute and to the stationary phase, respectively.
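To make the structure of the five-term product model concrete (each term being a solute parameter multiplied by the matching solvent parameter, RI = δD + ωW + εE + αA + βB), here is a minimal sketch. It assumes the model has exactly this form with no additional constant term; the class and function names are hypothetical and the numbers are placeholders, not published parameter values.

```python
# Hypothetical sketch of the five-term product model:
#   RI = delta*D + omega*W + eps*E + alpha*A + beta*B
# Names and numbers are illustrative placeholders, not published values.
from dataclasses import dataclass


@dataclass
class Solute:
    delta: float  # dispersion
    omega: float  # orientation (polarity strictly speaking)
    eps: float    # induction-polarizability
    alpha: float  # proton donor
    beta: float   # proton acceptor


@dataclass
class Phase:
    D: float
    W: float
    E: float
    A: float
    B: float


def predict_ri(s: Solute, p: Phase) -> float:
    """Sum of the five solute-parameter x solvent-parameter products."""
    return (s.delta * p.D + s.omega * p.W + s.eps * p.E
            + s.alpha * p.A + s.beta * p.B)


solute = Solute(delta=5.0, omega=0.2, eps=0.1, alpha=0.0, beta=0.3)
phase = Phase(D=100.0, W=50.0, E=20.0, A=0.0, B=40.0)
print(predict_ri(solute, phase))
```

Keeping the solute and phase parameters in separate records mirrors the separation of the two parameter sets in the text: one solute record can be combined with any phase record.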

Parameters of Solutes
The physicochemical characterization of our solute parameters is partly similar to that adopted by various authors, starting with the study of Abraham et al. in 1990 [8]. In both cases one parameter depends on molecular size, whereas the four other parameters are polar in the extended meaning of the term, i.e. independent of molecular size. We call δ (as dispersion) the apolar solute parameter, and we have found that it is better identified with the molecular polarizability (or the molar refractivity) than with, for example, the molar mass, the molar volume or the air/hexadecane partition coefficient. Three of our four polar solute parameters, those we call ε, α and β, are very similar to those of Abraham and co-authors. By contrast, our ω parameter (as orientation, or polarity strictly speaking) is quite different from the corresponding parameter of Abraham and co-authors (equations have, however, been provided to transform one system into the other [1]). The symbols δ, ω, ε, α and β for the five solute parameters are respectively related to the forces of:
• dispersion (London),
• orientation, or polarity strictly speaking (Keesom),
• polarizability-induction (Debye) (ε as electrons involvement),
• acidity (proton donor according to Brønsted),
• basicity (proton acceptor according to Brønsted).
Based on a close cooperation between our group and the Kováts group, accurate values of the solute parameters were established solely by applying an original algorithm, called MMA (Multiplicative Matrix Analysis), to a set of retention indices of 127 compounds on 11 phases [1] [4].
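Because each retention index in a five-term product model is a sum of five solute-by-phase products, a complete solutes × phases matrix of such values has rank at most five. The sketch below checks this numerically with a truncated singular value decomposition (SVD) in NumPy. SVD is a generic stand-in chosen for illustration, not the authors' MMA algorithm; SVD factors are defined only up to a rotation, so they are not δ, ω, ε, α and β themselves, and identifying physically meaningful descriptors needs additional constraints that this sketch does not attempt. The matrix sizes and random values are hypothetical.

```python
import numpy as np


def product_rank(ri: np.ndarray, rel_tol: float = 1e-9) -> int:
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(ri, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


def low_rank_fit(ri: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation (least squares) via truncated SVD."""
    u, s, vt = np.linalg.svd(ri, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k]


rng = np.random.default_rng(0)
solutes = rng.normal(size=(30, 5))  # 30 hypothetical solutes x 5 descriptors
phases = rng.normal(size=(11, 5))   # 11 hypothetical phases x 5 descriptors
ri = solutes @ phases.T             # noise-free 30 x 11 product matrix
print(product_rank(ri))             # 5
```

With real, noisy retention data the trailing singular values are small but not zero, so the choice of tolerance (or of a statistical criterion) becomes part of deciding how many product terms are needed.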
A subsequent improvement of this step was the reduction of the number of stationary phases needed, without loss of accuracy. The most satisfactory result was obtained with two apolar phases of the same structure but very different molecular weights, and three polar phases: a polyfluorinated one, a polyether and a primary mono-alcohol. Because these five phases were synthesized by the Kováts group and are not commercially available, a procedure has been proposed to calibrate commercial phases using six reference VOC [4]. Once sets of five commercial phases sufficiently independent in their D, W, E, A and B values have been selected, it should in principle be easy to measure solute parameters exclusively by experiment, potentially with greater refinement.
In 1982 [9] we attempted such a 100% GLC experimental determination for 240 solutes, but the five chosen phases were not sufficiently orthogonal, and we therefore consider that study to be of only preliminary interest.
This goal should be relatively easy to reach with packed columns.
It could, however, be harder to reach with open tubular columns, particularly for the proton donor phase and for the two apolar phases of very different molecular weights, as suggested, for example, by the results of Poole et al. [10] [11].
Since 100% GLC determination of solute parameters no longer seems to be pursued, an alternative approach was followed, consisting of pooling the presently available values with those published by Abraham and co-authors [12] [13]. In this step, the Abraham data were suitably transformed as mentioned above, and compounds that are gaseous or solid at room temperature were excluded. This data pool, less accurate than the previous one, was expected to offer homogeneous values for 456 compounds of more diversified nature [3].

Solvent Parameters of Stationary Phases
In the step mentioned above, the accurate values of the solvent parameters, named D, W, E, A and B, were obtained for 11 stationary phases simultaneously with the solute parameters of 127 compounds.
In addition, from the large collection of GLC retention indices published by McReynolds in 1970 (207 stationary phases on 226 columns and 10 VOC) [14], we identified the molecular structures of 56 of these phases and derived their solvent parameters D, W, E, A and B. Because of an abnormal behavior of the phase diglycerol, observed in various studies and apparently due to its high surface adsorption [5] [15]-[18], we have discarded this phase in the present study.
As for the solutes, we pooled these two data sets into a more extended data set of 66 phases (56 − 1 + 11) [5].
To this pooled data set we added, according to [1] and [14], the values of the parameter called b by McReynolds [14], which allows retention volumes Vg to be converted into Kováts retention indices RI. Indeed, this b parameter can be considered a fifth characteristic of the phases, together with W, E, A and B, the parameter D being a constant when GC data are expressed in relative values (which is the case for RI).

On the Parameters Related to the Hydrogen Bonding Forces
The results are clear but mixed. On the one hand, the identification of these parameters is almost evident (a solute proton donor property corresponds to a solvent proton acceptor property, and vice versa). On the other hand, their values predicted on the basis of a simplified molecular topology (summarized in the Materials and Methods section of the present paper) are unsatisfactory whenever intramolecular hydrogen bonds are suspected. A solution could perhaps be to extend the molecular topology presently applied to a larger neighborhood of each atom. However, that solution would require considerably more experimental data than are presently available.

On the Parameters Related to the Van der Waals Forces
The results are more satisfactory here. Of the three molecular parameters of solutes, those of dispersion δ and of induction-polarizability ε are clearly related to equations including the molar refractivity and the Van der Waals molar volume, both of which are easily and accurately predictable, even for solids and gases, using a simplified molecular topology. Similarly, among the solvent parameters of phases, McReynolds' b and E appear related to equations including the Van der Waals molar volume and the PSA (polar surface area), both of which are predictable in the same easy and accurate way. By contrast, the cases of the solute parameter of orientation, or polarity strictly speaking, ω, and of its associated solvent parameter W have remained open.
The present study is therefore limited to an attempt at an optimal characterization of the solute parameters related to Van der Waals forces, those related to hydrogen bonding being (temporarily?) left aside.

Statistical Tools
In addition to Microsoft Excel for Windows, used for drawing diagrams and handling data sets, SYSTAT 12® for Windows was used for stepwise MLRA (Multiple Linear Regression Analysis).

SMT, A Simplified Molecular Topology
The principle of this tool has already been presented elsewhere [2] [4]. In the version used here, it takes into account only the nature of each atom of a molecule and the nature of its bonds, leaving aside the nature of its first neighbors, with the exception of three cases specified hereafter. Each atom is given an index comprising a series of digits, whose sum is at most equal to its valence. The value of each digit defines the type of a bond (1 for a single bond, 2 for a double bond, etc.); bonds with hydrogen are excluded.
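The atom-index convention can be illustrated with a short sketch. It assumes, as the codes C11 and N122 quoted in this paper suggest, that the digits are written in ascending order. The input format (heavy atoms plus a list of bonds with integer orders) and the function names are hypothetical, and the SMT treatment of aromatic bonds is not covered: the sketch simply uses whatever integer bond orders it is given.

```python
from collections import Counter


def smt_codes(atoms, bonds):
    """Return one SMT-style code per heavy atom, e.g. 'C11' or 'N122'.

    atoms: element symbols of the non-hydrogen atoms only.
    bonds: (i, j, order) tuples between heavy atoms; bonds to hydrogen
           are omitted, as in the SMT convention.
    Assumption: bond-order digits are listed in ascending order.
    """
    orders = [[] for _ in atoms]
    for i, j, order in bonds:
        orders[i].append(order)
        orders[j].append(order)
    return [el + "".join(str(o) for o in sorted(ords))
            for el, ords in zip(atoms, orders)]


def smt_counts(atoms, bonds):
    """Occurrences of each code: candidate independent variables for MLRA."""
    return Counter(smt_codes(atoms, bonds))


# Nitromethane written with a pentavalent N: CH3-N(=O)=O
print(smt_codes(["C", "N", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 2)]))
# ['C1', 'N122', 'O2', 'O2']
```

Counting codes rather than listing them turns each molecule into a vector of feature counts, which is the form a regression program needs.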
In addition to the 34 atom characteristics kept, we also consider three additional topological features:
• Chlorine linked to a carbon of type C11.
• Amines linked to a carbon that is itself linked to an O2 oxygen. This feature is present in amides.
• A connectivity parameter due to Zamora [19] called the "smallest set of smallest rings" (SSSR). According to this concept, for naphthalene for example, which contains two individual six-membered rings and one ten-membered ring embracing them, only the two six-membered rings are considered. Since two six-membered rings correspond to 6 + 6 = 12 ring atoms (the two fusion atoms being counted twice), the SSSR value of naphthalene is taken equal to 12.
The topological features finally kept in the present version, concerning the atoms C, H, O, N, P, S, F, Cl, Br and I, are summarized in Table 1. Note that the calculations using the SMT procedure were made manually in this study, using 2D molecular drawings from ChemSpider [19] [20].
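As an illustration of the SSSR concept (not of the procedure used here, which was manual), the sketch below finds, for each bond, the shortest ring through it by breadth-first search, then keeps the smallest rings that are linearly independent (over GF(2), a ring being treated as a set of bonds) until the number of rings equals the cyclomatic number, bonds − atoms + 1. This simplified candidate set is adequate for ordinary fused systems such as naphthalene but is not a complete Horton-type SSSR algorithm; the function names and input format are hypothetical.

```python
from collections import deque


def _shortest_path(adj, start, goal, banned):
    """BFS path from start to goal that does not use the edge `banned`."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in sorted(adj[node]):
            if frozenset((node, nxt)) == banned or nxt in prev:
                continue
            prev[nxt] = node
            queue.append(nxt)
    if goal not in prev:
        return None
    path = [goal]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


def smallest_rings(n_atoms, edges):
    """Ring sizes of a smallest set of smallest rings (simplified)."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    bit = {frozenset(e): k for k, e in enumerate(edges)}
    n_rings = len(edges) - n_atoms + 1  # cyclomatic number, connected graph
    candidates = []
    for a, b in edges:
        path = _shortest_path(adj, a, b, frozenset((a, b)))
        if path is None:  # bridge: this bond lies on no ring
            continue
        mask = 1 << bit[frozenset((a, b))]
        for u, v in zip(path, path[1:]):
            mask |= 1 << bit[frozenset((u, v))]
        candidates.append((len(path), mask))
    candidates.sort(key=lambda c: c[0])
    basis = {}  # leading bit -> ring vector already kept (GF(2) elimination)
    sizes = []
    for size, mask in candidates:
        if len(sizes) == n_rings:
            break
        while mask:
            lead = mask.bit_length() - 1
            if lead not in basis:
                basis[lead] = mask
                sizes.append(size)
                break
            mask ^= basis[lead]  # reduce; a zero result means dependent ring
    return sizes


# Naphthalene carbon skeleton; atoms 4 and 9 are the fusion atoms
naphthalene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 9), (9, 0),
               (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]
print(sum(smallest_rings(10, naphthalene)))  # 12
```

The ten-membered perimeter ring of naphthalene is never kept, because it is the GF(2) sum of the two six-membered rings and therefore adds no independent ring.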

Solute and Solvent Parameters Data
As shown in the Introduction, four sets of solvation parameters have been established over the last decade: two for VOC as solutes and two for GLC stationary phases as solvents [1]-[6]. For solvents, both sets derive entirely from GLC experimentation: one can be considered accurate (11 phases), and the other, more extended (66 phases), less precise.
For solutes, one set comes exclusively from GLC experimentation and can be considered accurate (127 compounds), while the other, only partly of chromatographic origin and less precise, covers a greater variety of VOC (456 compounds). Some compounds have been excluded from these two initial experimental data sets of solutes:
• those which include Si or Sn,
• those consisting of a single atom linked only to hydrogen atoms (e.g. CH4, OH2, NH3, SH2),
• those in the gaseous or solid state at room temperature.
Finally, the number of VOC kept in these two data sets becomes 121 for the accurate one and 447 for the extended one, respectively.
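The three exclusion rules amount to a simple filter. A minimal sketch, assuming each compound is described by its element counts and its physical state at room temperature; the second rule is read here as "only one non-hydrogen atom in the molecule", which is an interpretation, and the function name and input format are hypothetical.

```python
def keep_voc(formula: dict, state: str) -> bool:
    """Return True if a compound passes the three exclusion rules.

    formula: element symbol -> atom count, e.g. {"C": 2, "H": 6, "O": 1}.
    state:   physical state at room temperature ("gas", "liquid", "solid").
    Assumption: rule 2 is read as "only one non-hydrogen atom".
    """
    if "Si" in formula or "Sn" in formula:  # rule 1
        return False
    heavy = sum(n for el, n in formula.items() if el != "H")
    if heavy <= 1:                           # rule 2: CH4, OH2, NH3, SH2
        return False
    return state == "liquid"                 # rule 3: no gases or solids


compounds = {
    "ethanol": ({"C": 2, "H": 6, "O": 1}, "liquid"),
    "methane": ({"C": 1, "H": 4}, "gas"),
    "water": ({"H": 2, "O": 1}, "liquid"),
    "tetramethylsilane": ({"C": 4, "H": 12, "Si": 1}, "liquid"),
}
kept = [name for name, (f, st) in compounds.items() if keep_voc(f, st)]
print(kept)  # ['ethanol']
```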
These four data sets thus specified are reported in the Supplementary Material, excluding the parameters α, β, A and B, which are involved, as we saw, in the hydrogen bonding forces.

Polar Surface Area
The polar surface area of a molecule (PSA) is currently defined as the sum of the surfaces of all polar atoms, primarily oxygen and nitrogen, including their attached hydrogens [21]. PSA was successfully applied in 1996 and 1997 by Palm et al. to reflect the molecular transport properties of drugs, particularly blood-brain barrier (BBB) penetration and intestinal absorption [21] [22]. As pointed out in 2007 by Ertl [23], the number of publications involving PSA has been growing exponentially since these early publications. These results have tended to make PSA appear as a strictly pharmacological property, not applicable in the physicochemical field as a general criterion of polarity since, for example, strongly polar elements such as halogens, particularly fluorine, have always been excluded from its definition. We have, however, recently shown that PSA, associated with the Van der Waals molecular volume VW, appears strongly involved in the characterization of one of the polarities observed in GLC: the McReynolds b parameter mentioned above [5].
A few authors also include sulfur and phosphorus in an alternative definition of PSA [24] [25]. In 2013 we suggested a slight modification of the initial definition, extending it to O, S, N and P, with the exception of hexavalent S and pentavalent N and P [6].
This was based on the fact that, from a theoretical point of view, hexavalent S and pentavalent N and P can hardly be considered polar atoms: in these cases, all their peripheral electrons are involved in bonds. In 2013 we proposed to call this alternative definition PSA2, and we successfully applied the corresponding molecular characteristic to an olfactory study [6].
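The PSA2 atom-selection rule can be written as a small predicate. This is a hypothetical reading of the rule stated above: "valence" is taken as the total bond order of the atom, bonds to hydrogen included, with nitro groups and sulfones written in their pentavalent/hexavalent (double-bonded) forms. The surface contributions of the selected atoms are not covered here, and the function name is illustrative.

```python
def is_psa2_polar(element: str, valence: int) -> bool:
    """Hypothetical reading of the PSA2 rule.

    valence: total bond order of the atom, bonds to hydrogen included
             (e.g. 3 for an amine N, 5 for N in a nitro group written
             with two N=O double bonds, 6 for S in a sulfone).
    """
    if element == "O":
        return True
    if element == "S":
        return valence != 6       # hexavalent S excluded
    if element in ("N", "P"):
        return valence != 5       # pentavalent N and P excluded
    return False                  # C, halogens, etc. are not PSA atoms


# Dimethyl sulfone CH3-S(=O)(=O)-CH3: both O are polar, the S(VI) is not
atoms = [("C", 4), ("S", 6), ("O", 2), ("O", 2), ("C", 4)]
print(sum(is_psa2_polar(el, v) for el, v in atoms))  # 2
```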
A second purpose of the present study is therefore to try to resolve what appears, at first sight, to be a contradiction: is PSA strictly a pharmacological property, or a more general tool for characterizing molecular polarity? This requires clarifying, on an experimental basis, its most suitable definition among the various ones already proposed: is there a single definition valid in all circumstances?
In their early applications [21] [22], PSA values were established using sophisticated programs taking into account the three-dimensional molecular shape and its flexibility. However, a very simple topological method based on the summation of surface contributions of polar fragments (termed TPSA) was introduced by Ertl et al. [25], showing an excellent correlation with theoretical PSA values (r = 0.991, N = 34,810 substances).
In the present study, the TPSA values for the extended set of 447 VOC were collected from Molinspiration [26]. For the stationary phases we adopted the atomic parameters according to Laffort [5].
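The fragment-summation principle behind TPSA can be sketched as follows. Only a handful of neutral, non-aromatic O and N fragment contributions from Ertl et al.'s table are included, and the classification of each atom into a fragment type is assumed to have been done beforehand; the TPSA values used in this study came from Molinspiration, not from this code, and the fragment labels are hypothetical.

```python
# Subset of Ertl et al.'s TPSA fragment contributions (square angstroms),
# limited to a few neutral, non-aromatic O and N environments.
TPSA_FRAGMENTS = {
    "OH": 20.23,          # hydroxyl oxygen
    "O_ether": 9.23,      # -O- with two heavy neighbours
    "O_carbonyl": 17.07,  # =O
    "NH2": 26.02,         # primary amine nitrogen
    "NH": 12.03,          # secondary amine nitrogen
    "N_tertiary": 3.24,   # N with three single bonds to heavy atoms
    "N_nitrile": 23.79,   # triple-bonded nitrogen
}


def tpsa(fragment_counts: dict) -> float:
    """Sum fragment contributions; fragments are assumed pre-assigned."""
    return sum(TPSA_FRAGMENTS[name] * n for name, n in fragment_counts.items())


print(round(tpsa({"O_carbonyl": 1, "O_ether": 1}), 2))  # ethyl acetate: 26.3
```

Because the method is purely additive, its cost is linear in the number of atoms, which is what makes TPSA practical for large sets of compounds.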

Molar and Molecular Volumes
The various expressions supposed to reflect the "intrinsic molecular volume" or the "Van der Waals molecular volume" are all additive properties (which is not the case for the molar mass/density ratio at 20 °C). Among them, we selected for the present study the molecular volumes (expressed in cubic angstroms) computed by the free interactive calculator of Molinspiration [26]. The authors of this calculator first used a semi-empirical quantum chemistry method to build 3D molecular geometries for a training set of about 12,000 molecules. They then fitted a sum of fragment contributions to the volumes so obtained for the training set. We call this expression VW (as Van der Waals volume).

Molar Refractivity (MR and MRW)
These properties are a measure of the total polarizability of one mole of a given compound:

$$MR = \frac{n^2 - 1}{n^2 + 2}\, V_{20}, \qquad V_{20} = \frac{M}{d} \qquad (2)$$

in which M, d and n stand respectively for the molar mass, the density at 20 °C and the refractive index at 20 °C. Following Abraham et al. [8], a modified expression of the molar refractivity can be obtained by replacing V20 in Equation (2) by one or another expression supposed to reflect the "intrinsic molecular volume" or the "Van der Waals molecular volume". We found in 2005 [1] that this modified expression of the molar refractivity is the most suitable to reflect the apolar solubility parameter of solutes δ in GLC. As shown in Equation (5), we have chosen VW here for that purpose, and called MRW the modified molar refractivity:

$$MR_W = \frac{n^2 - 1}{n^2 + 2}\, V_W \qquad (5)$$

The molar refractivity values are generally expressed in ml·mol⁻¹.
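Both refractivities share the Lorentz-Lorenz factor (n² − 1)/(n² + 2) and differ only in the volume it multiplies. The sketch below assumes M in g/mol and d in g/ml, and, as an assumption not stated in the text, converts VW from cubic angstroms per molecule to ml/mol (1 Å³ = 10⁻²⁴ ml, times Avogadro's number) so that MRW has the same unit as MR. The n-hexane inputs are approximate literature values at 20 °C used only as a sanity check.

```python
AVOGADRO = 6.02214076e23  # mol^-1


def lorentz_lorenz_factor(n: float) -> float:
    """(n^2 - 1) / (n^2 + 2) for a refractive index n."""
    return (n * n - 1.0) / (n * n + 2.0)


def molar_refractivity(n: float, molar_mass: float, density: float) -> float:
    """MR in ml/mol: factor * M / d (M in g/mol, d in g/ml)."""
    return lorentz_lorenz_factor(n) * molar_mass / density


def modified_molar_refractivity(n: float, vw_cubic_angstrom: float) -> float:
    """MR_W: same factor times VW, with VW converted from A^3/molecule
    to ml/mol (assumed conversion, see lead-in)."""
    vw_ml_per_mol = vw_cubic_angstrom * 1e-24 * AVOGADRO
    return lorentz_lorenz_factor(n) * vw_ml_per_mol


# n-hexane, approximate values at 20 C: n = 1.3749, M = 86.18, d = 0.6594
print(round(molar_refractivity(1.3749, 86.18, 0.6594), 1))  # 29.9
```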

The Solute Parameter of Orientation or Polarity Strictly Speaking
Applying a multiple linear regression analysis (MLRA) to the ω values of the 121 VOC data set, with the molecular features of Table 1 as candidate independent variables, produces the predictive model reported on the left of Figure 2, with the corresponding correlogram on the right. By contrast, a similar attempt with the extended data set of 447 VOC was as disappointing as those already published [2] [3].
As a reminder, the F ratio can be obtained from the correlation coefficient r, the number of observations N (here the number of VOC) and the number of independent variables k, according to Abdi [27]:

$$F = \frac{r^2 / k}{(1 - r^2) / (N - k - 1)}$$

The assessed quality of a prediction therefore depends on the statistical criterion considered, r or F (both should be as high as possible).
Most statisticians prefer the second criterion, particularly when the number of independent variables is large relative to the number of observations, because F, unlike r, is penalized by each additional independent variable.
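Because the F formula divides by k and multiplies by N − k − 1, the same r yields a smaller F when more independent variables are used. A small sketch makes this concrete; the values r = 0.95 and N = 121 and the two values of k are hypothetical, chosen only to show the effect.

```python
def f_ratio(r: float, n_obs: int, k: int) -> float:
    """Overall F of a multiple regression from r, N and k predictors."""
    if not 0 < k < n_obs - 1:
        raise ValueError("need 0 < k < N - 1")
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n_obs - k - 1))


# Same r and N, more independent variables -> lower F
print(round(f_ratio(0.95, 121, 5), 1))   # 212.9
print(round(f_ratio(0.95, 121, 15), 1))  # 64.8
```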
As a result, we adopt for further use, at least as a first approach, the SMT model reported in Figure 2 as a definition of the ω parameter (hereafter called ω2016), in spite of the smaller diversity of chemical functions involved in its definition (more precisely, ω2016 is equal to 100 times the values of ω predicted by this SMT model). An interesting observation is that the MLRA program discarded the molecular feature N122 (present in nitrates), confirming our previous hypothesis that pentavalent N is nonpolar [6].
Note that in Figure 2, the [NCO] molecular feature is not present in any of the 121 VOC under study. It has nevertheless been added (in brackets) because, when the MLRA is applied to the less precise data set of 447 VOC using as independent variables the same 37 features of Table 1 plus ω predicted by the initial model of Figure 2, the only feature added by the MLRA with a partial F ratio > 20 is NCO.
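The stepwise selection mentioned above, in which a feature enters the model only if its partial F ratio exceeds a threshold (20 here), can be sketched as a generic forward selection with NumPy least squares. This is not SYSTAT's implementation, whose exact entry and removal rules are not reproduced; the function names and the synthetic data are hypothetical, with columns standing in for descriptors such as SMT feature counts.

```python
import numpy as np


def rss(X_sub: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares of an OLS fit with an intercept."""
    A = np.hstack([np.ones((len(y), 1)), X_sub])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_stepwise(X: np.ndarray, y: np.ndarray, f_enter: float = 20.0):
    """Add, one at a time, the column with the largest partial F ratio,
    as long as that ratio exceeds f_enter."""
    n, p = X.shape
    selected = []
    current = rss(X[:, np.array(selected, dtype=int)], y)
    while len(selected) < p:
        best_j, best_f, best_rss = None, -np.inf, None
        for j in range(p):
            if j in selected:
                continue
            cols = np.array(selected + [j], dtype=int)
            new = rss(X[:, cols], y)
            df_resid = n - len(cols) - 1
            if df_resid <= 0:
                continue
            f = np.inf if new == 0 else (current - new) / (new / df_resid)
            if f > best_f:
                best_j, best_f, best_rss = j, f, new
        if best_j is None or best_f < f_enter:
            break
        selected.append(best_j)
        current = best_rss
    return selected


# Synthetic data: y depends on columns 0 and 2; column 1 is pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=100)
print(forward_stepwise(X, y))
```

A high entry threshold such as 20 makes the procedure conservative: a descriptor is added only when it explains a clearly non-random share of the remaining variance.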

Brief Reminder and Update of the Well-Established Characterization of the Solute Parameters δ and ε
The best established molecular property related to the dispersion parameter δ, as obtained experimentally by GLC, was found to be the modified molar refractivity MRW of Equation (5) [1]. Because the refractive index is not always available, in particular for compounds that are not liquid at room temperature, we propose to call henceforth δ2016 the values obtained with the SMT model of Figure 3, which are valid whatever the physical state of the VOC under study at room temperature. More precisely: The theoretical induction-polarizability parameter εtheoret is obtained via a bilinear relationship between the molar refractivity and the molar volume, such that its value equals zero for n-alkanes, according to the following equation:
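The exact coefficients of the εtheoret equation are not reproduced here. As a hypothetical sketch of one way to build a quantity that vanishes for n-alkanes, one can fit a straight line MR = a + b·V on n-alkanes and take the residual of any compound from that line. This construction, the function names and the sample numbers are assumptions for illustration, not the published equation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def make_eps(alkane_volumes, alkane_mr):
    """Return f(mr, v) = MR minus the n-alkane line at the same volume,
    which is zero (up to rounding) for the n-alkanes used in the fit."""
    a, b = fit_line(alkane_volumes, alkane_mr)
    return lambda mr, v: mr - (a + b * v)


# Hypothetical n-alkane values lying exactly on MR = 1 + 0.3 V
eps = make_eps([100.0, 120.0, 140.0], [31.0, 37.0, 43.0])
print(abs(eps(37.0, 120.0)) < 1e-9, round(eps(45.0, 140.0), 6))
```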

Equations for the Solvent Parameters McR b, W and E
No significant improvement has been found in the present study, compared with [5], in the characterization of these three parameters using well-established molecular properties and/or the SMT procedure. We can therefore consider as provisionally valid the observations made in that previous paper, which can be summarized as follows:
• the three parameters appear partly related, in addition to other molecular features, to PSA;
• each molecular feature concerned, including PSA, appears as a ratio of this feature to the molar volume VW, contrary to what is observed for solute parameters; in other words, the various types of solvent polarity appear, in some way, as densities of polarity;
• more precisely, the parameters McR b and E can be expressed via two different equations, both including PSA/VW and 1/VW.
At this stage, there is no new argument for or against including polar elements other than N and O in the definition of PSA. By contrast, the earlier observation [6] that pentavalent N does not seem to be a polar feature is confirmed in the present study.

Solute Parameters
The results on solute parameters presented here, compared to those previously published on the same topic, show real improvements, as summarized in Table 2.
Let us recall that the difference in N values between 2008 and the present study for δ and ε does not at all result from an elimination of outliers in the MLRA, but from objective facts detailed in the Materials and Methods section. Concerning ω, the large difference in N is mainly attributed to a lack of accuracy in the more extended data set, as noted above.
The improvements observed in the present study are mainly due to a pragmatic approach, summarized in Figure 1:
• starting only with accurate and superabundant experimental data submitted to an MMA;
• physicochemical characterization where possible, with molecular properties well established for a large number of compounds, as is the case, for example, for VW, MRW and PSA;
• if this is successful, applying SMT to these well-established properties in order to overcome some inappropriate situations (e.g. the refractive index for solids and gases); this step has been applied to δ2016 and ε2016;
• if it is unsuccessful, applying SMT to the accurate but less numerous chromatographic data (this step has been applied to ω2016).
It is worth noting the high mutual independence of these parameters, as can be seen in Figure 4, similar to that previously observed for the initial data set of 125 solute parameters of 100% experimental GLC origin [1].

Solvent Parameters
The absence of improvement for the solvent parameters is very probably due to an insufficient amount of accurate and superabundant chromatographic measurements. One consequence is the uncertainty about whether polar elements other than N and O should be included in the definition of PSA. The only conclusive fact seems to be the exclusion of pentavalent N and P and of hexavalent S.

General Comments
The present study does not provide any improvement in the separation or identification of the components of solutions, the two main goals pursued in chromatographic science. On the other hand, the results obtained in this study, and in some others we have previously conducted in a similar way, could not have been reached without the strong help of GLC experimentation. We now have the key to determining simply, and with reasonably good accuracy, the three solute parameters of Van der Waals forces in solutions. We strongly believe that these labile forces could be involved in some pharmacological and chemosensory properties, and our next effort will be to test this hypothesis. It would be difficult to conclude this study without mentioning an important recent review on the "Determination of solute descriptors by chromatographic methods" by Poole et al. [28], and regretting, at this juncture, some lack of mutual understanding between the majority of the authors working on this topic and our group. This concerns, among other things, the non-use of MMA as an appropriate tool for processing superabundant GLC data sets (with the exception of Ulrich [29]). One consequence is the supposed "impossibility to reflect independently the interactions associated with induced and stable dipoles" [28]. In fact, in spite of a questionable letter to the editor in 2006 [30] (short answer in [3]) and an abundant exchange of correspondence with Michael Abraham, it has been demonstrated since 2005 [1] that a polarity descriptor of solutes, in the strict sense, can be expressed solely on the basis of S, R and B, according to the acronyms and data of Abraham and co-authors. In addition, the present study proposes a definition of such a polarity descriptor, termed ω2016, based solely on molecular features, as described above.
The validation of one or the other approach, or of possible combinations of both, could in future be based on their comparative performance in QSAR and QSPR.
Regarding a possible objection to our ω2016 model, namely that it is based on only 121 solutes, it should be noted that this data set includes numerous chemical functions: alcohols, aldehydes, ketones, ethers, nitro compounds, nitriles, secondary and tertiary amines, esters, various types of halogen and sulfur compounds, and hydrocarbons, both saturated and unsaturated, with or without mono- and polycycles. However, primary amines, carboxylic acids, amides and lactones are absent (they are present in the set of 447 solutes). Also, substances with more than one chemical function are sparsely represented in this 121-compound data set. In conclusion, the strictly speaking polarity descriptor proposed here, of purely pragmatic and not at all theoretical nature, could be considered a first approach, possibly improvable in the future with the help of, among others, the semi-empirical quantum chemical methods already applied to this topic (e.g. [16] [31]).