
To resolve the protein folding problem, that is, to predict the native structure and describe the folding dynamics, we must work with the fundamental physical law that directly governs the protein folding process. That law is the Thermodynamic Principle of Protein Folding [

Therefore, we have to determine what the Gibbs free energy is. The question is: is there a Gibbs free energy function whose variables are all possible conformations of a given protein molecule? Or is there only a Gibbs free energy difference between the folded ensemble of protein molecules and its counterpart, the unfolded ensemble? The former is a single molecule view, which comes from contemplating a protein molecule as it comes out of the ribosome and changes its conformation until it achieves its native structure; the latter is an ensemble view, which comes from looking at a tube of purified protein solution and trying to work out the collective behaviour of the protein molecules while the solution goes towards equilibrium.

Via quantum statistics applied to a tiny thermodynamic system $S_X$, tailor-made for the conformation $X$ and its immediate physiological environment $E_U$, we have derived the conformational Gibbs free energy (CGFE) function $G(X; E_U, U)$ for globular proteins $U$ [

Applying the CGFE function, we translate the thermodynamic principle of protein folding into the Single Molecule Thermodynamic Hypothesis (SMTH) of Protein Folding: when a protein molecule $U$ is put in an environment $E$, a stable conformation $X_E$ of $U$ (which may not be unique) must be a minimizer (local or global) of the CGFE function $G(X; E, U)$. In particular, the gradient vanishes at $X_E$: $\nabla G(X_E; E, U) = 0$.

Another big question in protein folding is whether there is a folding force. Levinthal in 1969 [

The CGFE function also gives us the deterministic part of the folding force $F_i$ acting on an atom $a_i$ of $U$: $F_i(X) = -\nabla_{x_i} G(X; E_U, U)$.
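As a minimal sketch of how a force of the form $F_i(X) = -\nabla_{x_i} G$ can be evaluated numerically, the following example uses central differences. The function `toy_energy` is a hypothetical smooth stand-in chosen only for illustration; it is not the CGFE function, and the helper name `force_on_atom` and the step size `h` are assumptions as well.

```python
import math

def toy_energy(X):
    """Hypothetical smooth stand-in for G(X; E_U, U), NOT the CGFE function:
    sum of squared deviations of consecutive atom distances from 1.0."""
    total = 0.0
    for a, b in zip(X, X[1:]):
        total += (math.dist(a, b) - 1.0) ** 2
    return total

def force_on_atom(energy, X, i, h=1e-6):
    """Central-difference estimate of F_i(X) = -grad_{x_i} energy(X)."""
    force = []
    for k in range(3):
        plus = [list(p) for p in X]
        minus = [list(p) for p in X]
        plus[i][k] += h    # shift coordinate k of atom i forwards
        minus[i][k] -= h   # and backwards
        force.append(-(energy(plus) - energy(minus)) / (2.0 * h))
    return force

# Three atoms on a line, each neighbouring pair stretched to distance 1.5.
X = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
forces = [force_on_atom(toy_energy, X, i) for i in range(len(X))]
```

In this toy setting the two end atoms are pulled towards the middle one, while the forces on the middle atom cancel, so the net force on the system is zero.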

A scientific hypothesis has to give verifiable predictions to let people confirm or refute it. We suggest two verifiable predictions of the single molecule thermodynamic hypothesis.

1) Ab initio prediction of native structures of globular proteins: the native structure $X_U$ of the protein $U$ is a (local or global) minimizer of the CGFE function:

$$G(X_U; E_U, U) = \min_{X \in \mathcal{U} \subset \mathfrak{X}_U} G(X; E_U, U), \quad \text{for some neighbourhood } \mathcal{U} \text{ of } X_U. \tag{1}$$

In particular, since $G(X; E_U, U)$ is smooth, $\nabla G(X_U; E_U, U) = 0$. If $X_U$ is a global minimizer, then

$$G(X_U; E_U, U) = \min_{X \in \mathfrak{X}_U} G(X; E_U, U). \tag{2}$$

Here $X$ denotes a conformation and $\mathfrak{X}_U$ is the set of all possible conformations of $U$.

Therefore, for a globular protein $U$ (for which we know the formula of $G(X; E_U, U)$), the prediction of the native structure is reduced to a purely mathematical problem: the minimization of a known smooth function. A mathematical theorem guarantees that this problem has a solution, i.e., minimizers always exist. The real task is to find them and to determine which one is the native structure, which is a hard programming problem.
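To illustrate the difference between local and global minimizers in (1) and (2), here is a minimal sketch that runs plain gradient descent from two starting points on a one-dimensional toy function. Both `toy_energy` and the multi-start strategy are assumptions made for illustration: the function is a hypothetical double well, not the CGFE function, and gradient descent is a generic technique rather than a method proposed in this paper.

```python
def toy_energy(x):
    """Hypothetical tilted double well with two local minimizers."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def toy_gradient(x):
    """Derivative of toy_energy."""
    return 4.0 * x * (x * x - 1.0) + 0.3

def descend(x0, step=0.01, iterations=5000):
    """Plain gradient descent; converges to the minimizer of the well it starts in."""
    x = x0
    for _ in range(iterations):
        x -= step * toy_gradient(x)
    return x

# Each start lands in a different well; both end points satisfy gradient = 0.
starts = [-1.5, 1.5]
minimizers = [descend(x0) for x0 in starts]

# Comparing energies picks out the global minimizer among those found.
best = min(minimizers, key=toy_energy)
```

Both end points are stationary, but only the one in the left well has the lower energy. Even in this toy case, deciding which minimizer is the relevant one requires comparing all candidates, which hints at why the full problem is hard.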

2) For a globular protein $U$, starting from any initial conformation $X_0$, there is a folding path $X(t) = X(t; X_0)$ satisfying $X(t_0; X_0) = X_0$ and, for $t \ge t_0$, the following Langevin equation:

$$m_i \frac{d^2 x_i(t)}{dt^2} = F_{\text{total}} = -\nabla_{x_i} G(X(t); E_U, U) - \eta_i \frac{d x_i(t)}{dt} + F_i(t), \quad i = 1, \cdots, n. \tag{3}$$

Here $\eta_i$ is the solvent friction. The random force $F_i(t)$ is caused by occasional collisions with other non-solvent molecules. Because of it, the folding path is not completely deterministic. Again, mathematical theorems guarantee that such a folding path $X(t; X_0)$ exists. Moreover, mathematics also tells us that the path depends strongly on its initial conformation $X_0$, so an important issue is to know the protein's initial conformation as it comes out of the ribosome.
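The structure of the Langevin equation (3) can be sketched with a simple time-stepping scheme for a single coordinate. Everything specific here is an assumption for illustration: the double-well gradient `toy_gradient` stands in for the gradient of the CGFE function, the parameter values (`dt`, `mass`, `eta`, `noise`) are arbitrary, and the random force is modelled as Gaussian kicks of a hypothetical amplitude.

```python
import math
import random

def toy_gradient(x):
    """Gradient of the hypothetical double-well energy (x^2 - 1)^2."""
    return 4.0 * x * (x * x - 1.0)

def langevin_path(x0, steps, dt=0.01, mass=1.0, eta=2.0, noise=0.0, seed=0):
    """Integrate m x'' = -grad G(x) - eta x' + F(t) with a semi-implicit Euler step.

    The random force F(t) is approximated by Gaussian kicks of size
    noise * sqrt(dt) per step; noise = 0 gives the deterministic part only.
    """
    rng = random.Random(seed)
    x, v = x0, 0.0
    path = [x]
    for _ in range(steps):
        kick = noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        v += (dt * (-toy_gradient(x) - eta * v) + kick) / mass
        x += dt * v
        path.append(x)
    return path

# Without noise, the end point is decided by the initial condition alone.
path_right = langevin_path(0.5, steps=5000)
path_left = langevin_path(-0.5, steps=5000)

# With noise switched on, the path is no longer fully deterministic.
noisy_path = langevin_path(0.5, steps=5000, noise=0.5, seed=1)
```

The two noise-free runs start on opposite sides of the barrier and settle in different wells, a small-scale picture of the dependence on the initial conformation $X_0$ described above.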

If the two predictions are positively verified, then we can say that theoretically the protein folding problem is resolved, at least for globular proteins.

To explain our CGFE function, we start with a conformation $X$. A protein $U$ consists of $n$ atoms $(a_1, \cdots, a_n)$. A conformation of $U$ can be expressed as a point of the $3n$-dimensional space $\mathbb{R}^{3n}$, $X = (x_1, \cdots, x_n)$, where $x_i \in \mathbb{R}^3$ is the nuclear position of $a_i$ in the 3-dimensional space $\mathbb{R}^3$. Not all points in $\mathbb{R}^{3n}$ are conformations of $U$: bond lengths, bond angles, and van der Waals distances in general are natural constraints. Denote by $\mathfrak{X}_U$ the set of all possible conformations of $U$. A conformational function of $U$ is a function $f: \mathfrak{X}_U \to \mathbb{R}$. For example, all force fields used in molecular dynamics simulations are conformational functions.

The 3-dimensional conformation $P_X$ of $U$ is $P_X = \cup_{i=1}^{n} C_i(x_i) \subset \mathbb{R}^3$, where $C_i \subset \mathbb{R}^3$ is the shape of the atom $a_i$ in $U$, and $C_i(x_i)$ indicates that it has been congruently moved to the nuclear position $x_i$. $C_i$ exists and will change with $X$; see [

In natural environments, and even in most artificial environments of protein folding, the immediate environment of a globular protein $U$ is just one layer of water molecules surrounding the conformation $P_X = \cup_{i=1}^{n} B(x_i, r_i)$. This is true even when the protein molecule is inside a crystal [

$P_X$ plus the one layer of water molecules constitutes a tiny thermodynamic system $S_X$, tailor-made for $P_X$. As an open thermodynamic system, $S_X$ has a Gibbs free energy $G(S_X)$. The CGFE function $G(\cdot; E_U, U): \mathfrak{X}_U \to \mathbb{R}$ is then defined as $G(X; E_U, U) = G(S_X)$.

Between $P_X$ and the layer of water molecules is an interface $M_X$, for example the solvent accessible surface $\partial P_X$. The expression of $G(S_X)$ is given via global geometric features of $M_X$ and its surface chemical potentials. A protein molecule has many moieties or atom groups: some are charged, some are polar, and others are non-polar. They can be classified into hydrophobicity classes $H_i$, $1 \le i \le H$, $H > 1$, from the most hydrophobic (non-polar) to the most hydrophilic (polar or charged). An atom $a_i \in H_j$ if it belongs to a moiety of class $H_j$. Define $P_{X,i} = \cup_{a_j \in H_i} B(x_j, r_j)$; then $P_X = \cup_{i=1}^{H} P_{X,i}$. The space containing water molecules in $S_X$, the ring $\mathfrak{R}_X = S_X \setminus \overline{P_X}$, is decomposed into $H$ parts (not necessarily connected) via the distance function $dist(x, P_{X,i}) = \min_{y \in P_{X,i}} |x - y|$, see

$$\mathfrak{R}_{X,i} = \{ x \in \mathfrak{R}_X : dist(x, P_{X,i}) \le dist(x, P_{X,j}), \text{ for any } j \ne i \}, \quad i = 1, \cdots, H. \tag{4}$$

The interface $M_X$ is then decomposed accordingly:

$$M_X = \cup_{i=1}^{H} M_{X,i}, \quad M_{X,i} = M_X \cap \mathfrak{R}_{X,i}. \tag{5}$$
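The nearest-class rule in (4) can be sketched in a few lines. The example assumes each class is given as a list of balls `(centre, radius)`, uses the fact that the distance from a point to a union of closed balls is the smallest value of `max(0, |x - centre| - radius)`, and numbers the classes from 0 instead of 1. The function names and the sample data are hypothetical.

```python
import math

def dist_to_class(x, atoms):
    """Distance from point x to the union of balls B(x_j, r_j) of one class."""
    return min(max(0.0, math.dist(x, centre) - radius) for centre, radius in atoms)

def nearest_classes(x, classes, tol=1e-12):
    """Indices i with dist(x, P_{X,i}) <= dist(x, P_{X,j}) for every j, as in (4).

    A point that is equally close to several classes is reported for all of them,
    which matches the non-strict inequality in (4).
    """
    distances = [dist_to_class(x, atoms) for atoms in classes]
    smallest = min(distances)
    return [i for i, d in enumerate(distances) if d <= smallest + tol]

# Two hypothetical classes, each consisting of a single atom of radius 1.
classes = [
    [((0.0, 0.0, 0.0), 1.0)],  # class 0
    [((4.0, 0.0, 0.0), 1.0)],  # class 1
]

near_first = nearest_classes((1.5, 0.0, 0.0), classes)
on_boundary = nearest_classes((2.0, 0.0, 0.0), classes)
near_second = nearest_classes((3.0, 0.0, 0.0), classes)
```

Points on the boundary between two regions belong to both, so the regions in (4) cover the ring but may overlap on sets of equal distance.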

A water molecule in $\mathfrak{R}_{X,i}$ will touch $M_{X,i}$, so it will be attracted (if $H_i$ is charged or polar) or repelled (if $H_i$ is non-polar) by $P_X$. The same water molecule therefore has different chemical potentials $\mu_i$ in different $\mathfrak{R}_{X,i}$. For non-polar $H_i$, $\mu_i > 0$; for charged or polar $H_i$, $\mu_i < 0$. Thus there is a $k$, $1 < k < H$, such that

$$\mu_1 > \mu_2 > \cdots > \mu_k > 0 > \mu_{k+1} > \cdots > \mu_H. \tag{6}$$

Since water molecules and electrons can enter or leave

Since

Since every water molecule in

Let

The

Let

Let

The function is smooth, i.e., the first and second derivatives exist in

A scientific hypothesis has to be able to explain natural and artificial phenomena. We will explain several phenomena of protein folding, unfolding, and docking, and suggest an application to drug design, according to the SMTH.

It is well known ([

Hence, in folding towards the native structure, the volume and area shrink. The first two terms in (11) must then have positive coefficients, i.e.,

As the hydrophobic core, look at (11), the hydrophobic surfaces

According to the SMTH, predicting the native structure of a globular protein amounts to minimizing the CGFE function (11) as in (1) and (2). As analysed above, this essentially means making an ever better hydrophobic core by shrinking the volume, area, and hydrophobic area simultaneously and cohesively. This is the “cooperativity” sought in [

One may ask where the hydrogen bonds are in (11). The answer is that secondary structures and hydrogen bonds are products of minimizing the CGFE function to find the native structures. A judicious examination of an exhaustive PDB sample of small soluble globular proteins of moderate size (residues

The explanation is that proteins in their physiological environment are special among polymers. Polymers in general do not have specified structures, whereas proteins have native structures in their physiological environment. Why? The peptide chains of globular proteins are special: when they fold in their physiological environment, the residues, while collapsing into hydrophobic cores, are put in just the right places to form secondary structures and hydrogen bonds simultaneously. Evolution selects the very few peptide chains that can be foldable globular proteins. In fact, if a 400-residue peptide chain is picked at random, the probability that it is a protein's peptide chain is at most $10^{-460}$ [ $10^{-460}$ is zero.

According to the pioneer research of denaturation [

Experiments show that the difference between folding and unfolding is that folding leads to a unique native structure, whereas unfolding leads to many different stable conformations [

Either in folding or unfolding, a conformation moves along a folding (unfolding) path which satisfies an equation of motion, the Langevin Equation (3), with deterministic forces

Moreover, the initial conformation of the folding or unfolding path also determines where the path ends. Any local minimizer

But this is not the case in denaturation, where the initial conformation is the unique native structure

Actually, from physiological environment

Docking is the attempt to bind two molecules so that they form a stable complex.

Let

binding if and only if 1)

If

If

In general, even

We address a question asked in [

In fact, a drug

The author declares no conflicts of interest regarding the publication of this paper.

Yi, F. (2019) Single Molecule Thermodynamics Hypothesis of Protein Folding and Drug Design. Journal of Biosciences and Medicines, 7, 164-172. https://doi.org/10.4236/jbm.2019.711015