Single Molecule Thermodynamics Hypothesis of Protein Folding and Drug Design

A single molecule view of protein folding leads to the concept of the conformational Gibbs free energy function $G(\mathbf{X}; E, U)$, where the variable $\mathbf{X}$ is a conformation of a protein $U$. $U$ and the environments $E$ (solvent, temperature, pressure, other particles in solvent, etc.) in which $U$ folds appear here as parameters. Changing the environment $E$ to $E'$ changes $G(\mathbf{X}; E, U)$ to $G(\mathbf{X}; E', U)$. This change can explain phenomena such as folding, unfolding, docking, etc., since it indicates that in different environments $U$ has different stable conformations $\mathbf{X}_E$ and $\mathbf{X}_{E'}$, satisfying $\nabla G(\mathbf{X}_E; E, U) = 0$ and $\nabla G(\mathbf{X}_{E'}; E', U) = 0$ respectively. The conformational Gibbs free energy function $G(\mathbf{X}; E, U)$ also gives the deterministic part of the folding force acting on an atom $a_i$ of $U$, $\mathbf{F}_i = -\nabla_{\mathbf{x}_i} G(\mathbf{X}; E, U)$. The equation of motion of folding is a Langevin equation. We suggest two verifiable predictions of the single molecule thermodynamic hypothesis that may confirm or refute the hypothesis: ab initio predictions of the native structures of globular proteins and of their folding paths. An application to drug design is suggested.


Introduction: A Single Molecule View of Protein Folding
To resolve the protein folding problem (that is, predicting the native structure and describing the folding dynamics), we must work with the fundamental physical law that directly governs the protein folding process. That law is the Thermodynamic Principle of Protein Folding [1]. It is just the Second Law of Thermodynamics, since Anfinsen and others have already shown that the folding process is spontaneous. In the protein folding case, the second law says that the Gibbs free energy decreases during the spontaneous folding process. Therefore, we have to figure out what the Gibbs free energy is. The question is: is there a Gibbs free energy function whose variables are all possible conformations of a given protein molecule? Or is there only a Gibbs free energy difference between the folded ensemble of protein molecules and its counterpart, the unfolded ensemble? The former is a single molecule view. It comes from contemplating a protein molecule as it leaves the ribosome and changes its conformation until it reaches its native structure. The latter is an ensemble view. It comes from staring at a tube of purified protein solution and trying to figure out the collective behaviour of the protein molecules in the solution while the solution moves towards equilibrium.
Via quantum statistics applied to a tiny thermodynamic system $S_{\mathbf{X}}$, which is tailor-made for the conformation $\mathbf{X}$ and its immediate physiological environment, …

Another big question in protein folding is whether there is a folding force. In 1969, Levinthal [5] showed by contradiction that there must be one. If the folding process were purely random, it would take a time span longer than the age of the Earth.
The conformational Gibbs free energy (CGFE) function also gives us the deterministic part of the folding force $\mathbf{F}_i$ acting on an atom $a_i$ of $U$. It is
$$\mathbf{F}_i = -\nabla_{\mathbf{x}_i} G(\mathbf{X}; E, U).$$

A scientific hypothesis has to give verifiable predictions that let people confirm or refute it. We suggest two verifiable predictions of the single molecule thermodynamic hypothesis.
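1) Ab initio prediction of the native structures of globular proteins: the native structure $\mathbf{X}_U$ of the protein $U$ is a (local or global) minimizer of the CGFE function:
$$G(\mathbf{X}_U; E_U, U) = \min_{\mathbf{X} \in \mathcal{N}} G(\mathbf{X}; E_U, U), \quad \text{for some neighbourhood } \mathcal{N} \text{ of } \mathbf{X}_U.$$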
Mathematics guarantees that this problem has a solution, i.e., minimizers always exist. The real task is to find them and to determine which one is the native structure, which is a hard mathematical programming problem.
2) For a globular protein $U$, starting from any initial conformation $\mathbf{X}_0$, there is a folding path $\mathbf{X}(t) = \mathbf{X}(t; \mathbf{X}_0)$ satisfying $\mathbf{X}(t_0; \mathbf{X}_0) = \mathbf{X}_0$ and, for $t \ge t_0$, the following Langevin equation:
$$m_i \frac{d^2\mathbf{x}_i}{dt^2} = -\nabla_{\mathbf{x}_i} G(\mathbf{X}; E_U, U) - \eta_i \frac{d\mathbf{x}_i}{dt} + \tilde{\mathbf{F}}_i(t), \quad i = 1, \ldots, n,$$
where $m_i$ is the mass of atom $a_i$. Here $\eta_i$ is the solvent friction. The random force $\tilde{\mathbf{F}}_i(t)$ is caused by occasionally bumping into another non-solvent molecule. Because of it, the folding path is not completely deterministic. Again, mathematical theorems guarantee that such a folding path $\mathbf{X}(t; \mathbf{X}_0)$ exists. Moreover, mathematics also tells us that the path depends strongly on its initial conformation $\mathbf{X}_0$. An important issue is therefore to know the protein's initial conformation as it comes out of the ribosome.
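The Langevin folding path can be sketched in one dimension. This toy rests on stated assumptions:

- `toy_dG` stands in for the gradient of the CGFE function. It is the derivative of the hypothetical double well $G(x) = x^4/4 - x^2/2$, whose minimizers are at $x = \pm 1$.
- The mass `m`, friction `eta` and random-force amplitude `sigma` are illustrative values, not physical parameters.
- The integrator is a simple semi-implicit Euler scheme with an Euler-Maruyama noise term.

```python
import math
import random

def toy_dG(x):
    # Derivative of the hypothetical toy G(x) = x**4/4 - x**2/2 (minimizers at x = -1 and x = +1).
    return x**3 - x

def langevin_path(x0, m=1.0, eta=2.0, sigma=0.0, dt=1e-3, steps=20000, seed=0):
    """Integrate m x'' = -G'(x) - eta x' + random force (semi-implicit Euler + Euler-Maruyama).

    sigma scales the random force; sigma = 0 gives a purely deterministic, damped path.
    """
    rng = random.Random(seed)          # seeded so that a noisy run is reproducible
    x, v = x0, 0.0                     # start at rest in the initial conformation x0
    path = [x]
    for _ in range(steps):
        kick = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)   # random impulse over one time step
        v += ((-toy_dG(x) - eta * v) * dt + kick) / m        # folding force + friction + random kick
        x += v * dt                                          # position update uses the new velocity
        path.append(x)
    return path

# With sigma = 0, the end point is decided by the initial conformation alone.
print(round(langevin_path(0.5)[-1], 6), round(langevin_path(-0.5)[-1], 6))
```

Starting on either side of the barrier at $x = 0$ sends the path to a different minimizer. This echoes the point that the folding path depends strongly on $\mathbf{X}_0$. Setting `sigma > 0` adds the random force, so the path is no longer fully deterministic.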
If the two predictions are positively verified, then we can say that theoretically the protein folding problem is resolved, at least for globular proteins.
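Prediction 1) combines two ingredients: the deterministic force $\mathbf{F}_i = -\nabla_{\mathbf{x}_i} G$ and the search for a minimizer of $G$. Both can be sketched on a point of $\mathbb{R}^{3n}$, under these assumptions:

- `toy_G` is a hypothetical stand-in for the CGFE function. It uses harmonic bonds of rest length 1 plus a small penalty on the squared radius of gyration as a crude compactness term.
- Gradients are estimated by central finite differences.
- Plain steepest descent replaces whatever optimizer one would actually use.

```python
import math

def toy_G(x):
    """Hypothetical toy energy of a conformation x, given as a flat list of 3n coordinates."""
    n = len(x) // 3
    pts = [x[3 * i:3 * i + 3] for i in range(n)]
    # Harmonic "bonds" between consecutive atoms, rest length 1.0.
    bonds = sum(0.5 * (math.dist(a, b) - 1.0) ** 2 for a, b in zip(pts, pts[1:]))
    # Squared radius of gyration: a crude stand-in for a "shrinking" term.
    c = [sum(p[k] for p in pts) / n for k in range(3)]
    rg2 = sum(math.dist(p, c) ** 2 for p in pts) / n
    return bonds + 0.1 * rg2

def forces(G, x, h=1e-6):
    """F = -grad G(x) by central differences; entries 3i..3i+2 are the force on atom a_i."""
    F = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        F.append(-(G(xp) - G(xm)) / (2 * h))
    return F

def descend(G, x0, step=0.2, tol=1e-6, max_iter=20000):
    """Steepest descent: move along the force until it (nearly) vanishes."""
    x = list(x0)
    for _ in range(max_iter):
        F = forces(G, x)
        if math.sqrt(sum(f * f for f in F)) < tol:
            break
        x = [xi + step * fi for xi, fi in zip(x, F)]
    return x

x0 = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.5, 0.8, 0.0]   # 3 atoms, i.e. a point in R^9
x_min = descend(toy_G, x0)
print(toy_G(x0), toy_G(x_min))
```

The result is only a local minimizer reached from `x0`. Deciding which minimizer is the native structure is the hard part the text refers to, and this sketch does not address it.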

The CGFE Function for Globular Proteins
To explain our CGFE function, we start with a conformation $\mathbf{X}$. A protein $U$ consists of $n$ atoms $(a_1, \ldots, a_n)$; a conformation of $U$ can be expressed as … indicates it has been congruently moved to the nuclear position $\mathbf{x}_i$. … $C_i$ exists, will change with $\mathbf{X}$, see [6]. A very good approximation to …

In natural, and even in most artificial, environments of protein folding, the immediate environment of a globular protein $U$ is just one layer of water molecules surrounding the conformation $P_{\mathbf{X}}$. This is true even when the protein molecule is inside a crystal [7]. $P_{\mathbf{X}}$ plus this single layer of water molecules forms a tiny thermodynamic system $S_{\mathbf{X}}$, tailor-made for $P_{\mathbf{X}}$. As an open thermodynamic system, $S_{\mathbf{X}}$ has a Gibbs free energy $G(S_{\mathbf{X}})$ … Between $P_{\mathbf{X}}$ and the layer of water molecules lies an interface $M_{\mathbf{X}}$, for example, the solvent accessible surface $\partial P_{\mathbf{X}}$. The expression of $G(S_{\mathbf{X}})$ is via global … The space containing the water molecules in $S_{\mathbf{X}}$, the ring … (Figure 1).
The interface $M_{\mathbf{X}}$ is then decomposed accordingly, …
Since water molecules and electrons can enter or leave … Since $P_{\mathbf{X}}$ is fixed, the kinetic energy operator $T_{\mathbf{X}}$ vanishes here. … (8) and (10) into (7), the conformational Gibbs free energy function of a globular protein $U$ in its physiological … The function is smooth, i.e., its first and second derivatives exist in $\mathbb{R}^{3n}$.

Explanations of Protein Folding
A scientific hypothesis has to be able to explain natural and artificial phenomena. According to the single molecule thermodynamics hypothesis (SMTH), we will explain several phenomena: protein folding, unfolding and docking. We will also suggest an application to drug design.

What Does the CGFE Function in (11) Reveal?
It is well known ([8] and [9]) that native structures of globular proteins have … is also shrinking towards the native structure.
According to the SMTH, predicting the native structure of a globular protein amounts to minimizing the CGFE function (11) as in (1) and (2). As analysed above, this essentially means making an ever better hydrophobic core by shrinking the volume, the area and the hydrophobic area simultaneously and cohesively. This is the "cooperativity" studied in [10], defined there as "the concurrent participation of different regions of the biomolecule to promote and sustain intramolecular or intermolecular interactions".
One may ask: where are the hydrogen bonds in (11)? The answer is that secondary structures and hydrogen bonds are products of minimizing the CGFE function to find the native structures. A judicious examination of an exhaustive PDB sample of small soluble globular proteins of moderate size ($N < 102$ residues) showed that hydrophobic collapse is coupled with backbone hydrogen-bond formation [11]. In [12], we neglected the volume and area terms and only shrank the hydrophobic surface area, which is equivalent to making a better hydrophobic core. Hydrogen bonds and secondary structures such as α-helices, β-strands and β-turns duly appeared with statistical significance.
The explanation is that proteins in their physiological environment are special among polymers. Generic polymers do not have specified structures, whereas proteins have native structures in their physiological environment. Why? The peptide chains of globular proteins are special. When they fold in their physiological environment and collapse into a hydrophobic core, the residues land in just the right places to form secondary structures and hydrogen bonds at the same time. Evolution has selected the very few peptide chains that fold into globular proteins. In fact, for a randomly picked 400-residue peptide chain, the probability that it is a protein's peptide chain is at most $10^{-460}$ [2]. For any practical computation, $10^{-460}$ is zero. In standard double-precision floating-point arithmetic it even underflows to exactly zero.
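The remark about $10^{-460}$ can be checked directly. In IEEE 754 double precision, the smallest positive (subnormal) number is about $4.9 \times 10^{-324}$, so $10^{-460}$ underflows to zero. Decimal arithmetic from Python's standard `decimal` module, with its much wider exponent range, can still represent it. This only illustrates the floating-point claim; it is not part of the probability estimate in [2].

```python
import sys
from decimal import Decimal

p_float = float("1e-460")        # far below the smallest subnormal double, so it parses to 0.0
p_exact = Decimal(10) ** -460    # exact in decimal arithmetic (the default exponent range is ample)

print(p_float)                   # 0.0
print(sys.float_info.min)        # smallest positive *normal* double, about 2.2e-308
print(p_exact == Decimal("1e-460"), p_exact.adjusted())
```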

Explanation of Denaturation
According to pioneering research on denaturation [13], all known denaturation phenomena are caused by a change of environment from $E_U$ to some $E$. Denaturation, or unfolding, is the same as folding, only in a different environment $E$ and with a different CGFE function $G(\mathbf{X}; E, U)$. According to the SMTH, unfolding will end at minimizers $\mathbf{X}_E$ of $G(\mathbf{X}; E, U)$. These may not be unique, and they may be local or global.
Experiments show that the difference between folding and unfolding is that folding leads to a unique native structure, whereas unfolding leads to many different sta… (1) such that any initial conformation in $A_{\mathbf{Y}}$ will fold to $\mathbf{Y}$. Evolutionary selection has made protein $U$'s peptide chain fit its physiological environment $E_U$. As a result, the attractive basin of $\mathbf{X}_U$ is large enough to contain all the initial conformations freshly coming out of the ribosome. Thus, even if $G(\mathbf{X}; E_U, U)$ has more than one minimizer, the native structure is the unique folding result.
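The notion of an attractive basin can be illustrated with a one-dimensional toy. Assume, hypothetically, $G(x) = x^4/4 - x^2/2 + 0.2x$. It has two minimizers separated by a local maximum near $x \approx 0.21$. Following the gradient flow from each initial point shows which minimizer that point folds to. The set of initial points that reach a given minimizer is that minimizer's basin. Nothing here models a real protein.

```python
def toy_dG(x, c=0.2):
    # Derivative of the hypothetical toy G(x) = x**4/4 - x**2/2 + c*x.
    return x**3 - x + c

def fold(x0, step=0.01, iters=20000):
    """Follow the gradient flow dx/dt = -G'(x) from x0 using explicit Euler steps."""
    x = x0
    for _ in range(iters):
        x -= step * toy_dG(x)
    return x

left, right = fold(-2.0), fold(2.0)            # the two minimizers (about -1.09 and 0.88)
starts = [-2.0 + 0.25 * k for k in range(17)]  # initial conformations -2.0, -1.75, ..., 2.0
basin_left = [x0 for x0 in starts if abs(fold(x0) - left) < 1e-6]
basin_right = [x0 for x0 in starts if abs(fold(x0) - right) < 1e-6]
print(basin_left)
print(basin_right)
```

Every sampled start ends at one of the two minimizers, and the boundary between the two basins is the local maximum.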
But in denaturation, is the initial conformation not the unique native structure $\mathbf{X}_U$? Why, then, does it unfold to many stable denatured conformations? The reason is catastrophe, a phenomenon that often happens in nature. A description of it is given in [14]: "Catastrophe theory is concerned with the mathematical modelling of sudden changes - so called 'catastrophes' - in the behaviour of natural systems, which can appear as a consequence of continuous changes of the system parameters".
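A standard textbook illustration of such a sudden change is the fold catastrophe. In the hypothetical toy $G(x; c) = x^4/4 - x^2/2 + c x$, the right-hand local minimizer exists only while $c < 2/(3\sqrt{3}) \approx 0.385$. Suppose $c$ is increased in small steps and the state is relaxed at each step. The tracked minimizer then moves continuously up to that threshold and jumps to the other well. This illustrates catastrophe in the sense of the quotation only; it is not a model of denaturation itself.

```python
def relax(x, c, step=0.01, iters=5000):
    """Gradient descent on the toy G(x; c) = x**4/4 - x**2/2 + c*x, starting from x."""
    for _ in range(iters):
        x -= step * (x**3 - x + c)
    return x

x = 1.0                      # start at the right-hand minimizer for c = 0
track = []
for k in range(51):          # c increases in small steps from 0.00 to 0.50
    c = 0.01 * k
    x = relax(x, c)          # re-equilibrate from the previous state
    track.append((c, x))

jumps = [abs(b[1] - a[1]) for a, b in zip(track, track[1:])]
k = max(range(len(jumps)), key=jumps.__getitem__)
print(f"largest jump {jumps[k]:.2f} between c = {track[k][0]:.2f} and c = {track[k + 1][0]:.2f}")
```

The parameter changes by only 0.01 per step, yet the state moves by more than one unit in a single step once the right-hand minimizer disappears.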
Actually, going from the physiological environment $E_U$ to the denaturation environment $E$, there must be a family of environments $E_t$ connecting them. Apart from $E_U$ and $E$, these environments are neither in equilibrium nor quasi-static, so the function $G(\mathbf{X}; E_t, U)$ is not well defined. Hence, although the parameter $t$ varies continuously, catastrophe does happen. Various copies of the same native structure $\mathbf{X}_U$ suddenly change to different structures. When the environment finally becomes the denaturation environment $E$, these changed structures become different initial conformations of denaturation paths under $G(\mathbf{X}; E, U)$.

Explanation of Docking
Docking is the binding of two molecules to form a stable complex.
Let $U$ and $V$ be two molecules (proteins or others), and let $\mathbf{X} \in \mathcal{X}_U$ and $\mathbf{Y} \in \mathcal{X}_V$. The 3-dimensional conformations $P_{\mathbf{X}}$ and $P_{\mathbf{Y}}$ are contained in their tailor-made thermodynamic systems $S_{\mathbf{X}}$ and $S_{\mathbf{Y}}$, respectively, in their common physiological environment $E_U$. In Figure 2, in the beginning, … $= \emptyset$