Thermodynamic Principle Revisited: Theory of Protein Folding

Anfinsen’s thermodynamic hypothesis is reviewed and misunderstandings about it are clarified. It should really be called the thermodynamic principle of protein folding. The energy landscape is just the mathematical graph of the Gibbs free energy function $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, a very high dimensional hypersurface. Without knowing this function, any picture of the Gibbs free energy landscape has no theoretical basis, including the claims of a funnel shape. New insights given by the recently obtained analytic Gibbs free energy function $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ of protein folding, derived via quantum statistical mechanics, are discussed. Several disputes are also discussed: whether the approach should be target-based or cause-based; whether the folding force is the hydrophobic effect or a hydrophilic force; and whether a single molecule or an ensemble of molecules should be used in the statistical physics study of protein folding. Classical observations from the 1970s and 1980s about global geometric characteristics of native structures of globular proteins turn out to have grasped the essence of protein folding, but unfortunately have been largely forgotten.


Introduction

The Second Law of Thermodynamics
The Second Law of Thermodynamics states that the entropy of an isolated system never decreases, and that it increases in any spontaneous process. For a spontaneous process in a system of constant temperature $T$, pressure $P$, and composition, an equivalent statement of the second law is that the Gibbs free energy will be lowered, and at the new equilibrium state it will be at a minimum. Indeed, this applies to any process or chemical reaction at constant temperature and pressure [1]. The Gibbs free energy has the form $G = U + PV - TS$, where $U$ is the internal energy, $P$ the pressure, $V$ the volume, $T$ the temperature, and $S$ the entropy of the system. Even if $T$ and $P$ are not uniform in the system, or are not even well defined in it, the available free energy $A = U + P_0 V - T_0 S$ will be lowered and go to a minimum, as long as the heat bath of the system has a well defined temperature and pressure and, on the boundary of the system, they are the constants $T_0$ and $P_0$ respectively [2]. This fundamental principle has long been known.
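The step from the entropy statement to the free energy statement can be sketched in a few lines. The block below is a textbook-style derivation, not taken from [1] or [2]; it assumes a closed system that exchanges heat $\delta Q$ with a bath at temperature $T_0$ and does only pressure-volume work against the bath pressure $P_0$.

```latex
% Sketch under stated assumptions: closed system, only P-V work,
% heat bath at constant temperature T_0 and pressure P_0.
\begin{align*}
  dU &= \delta Q - P_0\, dV              && \text{(first law)} \\
  dS &\ge \frac{\delta Q}{T_0}           && \text{(second law, Clausius inequality)} \\
  \Rightarrow\quad dA &= dU + P_0\, dV - T_0\, dS \;\le\; 0
                                         && \text{with } A = U + P_0 V - T_0 S .
\end{align*}
% If T = T_0 and P = P_0 are uniform inside the system, A coincides with
% G = U + PV - TS, so dG <= 0 at constant temperature and pressure.
```

Equality holds only for reversible changes, so any spontaneous process lowers $A$ (or $G$) until a minimum is reached.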

Anfinsen's Thermodynamic Hypothesis of Protein Folding
A good example of applying this fundamental principle to lift experimental results into a guiding theory for further research is Anfinsen's Thermodynamic Hypothesis of protein folding [3]. After many years of experimental work had shown that the refolding of ribonuclease is spontaneous, Anfinsen summarised: "The studies on the renaturation of the fully denatured ribonuclease required many supporting investigations to establish, finally, the generality which we have occasionally called the 'thermodynamic hypothesis'. This hypothesis states that the three-dimensional structure of a native protein in its normal physiological milieu (solvent, pH, ionic strength, presence of other components such as metal ions or prosthetic groups, temperature, and other) is the one in which the Gibbs free energy of the whole system is lowest; that is that the native conformation is determined by the totality of the interatomic interactions, and hence by the amino acid sequence, in a given environment." [3]
Once we know that protein folding is a spontaneous process, the thermodynamic hypothesis should be upgraded to the thermodynamic principle. The thermodynamic principle makes the protein folding problem a pure physics problem; the only biological knowledge needed is how to specify the physiological environment for a particular protein and how to reasonably simplify that environment for further study.
But this has not been recognised so far. Instead, it is thought that in biological problems such as protein folding, theoretical consideration is impractical. Coincidentally, in the early 1970s, at the same time as the thermodynamic principle of protein folding was established by Anfinsen, computers entered research and played an increasingly important role in protein folding research. Theory was neglected and simulations became essential, as if they were experiments, yet many cannot satisfy the essential requirement of experiments in the experimental sciences, namely reproducibility; see [4]. Furthermore, the theoretical justification of these simulations was rarely questioned. One wonders whether, had the increasing computer power really been guided by the thermodynamic principle, the protein folding phenomenon would still be a mystery today.

Reasons the Thermodynamic Principle of Protein Folding Is Neglected
Why was the thermodynamic principle not actively and persistently pursued?

Do Not Believe It
First, some do not think that the thermodynamic principle is correct. For example, in [5] it is claimed that Anfinsen's theory was disproved long ago because "other complexities of biological systems for example solvents of different compositions may affect the folding/unfolding of proteins, the role of high dielectric constant of water, chaperone assisted folding of proteins and existence of stable folding intermediates." All the reasons listed above to "disprove" the thermodynamic principle neglect that, in the thermodynamic principle of protein folding, the environment plays as important a role as the peptide chain of the protein. In fact, Anfinsen never claimed "that the primary amino acid sequence of polypeptides contains all of the necessary information to direct their folding into functional native proteins" [6]. Instead, Anfinsen stressed "in a given environment". Solvents of different compositions, particular properties of water, chaperones that assist folding, etc., are constituents of the environment. For example, globular proteins have the simplest environment, which can be simplified as only water molecules surrounding a protein molecule. For proteins that need chaperones to assist folding, chaperone molecules must appear in the environment. For membrane proteins, the environment must be described as including three layers: the middle one is hydrophobic, and the other two consist mainly of water molecules. That some proteins do not fold in environments lacking certain constituents does not disprove the thermodynamic principle; rather, those proteins are not in their "normal physiological milieu". This is just one example of how the generality of the thermodynamic principle is often misunderstood and then thought to be wrong.

Misunderstandings Caused by Energy Landscape "Theory"
Second, the main reason the thermodynamic principle was not pursued enough is the confusion caused by the energy landscape "theory", both the EL (potential energy landscape) and the GEL (Gibbs energy landscape) "theories". Indeed, to minimise the Gibbs free energy one should have a Gibbs free energy function $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, where the variable $\mathbf{X}$ is a conformation of the protein $\mathfrak{U}$, and the parameter $\mathcal{E}_N$ is the physiological environment in which the protein folds. Although many have tried to derive $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, for example [7], all have failed. Without knowledge of $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, landscape "theories" took its place, such as the GEL "theory". In fact, the GEL "theory" really has no theory, and in principle cannot explain anything [8]. All its formulae for calculating Gibbs free energy are ad hoc, without any theoretical basis. Terms such as the random energy formula and the principle of minimal frustration are only borrowed from other fields without discussion of their justification; see [8]. In particular, although proteins fold in a fixed environment, GEL uses several different temperatures to calculate the Gibbs free energies of different conformations [9], which is wrong in principle. That is, if one invents a theory to explain a natural phenomenon, one cannot add something that is not in the natural phenomenon.
In fact, the GEL is just the graph of $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, a very high dimensional hypersurface (the dimension $n$ is more than 200 for a 100 residue peptide chain). Advocates of GEL fully understand this; for example, in [10] it is stated that "In the field of physical chemistry, the energy landscape of a protein-solvent system is defined as an energy function [...], where $x_1, x_2, \ldots, x_n$ are variables specifying the protein microscopic states". The GEL "theory" tries to produce pictures of this very high dimensional hypersurface. Of course, nobody can penetrate the prohibitively high dimension of this hypersurface; trying to show it as a two-dimensional surface gives many misleading metaphors, such as "funnel shaped", and causes more confusion than understanding.
Thus, if we know $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, depicting its graph only makes trouble, and we really do not need the GEL. If we do not know $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, the GEL "theory" only makes ad hoc calculations of the Gibbs free energy of various conformations, without theoretical basis and without consistency. No wonder it has caused more misunderstanding than understanding of the thermodynamic principle of protein folding.
For example, in [1] it is claimed that pursuing the thermodynamic principle (equated with GEL) leads to pitfalls, and that the thermodynamic principle will not help to solve the protein folding problem [5], [11].

Lack of Mathematical Training
One of the main reasons given for why GEL, and in fact the thermodynamic principle, will not help solve the protein folding problem is that the second law of thermodynamics cannot guarantee that the Gibbs free energy $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ will have a global minimum [6]. Even without an explicit formula, a little mathematical knowledge helps clarify this situation. For example, it is a theorem in mathematics that a lower semi-continuous function always attains its minimum on a compact set. Since the diameter of any conformation is uniformly bounded, the domain of definition of $\mathbf{X}$ is certainly contained in a compact set in a higher dimensional Euclidean space. It is hard to imagine that an energy function is worse than lower semi-continuous, hence $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ must have a global minimum. Besides, an analytic formula $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ for monomeric globular proteins was recently derived via quantum statistics [12]-[15], and this function is certainly continuous; see (6). Refuting the funnel shape claim of GEL by doubting the existence of a global minimum, as in [6], is not a good argument.
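The theorem invoked here can be stated precisely as follows. This is a standard result of analysis, written as a sketch; the symbols $K$, $\mathbf{X}^*$ and the sequence $(\mathbf{X}_k)$ are notation introduced only for this illustration. Note that the argument needs the admissible set of conformations itself to be compact (closed and bounded), for instance after fixing the position of one atom to remove translations; that is an added assumption, not something stated in the text above.

```latex
% Sketch only: K, X^* and the sequence (X_k) are assumed notation.
\textbf{Claim.} Let $K \subset \mathbb{R}^{3N}$ be nonempty and compact, and let
$G : K \to \mathbb{R}$ be lower semi-continuous, i.e.
$G(\mathbf{X}) \le \liminf_{k \to \infty} G(\mathbf{X}_k)$ whenever
$\mathbf{X}_k \to \mathbf{X}$ in $K$.
Then there is $\mathbf{X}^* \in K$ with
$G(\mathbf{X}^*) = \inf_{\mathbf{X} \in K} G(\mathbf{X})$.

\textbf{Sketch of proof.} Choose $\mathbf{X}_k \in K$ with
$G(\mathbf{X}_k) \to \inf_K G$. By compactness a subsequence converges,
$\mathbf{X}_{k_j} \to \mathbf{X}^* \in K$, and
\[
  \inf_K G \;\le\; G(\mathbf{X}^*)
           \;\le\; \liminf_{j \to \infty} G(\mathbf{X}_{k_j})
           \;=\; \inf_K G .
\]
```

The theorem gives existence of a global minimiser only; it says nothing about uniqueness.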

Should the Native Structure Be Only a Local Minimum?
Another main reason given for why GEL will not help solve the protein folding problem is that the native structure of a protein may be only at a local minimum, rather than the global minimum, of the Gibbs free energy. This is possible, but such a circumstance does not contradict the second law of thermodynamics, and hence does not negate the thermodynamic principle. In this case, the initial conformation determines which minimum is reached by the native structure. We should learn more about the conformation of a newly synthesised polypeptide chain in a cell. Is it always the same conformation, or does it vary with each individual molecule? If it is the former, then starting from other initial conformations may lead to local or global minima different from the native structure. If it is the latter, then perhaps the native structure is really the unique global minimum, since all initial conformations lead to the same native structure. Judging from the experimental results of ribonuclease denaturation/renaturation, denatured ribonuclease still retains about 1% of its biological function. Since there are 105 patterns of forming the 4 disulphide bonds, we may infer that perhaps each of the 105 patterns occurs with a similar percentage in the denatured state. Yet all these initial conformations refold to the same native structure, in which the protein has 100% biological function. Thus we can infer that for ribonuclease, $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ really has a unique global minimum. Since this was all that Anfinsen knew, he hypothesised the "lowest" Gibbs free energy. Considering that a given initial conformation leads to a given minimiser, local or global, modifying the thermodynamic principle to admit local minima will not harm the principle.
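The figure of 105 is the number of ways to pair 8 cysteine residues into 4 disulphide bonds, and $1/105 \approx 0.95\%$ is consistent with the roughly 1% residual activity quoted above. A minimal sketch of the count follows; the function name is a hypothetical helper introduced for illustration.

```python
def count_disulphide_pairings(n_cysteines: int) -> int:
    """Number of ways to pair n_cysteines residues into n_cysteines / 2 bonds.

    The first cysteine can pair with any of the other (n - 1); the next
    unpaired one with any of the remaining (n - 3); and so on, giving
    (n - 1) * (n - 3) * ... * 3 * 1.
    """
    if n_cysteines < 0 or n_cysteines % 2 != 0:
        raise ValueError("need a non-negative even number of cysteines")
    count = 1
    for k in range(n_cysteines - 1, 0, -2):
        count *= k
    return count


# Ribonuclease has 8 cysteines forming 4 disulphide bonds.
n_patterns = count_disulphide_pairings(8)
fraction_native = 1 / n_patterns
```

Here `n_patterns` is 7 × 5 × 3 × 1 = 105, so a uniformly random pairing is the native one in slightly under 1% of molecules.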

Mistaking Environment
A really legitimate concern about the thermodynamic principle is argued in [16]. It says that "According to this hypothesis, if we define $G_N$ as the Gibbs free energy $G$ of a folded protein in its native state N, $G_N$ is the global minimum of the protein's free energy functional $\hat{G}$. However, $G_N$ can only be reached if N is the current equilibrium state for the native thermodynamic conditions [...]. We describe the condition set [...], where $T_X$ and $P_X$ are the equilibrium pressure and temperature for a protein in a state X with the conformation $C_X$. We assume constant T and P and use the only microscopic solvent composition $Q_X$ to define the present conditions for X. The Anfinsen's thermodynamic hypothesis, therefore, seems to make sense. Indeed, from the Second Law (at constant T and P), a free energy change $\Delta G_{\mathrm{relax}} < 0$ should be obtained for any thermodynamic pathway to relax the non-equilibrium state X to the folded native equilibrium state N with the respective free energies [...]; $C_N$ and $Q_N$ are the native conformation and solvent composition, respectively. The possible pitfall is that $\tilde{G}_X$ is a non-equilibrium free energy because $\tilde{G}_X$ is not at equilibrium for $Q_N$. The real free energy change that has to be considered in a pathway where an intermediate state X has enough time to reach equilibrium is [...], where [...] is the equilibrium free energy for $Q_X$. The Anfinsen's thermodynamic hypothesis can, therefore, only hold with a good likeliness if [...]."
The point is that the solvent composition $Q_X$ really varies with the conformation $C_X$, i.e., there is no common solvent composition $Q_N$; see the next section on the formula $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, where $Q_X$ is the first water layer of the conformation $\mathbf{X}$ and is part of the thermodynamic system $\mathcal{T}_{\mathbf{X}}$ for each conformation $\mathbf{X}$, and $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ is really the Gibbs free energy of the thermodynamic system $\mathcal{T}_{\mathbf{X}}$. As for equilibrium, the protein folding process should be considered a quasi-static process and, as mentioned before, we only need the heat bath to have a constant temperature $T_0$ and pressure $P_0$; in this case the second law of thermodynamics says that the available energy $A = U + P_0 V - T_0 S$ approaches its minimum, as stated at the beginning according to [2].
After clarifying these doubts about the thermodynamic principle, we will show what $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ is and look at some new insights it gives.

The Formula $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$
We will not give the derivation of $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, which was done in [12]-[15] with the same idea but a progressively better understanding of the quantum physics. We concentrate on the rationale coming from the understanding of the thermodynamic principle of protein folding.
Our understanding of the thermodynamic principle is that it emphasises a holistic view: it requires a single molecule method, and quantum statistics instead of classical statistics, to derive the Gibbs free energy formula $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$.

Cannot Be a Sum of Local Contributions
Unlike the potential energy function, the Gibbs free energy function, or the GEL, is not pairwise additive, as has been pointed out in [6]. In fact, we cannot first consider local contributions and then sum them up to get the Gibbs free energy. This is emphasised by Anfinsen in the statement "that is that the native conformation is determined by the totality of the interatomic interactions, and hence by the amino acid sequence, in a given environment." [3]. So when trying to derive $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ from first principles, we cannot divide $\mathbf{X}$ into several parts, consider each part, and sum up the Gibbs free energies of all these parts. In fact, we cannot even take a coarse grained model of the conformation to try to derive $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, because an atom's contribution to the whole cannot be separated. Hence for us, a conformation is the coordinates of the atomic centres of all atoms $a_1, \ldots, a_N$ of the given protein $\mathfrak{U}$, written as $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_N) \in \mathbb{R}^{3N}$. For proteins with more than one polypeptide chain, all chains should be considered together, i.e., let [...].

Single Molecule Treatment Is Necessary
Like any computer simulation of protein folding, we describe only one protein molecule in various conformations $\mathbf{X}$, not an ensemble of (identical) protein molecules each taking a conformation. To derive $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, it is natural that one needs to adopt statistical physics. When applying statistics, one naturally thinks that there should be many copies of the same object, such as a protein molecule, forming an ensemble. This was pursued by many; see, for example, [7], where integration over all molecular conformations of the ensemble except one, $\mathbf{X}$, was performed to get $E(\mathbf{X})$. With an integrand that cannot be integrated explicitly (in fact, it is in exponential form) and without a clear delimitation of the domain of integration, the resulting $E(\mathbf{X})$ is a complicated unknown function buried in a multi-dimensional integral. Worse still, $E(\mathbf{X})$ is not even the Gibbs free energy. Nevertheless, the authors of [7] called it the effective energy and used it in many places as if it followed the thermodynamic principle. This is a perfect example of starting from the thermodynamic principle and ending up with metaphorical expressions and endless computer simulations to cover up theoretical poverty. Following this trend, the protein folding problem will never be resolved.
One of the tasks of the protein folding problem is to determine an individual protein's native structure, whereas for an ensemble of molecules all available methods neglect the structures of the individual molecules; hence we cannot use the ensemble method. Therefore, we have to take a single molecule $\mathfrak{U}$, consider an arbitrary conformation $\mathbf{X}$ of it, and construct a thermodynamic system $\mathcal{T}_{\mathbf{X}}$ which is tailor made for the conformation $\mathbf{X}$ and contains the immediate surrounding environment of $\mathbf{X}$. Finally, the Gibbs free energy $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ is the Gibbs free energy of the thermodynamic system $\mathcal{T}_{\mathbf{X}}$.

Classical or Quantum Statistics?
We have to figure out how to do statistics on this thermodynamic system $\mathcal{T}_{\mathbf{X}}$. Both classical and quantum statistics were tried, see [12], with the classical result missing the volume contribution in formula (6). Considering that classical mechanics is not suited to describing the physics of objects of molecular or even macromolecular size, we choose quantum statistics. Moreover, quantum statistics will allow us to do further theoretical studies of protein folding, if we can handle the electronic density function defined in [17], which we cannot do at present. This function underlies our hydrophobicity level classification in subsection 2.6.

The Importance of Environment
Our tailor made thermodynamic system $\mathcal{T}_{\mathbf{X}}$ in fact contains the immediate surrounding environment of the conformation $\mathbf{X}$. Biological knowledge comes in here to help us describe, and make necessary and rational simplifications of, this immediate environment. For example, it is known that for globular proteins we can simply assume that under physiological conditions only water molecules immediately surround a conformation. For membrane proteins, the immediate environment should have at least three parallel layers, with water molecules in the outer two layers and a hydrophobic middle layer. For proteins needing chaperones' help to fold, these chaperones must be contained in the immediate environment of the conformation $\mathbf{X}$. This is also the holistic view: without the chaperones, the protein is in a wrong environment and will either fold to another structure or to no structure at all, meaning that many different conformations achieve the minimum of the Gibbs free energy.
Since we have not yet figured out how to handle the environment except for monomeric globular proteins, our present function $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ is only for monomeric globular proteins.

The Thermodynamic System $\mathcal{T}_{\mathbf{X}}$ for Monomeric Globular Proteins
Since only globular proteins allow us to simplify their physiological environment as consisting of only water molecules, we will only work on monomeric globular proteins here. A conformation of a polymeric globular protein is [...]. Theoretically, our function $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ can be generalised without any theoretical difficulty to polymeric globular proteins. But to apply it we will face the docking problem, i.e., [...] and the relative positions between the $\mathbf{X}_i$'s. This is too hard for now.
First, a conformation $\mathbf{X}$ by definition lives in $\mathbb{R}^{3N}$, but the textbook definition of a thermodynamic system is that it is a region in $\mathbb{R}^3$; see, for example, [18]. To create $\mathcal{T}_{\mathbf{X}}$, we consider $P_{\mathbf{X}} = \bigcup_{i=1}^{N} B(\mathbf{x}_i, r_i)$, where $B(\mathbf{x}, r)$ is the solid ball of radius $r$ centred at $\mathbf{x} \in \mathbb{R}^3$. Essentially, $P_{\mathbf{X}}$ together with its first layer of water molecules will be our $\mathcal{T}_{\mathbf{X}}$. Here we have assumed that the shape of each atom $a_i$ is a solid ball with van der Waals radius $r_i$. Although the shape of each atom in $\mathfrak{U}$ is well defined by the theory of atoms in molecules [17], what concerns us here is the overall shape of the structure $P_{\mathbf{X}}$. The cutoff of electron density $\rho \ge 0.001$ au in [17] gives an overall shape of the structure that is just like $P_{\mathbf{X}}$, a bunch of overlapping balls. Moreover, the boundary of the $\rho \ge 0.001$ au cutoff is almost the same as the molecular surface $M_{\mathbf{X}}$, which was defined by Richards in 1977 [19] and was shown in 1992 and 1993, [20] and [21], to be a more suitable boundary surface of $P_{\mathbf{X}}$ than other surfaces.
To explain the formula $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, we have to describe $\mathcal{T}_{\mathbf{X}}$ in detail. In general, any closed surface $S \subset \mathbb{R}^3$ (connected, bounded, and without boundary, for example a sphere) divides $\mathbb{R}^3$ into three parts, $\mathbb{R}^3 = \Omega_S \cup S \cup \Omega'_S$, where $\Omega_S$ is a bounded domain (connected open set) and $\Omega'_S$ an unbounded domain; $\partial\Omega_S$ denotes the boundary of $\Omega_S$. Rolling a sphere of radius $r$ on the boundary surface $\partial P_{\mathbf{X}}$ of $P_{\mathbf{X}}$ will produce a molecular surface $M_{\mathbf{X}}(r)$ [19]. Let $d_w$ be the diameter of a water molecule and denote the molecular surface [...] ($i = 2, \ldots, m$, each large enough to hold a water molecule); then denote [...]. Let [...] be the first hydration shell surrounding $P_{\mathbf{X}}$. Then the tailor made thermodynamic system for the conformation $\mathbf{X}$ is [...].
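The set $P_{\mathbf{X}}$, a union of solid balls centred at the atomic centres, is easy to work with computationally. The following is a minimal sketch of a point-membership test; the function name, the two-atom example, and the radii are hypothetical values chosen for illustration and are not data from the paper.

```python
from math import dist
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def in_union_of_balls(p: Point,
                      centres: Sequence[Point],
                      radii: Sequence[float]) -> bool:
    """Return True if point p lies in the union of the closed balls B(x_i, r_i)."""
    if len(centres) != len(radii):
        raise ValueError("need one radius per atomic centre")
    # p is in the union as soon as it lies inside at least one ball.
    return any(dist(p, c) <= r for c, r in zip(centres, radii))


# Hypothetical two-atom "conformation" (coordinates and radii in angstroms).
centres = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
radii = [1.7, 1.2]

inside_first = in_union_of_balls((0.5, 0.0, 0.0), centres, radii)
inside_second_only = in_union_of_balls((2.6, 0.0, 0.0), centres, radii)
outside = in_union_of_balls((5.0, 0.0, 0.0), centres, radii)
```

A test of this kind is only the starting point; constructing the molecular surface by rolling a probe sphere requires dedicated algorithms that are beyond this sketch.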

Hydrophobicity Levels
Any Gibbs free energy formula should not only have a fairly general form for all proteins, or at least for a class of proteins such as monomeric globular proteins, but must also be able to distinguish different proteins. That is, suppose $\mathfrak{U}_1$ and $\mathfrak{U}_2$ are two different proteins with the same number of atoms, say $N$. Then even if some $\mathbf{X} \in \mathbb{R}^{3N}$ appears simultaneously as a conformation of both $\mathfrak{U}_1$ and $\mathfrak{U}_2$, and even if the two free energy functions have the same minimum value, $\mathfrak{U}_1$ and $\mathfrak{U}_2$ should have different native structures, if they have native structures at all.
Hence, we should find a way to distinguish proteins by their peptide chains. The hardest task is this: given a peptide chain $P = A_1 \cdots A_n$, let $n_i$ be the number of times amino acid $i$ appears in $P$; shuffling the amino acids $A_i$ around, we obtain $S(P) = n! / \prod_i n_i!$ different amino acid sequences, and the formula $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ has to be able to distinguish all of these $S(P)$ peptide chains. For example, there should be $S(P)$ different minimisation problems in (9), though the minimisers may vary just slightly for some of them.
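To give a feel for how large $S(P)$ is, the count of distinct rearrangements of a sequence can be computed directly as a multinomial coefficient. This is a minimal sketch; the function name and the short example sequences are hypothetical.

```python
from collections import Counter
from math import factorial


def count_shuffled_sequences(sequence: str) -> int:
    """Number of distinct sequences obtained by permuting the residues.

    With n residues in total and n_i copies of residue type i, this is the
    multinomial coefficient n! / (n_1! * n_2! * ...).
    """
    total = factorial(len(sequence))
    for n_i in Counter(sequence).values():
        # Each division is exact because the partial quotient is itself
        # a product of binomial coefficients.
        total //= factorial(n_i)
    return total


# One-letter amino acid codes; these short chains are illustrative only.
small = count_shuffled_sequences("AAB")       # AAB, ABA, BAA
larger = count_shuffled_sequences("ACDEFGHIKL")  # ten distinct residues
```

Even for ten distinct residues the count is 10! = 3,628,800, and it grows factorially with chain length, which is why a formula that separates all these chains is a demanding requirement.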
For this purpose, we divide the atoms in a protein according to their hydrophobicity levels. Atoms in a protein molecule naturally exist in atom groups or moieties which have different physicochemical properties. One of these properties is the tendency, caused by the electronic charge distribution, to form hydrogen bonds either with other moieties (intra-molecular) or with other molecules in the environment (inter-molecular). Accordingly, we can divide these atom groups or moieties into different levels of hydrophobicity, from the most hydrophobic (cannot form hydrogen bonds) to the most hydrophilic; say there are $H$ levels $H_1, \ldots, H_H$, $H \ge 2$. Then we can assign an atom $a_k$ to a hydrophobicity level $H_i$ if $a_k$ belongs to an $H_i$ atom group. For example, we may assume that the classification is as in [22], with $H = 5$ classes: C, O/N, O$^-$, N$^+$, S. Unlike in [22], we also classify every hydrogen atom into one of the $H$ hydrophobicity level groups. Note that this classification is independent of conformations; it depends only on the peptide chain.
For any compact (closed and bounded) set $U \subset \mathbb{R}^3$, let $\operatorname{dist}(\mathbf{x}, U) = \min_{\mathbf{z} \in U} |\mathbf{x} - \mathbf{z}|$ be the distance between the point $\mathbf{x}$ and the subset $U$. Define compact sets [...], where $P_{\mathbf{X}} \setminus P_{\mathbf{X}_i}$ is the set of points $\mathbf{x}$ that belong to $P_{\mathbf{X}}$ but do not belong to $P_{\mathbf{X}_i}$.
To each hydrophobicity level $H_i$ there corresponds a chemical potential $\mu_i$, such that a water molecule touching $M_{\mathbf{X}_i}$ will gain a Gibbs free energy $\mu_i$. Similarly, there is a chemical potential $\mu_e$ for electrons inside $\mathcal{T}_{\mathbf{X}}$. Let $\nu_i$ be the average number of water molecules that can simultaneously touch $M_{\mathbf{X}_i}$ per unit area; then $\omega_i = \nu_i \mu_i$ is the chemical potential per unit area of $M_{\mathbf{X}_i}$. Moreover, since the curvature of $M_{\mathbf{X}}$ is uniformly bounded for all conformations $\mathbf{X}$, the $\omega_i$'s do not depend on the conformation $\mathbf{X}$.

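The Formula $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$
Let $V(\Omega)$ be the volume of a domain $\Omega \subset \mathbb{R}^3$ and $A(S)$ the area of a surface $S \subset \mathbb{R}^3$. Now we can write the analytic Gibbs free energy formula.
Theorem 1. Let $\mathfrak{U}$ be a monomeric globular protein with $N$ atoms $(a_1, \ldots, a_N)$ and let $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_N)$ be a conformation. Let $q_A$ be the electronic charge in the nucleus of $a_A$. [...]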

Structure Prediction
With a theoretically established $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, ab initio structure prediction becomes not only possible, but also simple. It is the purely mathematical problem of seeking the minimisers of $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ [...]. The problem is, nobody knows $P_{\mathrm{eq}}$, as Ben-Naim admitted in [24]. One suggestion is that we apply [...]. The rationale is that any conformation $\mathbf{X}$ with [...] will have a much smaller $P_{\mathrm{eq}}(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, and thus less chance to appear in the fully functional state of the ensemble. Of course, this is only a conjecture, and it is not so important to know, at least not as important as the claim made in [24] suggests: "If one knew this distribution, then one could tell which conformations are more probable than the others under the given environment." In fact, we all now know that in the physiological situation the native structure is "more probable than the others under the given environment", but we still do not know the shape of the native structure. So to solve the protein folding problem, at least for the prediction of native structure from knowledge of the amino acid sequence, we have to know what $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ is and solve (9) or (10) with whatever mathematical method is available.

Force of Folding
In [6], the folding force is claimed to be $-\nabla G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$. In [6] it is also claimed that searching for the minimiser in GEL, as in (9), is target-based, while identifying the folding force is cause-based. In fact, target-based and cause-based are only artificial distinctions. In solving (9), we do not have a target to approach subjectively, and if we use the steepest descent method to find the minimiser, we are really cause-based, because we are explicitly using the force $-\nabla G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$.
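The sense in which steepest descent is "cause-based" can be illustrated on a toy problem: each step moves the current point along the force, the negative gradient, with no reference to a target. The sketch below uses a simple two-variable quadratic as a stand-in; it is not the Gibbs free energy formula of the paper, and the function names, step size, and tolerance are assumptions chosen for illustration.

```python
from typing import Callable, List

Vector = List[float]


def steepest_descent(grad: Callable[[Vector], Vector],
                     x0: Vector,
                     step: float = 0.1,
                     tol: float = 1e-8,
                     max_iter: int = 10_000) -> Vector:
    """Follow the force -grad(x) from x0 until the force is negligible."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        # Stop when the magnitude of the force falls below the tolerance.
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Move a small distance along the force direction -g.
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def toy_gradient(x: Vector) -> Vector:
    """Gradient of the toy function f(x) = (x0 - 1)^2 + 2 * (x1 + 2)^2."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]


x_min = steepest_descent(toy_gradient, [0.0, 0.0])
```

For this convex toy function the iteration converges to the unique minimiser (1, -2). On a function with several local minima, the point reached depends on the starting point, which mirrors the earlier discussion of initial conformations.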

Understanding Denaturation and Refolding
If a theoretically derived $G(\mathbf{X}; \mathfrak{U}, \mathcal{E})$ is known for any specific environment $\mathcal{E}$ (for example, for environments that differ only in temperature), then $G(\mathbf{X}; \mathfrak{U}, \mathcal{E})$ can explain denaturation and protein refolding through changes of environment. Changing the environment $\mathcal{E}_N$ to, say, $\mathcal{E}_D$, the native structure $\mathbf{X}_N$ of $\mathfrak{U}$ will no longer be a minimiser of $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_D)$ in problem (9) or (10); therefore, $\mathbf{X}_N$ is unstable in $\mathcal{E}_D$. Moreover, [...]. By the same second law of thermodynamics, or thermodynamic principle, the force $-\nabla G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_D)$ will force a minimisation of $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_D)$, resulting in minimisers other than $\mathbf{X}_N$. This is a theoretical explanation of denaturation. For certain types of proteins, when the environment $\mathcal{E}_D$ changes back to $\mathcal{E}_N$, or to a similar one, refolding happens by the same procedure as in problem (9).
Furthermore, various thermodynamic functions, such as the entropy $S$, can be obtained from the family $G(\mathbf{X}; \mathfrak{U}, \mathcal{E})$. For example, [...].
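As a reminder of how an entropy follows from a free energy that is known as a function of temperature, the standard thermodynamic identity is sketched below. The notation $\mathcal{E}_T$ for a family of environments differing only in the temperature $T$ is an assumption introduced here for illustration.

```latex
% Standard identity; \mathcal{E}_T (environments indexed by temperature T)
% is assumed notation, not taken from the paper.
\[
  dG = V\, dP - S\, dT
  \quad\Longrightarrow\quad
  S = -\left( \frac{\partial G}{\partial T} \right)_{P},
\]
so that, for a fixed conformation $\mathbf{X}$ at constant pressure,
\[
  S(\mathbf{X}; \mathfrak{U}, \mathcal{E}_T)
  = -\frac{\partial}{\partial T}\, G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_T).
\]
```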

Hydrophobic, Hydrophilic, Which Is the Folding Force?
There is a hot debate in [6] on which is the main folding force: the hydrophobic effect or the hydrophilic force? Once we know $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, moving along the force direction $-\nabla G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, $G$ will become smaller.
The qualitative character of $\omega_i$ is that if the class $H_i$ is hydrophobic, then $\omega_i > 0$, and if it is hydrophilic, then $\omega_i < 0$. Thus reducing hydrophobic areas $A(M_{\mathbf{X}_i})$ (hydrophobic effect) or enlarging hydrophilic areas $A(M_{\mathbf{X}_j})$ (hydrophilic force?) will reduce the Gibbs free energy $G$. In terms of the force $-\nabla G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$, the situation is much more complicated. Consider the case of a force pushing $\mathbf{X}$ to a new conformation $\mathbf{X}'$ such that [...] and [...]. This explains well that both the hydrophobic effect and the hydrophilic force play their roles in reducing the Gibbs free energy. But these are only two terms in $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$; the values of the other terms will eventually determine the sign of [...]. And if [...]

[...] introduced, and its roles in protein structure prediction and in explaining the folding process are discussed. The derivation of $G(\mathbf{X}; \mathfrak{U}, \mathcal{E}_N)$ applies quantum statistics to thermodynamic systems $\mathcal{T}_{\mathbf{X}}$ tailor made for a single conformation $\mathbf{X}$ and its immediate environment, following Anfinsen's original single molecule orientation.