

This article is concerned with the so-called Levinthal paradox. It will be argued that many have sought a “solution” to Levinthal’s paradox, whereas in fact the “solution” already appeared in Levinthal’s original articles. Most of the subsequently suggested “solutions” were inadequate solutions to a non-paradox. It is shown that the discovery of strong hydrophilic forces not only dismisses the Levinthal paradox, but also provides a solution to the general problem of protein folding. A simple model based on a Markov process is presented to demonstrate how a strong biased force can dramatically reduce the number of steps required to reach the stable native 3-D structure of the protein.

Ever since it was realized that the denaturation of proteins can be reversed without any auxiliary agent, the protein folding problem has been one of the major unsolved problems in molecular biology [1-10].

In a recent editorial in Science, the editors listed 125 unsolved “What Don’t We Know?” questions in science [

“Can we predict how protein will fold? Out of a near infinitude of possible ways to fold, a protein picks one in just tens of microseconds. The same task takes 30 years of computer time.”

There are essentially two problems associated with the process of protein folding. The first is concerned with how and why a protein folds to its native 3-D structure in a very short time. The second is concerned with the factors that confer stability on the native structure of the protein.

These questions have presented formidable challenges to chemists, biochemists and physicists. In this paper we focus only on the first question, the one referred to in the quotation from the Science editorial, which is also known as the Levinthal paradox [

In the author’s opinion, the main hindrance to finding a solution to the protein folding problem has been the adherence to the hydrophobic (HØO) dogma, which states that various HØO effects (both solvation and interaction) are the “dominant forces” in protein folding [1,13]. The origin of this idea is contained in Kauzmann’s suggestion that: “The hydrophobic bond is probably one of the more important factors involved in stabilizing the folded configuration…” [

Note that Kauzmann’s hypothesis says nothing about the role of the hydrophobic effect in the dynamics of the process of protein folding. Yet, surprisingly, the literature abounds with claims that the dominant forces in protein folding are the HØO effects.

An exhaustive analysis of all the solvent-induced effects on protein folding reveals that the hydrophilic (HØI) effects are much more important than the corresponding HØO effects [8,9]. The discovery of the strong HØI effects, both interactions and forces, has removed the crux of the mystery of the protein folding problem (as well as of other related problems such as self-assembly and molecular recognition).

In this article we discuss the role of the HØI forces in answering the question of why a protein folds along a narrow range of pathways and in a relatively short period of time.

We start in Section 2 with a few quotations from Levinthal, who most eloquently formulated the problem of protein folding [

In Section 3, we discuss some attempts to “solve” the so-called Levinthal paradox. In Sections 4 and 5, we suggest a way of implementing the HØI forces to arrive at an answer to the protein folding problem. Some concluding remarks are presented in Section 6.

We begin with a few quotations from Levinthal’s articles [

a) “Let us ask ourselves how proteins fold to give such a unique structure. By going to a state of lowest free energy? Most people would say yes and indeed, this is a very logical assumption. On the other hand, let us consider the possibility that it isn’t so.”

b) “How accurately must we know the bond angles to be able to estimate these energies? Even if we knew these angles to better than a tenth of a radian, there would be 10^{300} possible configurations in our theoretical protein. In nature, proteins apparently do not sample all of these possible configurations since they fold in a few seconds, and even postulating a minimum time for going from one conformation to another, the proteins would have time to try the order of 10^{8} different conformations at most before reaching the final state.”

c) “We feel that protein folding is speeded and guided by the rapid formation of local interactions which then determine the further folding of the peptide. This suggests local amino acid sequences which form stable interactions and serve as nucleation points in the folding process.”

d) Then, is the final configuration necessarily the one of the lowest free energy? We do not feel that it has to be. It obviously must be a metastable state which is in a sufficiently deep energy well to survive possible perturbations in a biological system. If it is the lowest energy state, we feel it must be the result of biological evolution, i.e. the first deep metastable trough reached during evolution happened to be the lowest energy state. You may then ask the question, “Is it a unique folding necessary for any random 150-amino acid sequence?” and I would answer, “Probably not.”

The propositions quoted above are cast in a form reminiscent of a mathematical proof by contradiction: assume that X is true, reach an absurd result, and then conclude that the assumption cannot be true. In fact, we have here all the elements of such a proof.

Statement a) raises two questions: 1) How does a protein fold to give a unique structure? and 2) Is this unique structure the state of lowest free energy? We shall refer to the first question as the Levinthal question. The second question, as well as its answer, will not be discussed here. It is related to Anfinsen’s thermodynamic hypothesis, and it is discussed elsewhere [

Statement b) essentially concludes that if one assumes a random sampling of the configuration space of the protein, then one arrives at an absurd result. Statements c) and d) suggest possible answers to the two questions raised in a).
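Taking the two numbers in quotation b) at face value, the fraction of the configuration space that a random search could sample before folding is

$$\frac{10^{8}}{10^{300}} = 10^{-292},$$

which is why the assumption of a purely random search leads to an absurd result.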

Clearly, there is no paradox in obtaining an absurd result from an unrealistic assumption. Levinthal immediately recognized that the absurd result he reached follows from the wrong assumption of a random search over the immense configurational space. Levinthal did not see that absurd result as a paradox, as so many others did. He immediately reached the (almost) correct solution, as stated in quotation c) above: namely, that there must be preferential pathways of folding, “guided by rapid formation of local interactions.” Although Levinthal did not specify what these “guiding interactions” are, his solution to the absurd result (based on an unrealistic assumption) is almost correct. Instead of “guiding interactions,” one should use the term “guiding forces.” Though these forces are derived from the interactions, it is the magnitude of the force acting on the groups of the protein that determines the speed of the folding process. The main question left unanswered by Levinthal is: What are these strong forces that guide the protein to its native structure in a relatively short time? We now know that these forces originate from the water; more specifically, they are the solvent-induced forces exerted on the hydrophilic groups along the backbone and the side chains of the protein [8,9].

During the past 40 years, many have sought a solution to the (non-existent) Levinthal paradox.

Perhaps the most serious and much-acclaimed attempt to “solve” the paradox was published by Zwanzig et al. [

“The main point of this paper is to show by mathematical analysis of a simple model that Levinthal’s paradox becomes irrelevant to protein folding when some of the interactions between amino acids are taken into account.”

This is exactly the answer given by Levinthal himself, namely that the interactions between different parts of the protein can guide the folding process. As we have pointed out above, the important guiding factors are the forces rather than the interactions. Zwanzig et al. do not offer any answer to the question regarding these forces, nor do they specify what “some of the interactions between amino acids” are. Furthermore, the model used by Zwanzig et al. is not a realistic one, and might even be misleading.

Zwanzig et al. drew on Dawkins’ brilliant ideas for explaining the mechanism of evolution [15-17]. Briefly, the protein is viewed as a sequence of N bonds, and each “connecting bond” between two neighboring amino acids can be characterized as “correct” or “incorrect” (“correct” meaning native). They then assume one rate constant for the transition “correct” → “incorrect”, and another rate constant for the transition “incorrect” → “correct”. Assuming further that the ratio of the first rate constant to the second is small, they calculated the mean first-passage time to reach the fully “correct” conformation.
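To make the target-biased scheme concrete, here is a minimal Python sketch in the spirit of the description above; it is not Zwanzig et al.’s own formulation or code. It assumes N independent bonds flipping in continuous time, with hypothetical per-bond rates `k_ci` (“correct” → “incorrect”) and `k_ic` (“incorrect” → “correct”), and solves the standard first-passage equations for m, the number of incorrect bonds, to obtain the mean time from the all-“incorrect” state to the all-“correct” state.

```python
import numpy as np

def mean_first_passage_time(N, k_ci, k_ic):
    """Mean time for N independent bonds, all initially "incorrect", to become
    all "correct" (continuous-time sketch; rates are per bond).

    k_ci: rate of a "correct" -> "incorrect" flip
    k_ic: rate of an "incorrect" -> "correct" flip
    The state is m = number of incorrect bonds; m = 0 is the target.
    """
    # Unknowns T[1..N], with T[0] = 0. For 1 <= m <= N:
    #   (down_m + up_m) T[m] - down_m T[m-1] - up_m T[m+1] = 1
    A = np.zeros((N, N))
    b = np.ones(N)
    for m in range(1, N + 1):
        down = m * k_ic        # one of the m incorrect bonds becomes correct
        up = (N - m) * k_ci    # one of the N - m correct bonds becomes incorrect
        r = m - 1              # row of A holding the equation for T[m]
        A[r, r] = down + up
        if m >= 2:
            A[r, r - 1] = -down
        if m <= N - 1:
            A[r, r + 1] = -up
    T = np.linalg.solve(A, b)
    return T[N - 1]            # start from m = N (all bonds incorrect)

print(mean_first_passage_time(10, k_ci=0.01, k_ic=1.0))  # strongly biased towards "correct"
print(mean_first_passage_time(10, k_ci=1.0, k_ic=1.0))   # no bias
```

Making `k_ci/k_ic` small makes the passage fast, while equal rates make it grow roughly exponentially with N. Note that the bias is written directly into the rates as a preference for the “correct” state, which is exactly the feature objected to in the text.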

It should be noted that the metaphor used by Dawkins is barely suitable for explaining evolution to the layperson [15,17]. The mechanism for arriving at the “correct” target, as proposed by Dawkins, demonstrates the possibility of occurrence of an event which is perceived to be highly improbable. As such, Dawkins’ model achieves its goal of removing the mystery from the evolutionary process. However, even in evolution, there are no “correct” or “incorrect” results. In fact, Dawkins himself recognized that his explanation is not relevant to the actual process of evolution [9,15]. Evolution does not pose any goals or targets to reach. Nevertheless, one can simply define a “correct” outcome as one which has some evolutionary advantage. This is not the case for the process of protein folding. Therefore, Dawkins’ metaphor is not adequate for the process of protein folding. The main objection to this model is that one cannot justify the preferential transition from “an incorrect bond” to “a correct bond” at each stage of the protein folding process.

In evolution, a transition from “incorrect” to “correct” is biased according to some selection criterion, i.e. the “correct” result has some advantages, and therefore that result survives. There exists no analog of the selection criterion in the process of protein folding.

Furthermore, Zwanzig et al. do not provide a plausible reason for the particular assignment of the values of the two rate constants in terms of either molecular interactions or forces. Therefore, the model used by Zwanzig et al., as well as the specific solution of the model, is not relevant to the protein folding problem.

There are many statements in which evolution theory is invoked in connection with the problem of protein folding. In a recent article Wolynes writes: [

“Evolution solved the protein-folding problem. A major goal of bio-molecular science has been to understand how this was done.”

Of course, evolution does not solve any problem, nor was the protein folding problem posed to Nature. Evolution only evolves, and a product which has some evolutionary advantage survives. It is clear that Wolynes uses “solved” in his first sentence only in a figurative sense, in the same sense as when people say today that some bacteria “developed” resistance to some drug. Of course, bacteria do not “develop” anything. In a given population there are many mutants of the same bacteria, some of which are resistant to a specific drug. When that drug is administered, only those mutants that are resistant to the drug survive. To an outside observer, it looks as if the population as a whole has “developed” a resistance to the drug.

Thus, it is acceptable to use the word “develop” in the sense that this is how it seems to an observer who is not aware of the existence of resistant mutants in the original population. However, it is meaningless to try to “learn” from the bacteria (or from Evolution) how they “developed” the resistance to the drug.

Similarly, during evolution, proteins, or rather polypeptides, were synthesized. Some folded, while others did not. Of those that folded, some reached a stable 3-D structure and some did not. Of those that reached a stable structure, some had a special advantage, while others did not, and so on.

Looking at the final outcome of a functional protein, one can say figuratively that Nature or Evolution has “solved” the problem of folding a polypeptide into some useful 3-D structure. This is acceptable only if we understand that, in the “population” of all the peptides which were synthesized during evolution, some have folded into a useful 3-D structure; not because this structure was the target of Evolution, and not because Evolution had faced the problem of how to fold a specific protein. However, one cannot attempt to understand how Evolution has “solved” the protein folding problem, simply because Evolution did not solve any problem. Any attempt to “learn” from evolution must therefore lead to a dead end.

The more realistic mechanism for folding, and the one alluded to by Levinthal, is based on the forces acting on each group at any given configuration of the protein. In this view there is no “correct” or “incorrect” bond or configuration. There is also no need to invoke a folding “code” or a “target” to be reached. At each stage of the folding process there are many possible transitions; some are more probable, some less probable, and some even improbable. This view leads to a range of pathways, which we may refer to as the preferential folding pathways, along which the protein folds with high probability, and with negligible probability along all other pathways. In other words, this view effectively reduces the immense number of pathways to a narrow “corridor” of protein folding pathways, within which there is some degree of randomness. However, randomness of this kind does not allow the protein to wander in any direction like the “drunken golfer” seeking a single hole in a flat, featureless landscape. Instead, the strong hydrophilic (HØI) forces force and guide the protein to fold along a narrow range of pathways.

A suitable theoretical framework which is sufficiently general as well as realistic is to view the process of protein folding as a Markov chain [

$$p_{ij} \ge 0, \qquad \sum_{j=1}^{n} p_{ij} = 1, \qquad i = 1, \dots, n \tag{1}$$

i.e. the transition probabilities are non-negative, and the total probability of moving from i to some state j (including the state j = i) is one. The magnitudes of the transition probabilities are determined by the forces acting on the protein, which, in turn, are determined by the gradients in the Gibbs energy landscape. A reader of an earlier version of this manuscript commented that the “force” is derived from the energy landscape, and that the entropy contribution was neglected. This is not so. The “force” is derived from the potential of mean force, and includes contributions from both energy and entropy.
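Condition (1) is easy to check mechanically. A minimal Python sketch, assuming NumPy; the function name `is_stochastic` and the 3-state matrix are hypothetical illustrations, not taken from the text:

```python
import numpy as np

def is_stochastic(P, tol=1e-9):
    """Return True if P satisfies condition (1): non-negative entries and
    every row summing to one (within a small numerical tolerance)."""
    P = np.asarray(P, dtype=float)
    if P.ndim != 2 or P.shape[0] != P.shape[1]:
        return False                      # a transition matrix must be square
    non_negative = bool(np.all(P >= -tol))
    rows_sum_to_one = bool(np.allclose(P.sum(axis=1), 1.0, rtol=0.0, atol=tol))
    return non_negative and rows_sum_to_one

# Hypothetical 3-state chain; the last state plays the role of the folded state n.
P = [[0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6],
     [0.0, 0.0, 1.0]]
print(is_stochastic(P))                        # True
print(is_stochastic([[0.5, 0.6], [0.0, 1.0]])) # False: the first row sums to 1.1
```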

For simplicity, we can choose the state n to be an approximate absorbing state, i.e. once the system reaches that state, it stays there for a sufficiently long time to fulfill its biological function, so that it can be considered to be there “forever”. The folded state, denoted here by the state n, is therefore characterized by the transition probabilities

$$p_{nn} = 1, \qquad p_{nj} = 0 \quad (j \neq n).$$

Within this Markov-chain view of the protein folding process, we can characterize two extreme cases as follows:

The random search of the configurational space is equivalent to the assumption that at each state i there is equal probability of moving to each of the states j which are accessible from state i, i.e.

$$p_{ij} = \frac{1}{m_i}$$

for each i and for each j accessible from i, where $m_i$ denotes the number of states accessible from i.

The other extreme case is that from each i the protein can move to a single state with high probability. Let us denote this state by i + 1, i.e. at each step we move from i to i + 1 until we reach n; hence

$$p_{i,i+1} = 1, \qquad i = 1, 2, \dots, n-1.$$

The more realistic view is that at each state i there is a group of states which are accessible from i, but with a distribution of probabilities. For instance, starting from state i, the system can reach three states, and the transition probabilities are such that some of these states are reached with relatively higher probability. These three probabilities are depicted in

It is intuitively clear that if such preferential transition probabilities exist for each state i, then there will also be preferential pathways to go from state 1 to state n. These will not be the random searches in the configurational space, nor a single path implied by the second extreme model. Instead, the transitions will have some preference to go along a specific path, with some random deviations from the preferable paths. In such a Markov chain one can also compute the average number of steps to reach the absorbing state [
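The three cases can be compared with a simple Monte Carlo experiment. The Python sketch below (NumPy assumed; all names and the 4-state chains are hypothetical) runs the chain from the first state until it first enters the absorbing last state and averages the number of steps. `P_uni` gives equal probability to every accessible state, as in the random-search case; `P_det` is the single-path extreme; `P_bias` is an arbitrary biased example.

```python
import numpy as np

def steps_to_absorb(P, start, absorbing, rng, max_steps=10**7):
    """Run the chain with transition matrix P from `start` until it first
    enters `absorbing`, and return the number of steps taken."""
    state, steps = start, 0
    while state != absorbing:
        state = rng.choice(len(P), p=P[state])   # next state j with prob. P[state, j]
        steps += 1
        if steps > max_steps:
            raise RuntimeError("absorbing state not reached")
    return steps

def mean_steps(P, start, absorbing, trials=2000, seed=0):
    """Monte Carlo estimate of the average number of steps to absorption."""
    P = np.asarray(P, dtype=float)
    rng = np.random.default_rng(seed)
    return float(np.mean([steps_to_absorb(P, start, absorbing, rng)
                          for _ in range(trials)]))

# Hypothetical 4-state chains (indices 0..3); state 3 is absorbing.
P_uni = [[1/2, 1/2, 0, 0],       # equal probability to every accessible state
         [1/3, 1/3, 1/3, 0],
         [0, 1/3, 1/3, 1/3],
         [0, 0, 0, 1]]
P_det = [[0, 1, 0, 0],           # a single path i -> i + 1
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 1]]
P_bias = [[0.2, 0.8, 0, 0],      # an arbitrary preference for moving towards state 3
          [0.1, 0.1, 0.8, 0],
          [0, 0.1, 0.1, 0.8],
          [0, 0, 0, 1]]
for name, P in [("uniform", P_uni), ("single path", P_det), ("biased", P_bias)]:
    print(name, mean_steps(P, start=0, absorbing=3))
```

For chains this small the exact averages can be worked out by hand: exactly 3 steps for `P_det`, 15 for `P_uni` and about 4.08 for `P_bias`; the simulated means should lie close to these values.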

In this section we describe a simple process. Lest it be misunderstood, we emphasize that this is not a model for protein folding. It is presented here to demonstrate that a cause-biased process can drastically reduce the number of steps required to reach some stable state, in contrast to a target-biased process.

We start with a small section of the protein and focus on one angle of internal rotation, Ø.

In the case of ethane, or hexamethylethane, the internal potential energy of rotation would look like the one in

The details of the potential shape are not important here. What is important is that there might be some regions that are not accessible, and within such a region there might be minima that are even lower than the minima in the accessible region. For instance, suppose that every time we synthesize the molecule depicted on the left-hand side of

We now focus on the accessible region of angles. This region may be divided into two sub-regions. The first, referred to as the transient (TR) states, includes all the points for which U(Ø) has a steep slope, say 120° to 160° and 250° to 350°. The second, referred to as the absorbing state and denoted AB, includes the points in the region 160° to 250°, where the potential function is almost flat.

Clearly, whenever the configuration of the molecule is in the region TR, there will be a strong force (or average force, in case U(Ø) is the Gibbs energy landscape) towards region AB. On the other hand, in the region AB, where U(Ø) is almost flat, there will be no strong forces acting on the groups of the molecule that would cause the molecule to exit this region. For simplicity, suppose we always start with a transient configuration on the left-hand side of AB. We now divide the region into a finite number of intervals, and without loss of generality we can enumerate all the configurations in TR by the numbers 1, 2, …, n − 1 and refer to these configurations as transient states. The configurations in the region AB will also be enumerated, and will be referred to as absorbing states. It is clear that, starting from any point in the region TR, there will be a preferential force moving that point, with high probability, towards the region AB. On the other hand, once we arrive at the region AB, there will be a very small probability of getting out of this region.

Again, without loss of generality, we can collect all configurations in the region AB into one state denoted by the index n. Translating the forces into transition probabilities, we write

The choice of the transition probabilities for was made for convenience of notation. This has no effect on the results when we consider a very large number of configurations, say or more.

Clearly, if we choose the transition probabilities such that from each transient state i the point moves to i + 1 with certainty, then starting from state 1 we shall arrive at state n in exactly n − 1 steps. On the other hand, if we choose p = 1/3, the point will move towards the left, towards the right, or stay put with equal probability. As expected, the number of steps required to arrive at AB will then be much larger.
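Since the explicit form of (5) is not reproduced here, the following Python sketch builds a transition matrix for one assumed parameterization that matches the two limits just described: from each transient state the point moves one step towards AB with probability p, and the remaining 1 − p is split equally between staying put and moving one step back (at state 1 that backward share is added to staying). Under this assumption p = 1 is the single-path case and p = 1/3 the unbiased one. `build_chain` is a hypothetical name.

```python
import numpy as np

def build_chain(n, p):
    """Transition matrix for an ASSUMED form of the one-angle model.

    Indices 0..n-2 are the transient states 1..n-1; index n-1 is state n (AB).
    From each transient state: one step towards AB with probability p, and
    (1 - p)/2 each for staying put and for one step away from AB. State 1 has
    no neighbour on the left, so its backward share is added to staying.
    """
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    q = (1 - p) / 2
    P = np.zeros((n, n))
    for i in range(n - 1):
        P[i, i + 1] = p                   # towards AB
        if i == 0:
            P[i, i] = 1 - p               # stay (includes the blocked backward move)
        else:
            P[i, i] = q                   # stay
            P[i, i - 1] = q               # away from AB
    P[n - 1, n - 1] = 1.0                 # AB is absorbing
    return P

print(build_chain(5, 0.5))   # a small example matrix, n = 5, p = 0.5
print(build_chain(4, 1.0))   # the single-path extreme
```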

A typical transition probability matrix for and is

With the transition probability matrix (6), we can calculate the average number of steps required to reach the state n, starting from any initial state. To do this, we write the matrix in the form

$$P = \begin{pmatrix} Q & R \\ \mathbf{0} & 1 \end{pmatrix},$$

where $Q$ is an $(n-1) \times (n-1)$ matrix, $\mathbf{0}$ is a row vector of zeros, $R$ is a column vector including all the transitions $i \to n$ for $i = 1, \dots, n-1$, and the element $p_{nn} = 1$.

According to a theorem in Markov chain theory [

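$$N = (I - Q)^{-1},$$

where $I$ is the unit matrix. The required average quantities, denoted by $t_i$, are obtained from the sums of the row elements of the matrix $N$, i.e.

$$t_i = \sum_{j=1}^{n-1} N_{ij}, \qquad i = 1, \dots, n-1.$$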
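The theorem translates directly into a few lines of linear algebra. A minimal NumPy sketch, assuming the absorbing state is the last one; the function name and the two 4-state test chains are hypothetical:

```python
import numpy as np

def mean_steps_to_absorption(P):
    """t_i: average number of steps to reach the absorbing state (assumed to be
    the LAST state) from each transient state i, via N = (I - Q)^(-1)."""
    P = np.asarray(P, dtype=float)
    Q = P[:-1, :-1]                          # transient -> transient block
    N = np.linalg.inv(np.eye(len(Q)) - Q)    # fundamental matrix
    return N.sum(axis=1)                     # row sums of N give t

# Hypothetical 4-state chains; the last state is absorbing.
P_uni = [[1/2, 1/2, 0, 0],
         [1/3, 1/3, 1/3, 0],
         [0, 1/3, 1/3, 1/3],
         [0, 0, 0, 1]]
P_det = [[0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 1]]
print(mean_steps_to_absorption(P_uni))   # t = (15, 13, 8) up to rounding
print(mean_steps_to_absorption(P_det))   # t = (3, 2, 1)
```

For large chains, solving $(I - Q)\,t = \mathbf{1}$ directly gives the same vector without forming the inverse.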

The calculation of the vector $\mathbf{t} = (t_1, \dots, t_{n-1})$ was done for the transition probabilities given in (5).

Note that the spacing between the lines increases dramatically as p approaches 1/3.

Perhaps the most dramatic result of this model concerns the average number of steps required to reach AB: for p near 1/3 it is of the order of 10^{6}, and this number drops dramatically as the bias increases, then stays relatively low, around 2000, as expected for the extreme, fully biased case.

This is an important result. If a relatively strong force is exerted on each group of the protein at each stage of the folding process, then the average number of steps required to reach the region AB decreases dramatically. This result in effect dismisses the so-called Levinthal paradox. It should be noted that this demonstration differs in a fundamental way from the approach of Zwanzig et al. The transitions in our model are cause-biased, whereas in Zwanzig et al. they are target-biased. The causes in our model are the forces, and the forces are obtained from the gradients in the Gibbs energy landscape. As such, we make no claim as to whether these forces originate from the energy or from the entropy. These are the total forces, both direct and solvent-induced, acting on all the groups along the protein.
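The drop in the average number of steps can be explored numerically under an assumed form of (5): a step towards AB with probability p, and a stay or a step back with probability (1 − p)/2 each, state 1 staying instead of stepping back. The Python sketch below (NumPy assumed; the function names and the choice n = 2001 are hypothetical) computes the average number of steps from state 1 for several values of p.

```python
import numpy as np

def build_chain(n, p):
    """ASSUMED form of (5): step towards AB with prob. p, stay or step back
    with prob. (1 - p)/2 each (state 1 cannot step back, so it stays instead)."""
    q = (1 - p) / 2
    P = np.zeros((n, n))
    for i in range(n - 1):
        P[i, i + 1] = p
        if i == 0:
            P[i, i] = 1 - p
        else:
            P[i, i], P[i, i - 1] = q, q
    P[n - 1, n - 1] = 1.0
    return P

def mean_steps_from_state_1(n, p):
    """Average number of steps from state 1 to the absorbing state n."""
    Q = build_chain(n, p)[:-1, :-1]
    # Solving (I - Q) t = 1 yields the row sums of N = (I - Q)^(-1)
    # without forming the inverse explicitly.
    t = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return t[0]

n = 2001   # 2000 transient states; an illustrative size, not taken from the text
for p in (1/3, 0.34, 0.35, 0.4, 0.5, 0.75, 1.0):
    print(f"p = {p:.3f}: {mean_steps_from_state_1(n, p):>12,.0f} steps")
```

Under this assumed parameterization the two ends can be checked by hand: for p = 1/3 the mean from state 1 is exactly 3n(n − 1)/2 (6,003,000 for n = 2001), and for p = 1 it is exactly n − 1 = 2000. For any p > 1/3 it grows only linearly with n, approaching roughly (n − 1)/(p − q) with q = (1 − p)/2. How closely this matches the results described above depends on the actual form of (5).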

In real proteins we have many more degrees of freedom, and strong forces are exerted on many groups of the protein simultaneously. Also from each state i, there are many states which are accessible, not only two as in the example above, and the native state is not a single state, and certainly not an absolute absorbing state.

However, the extension from the simple model treated above to a real protein should not pose any new difficulty. One can imagine that at each stage of the folding process there are strong solvent-induced forces exerted on the various groups along the protein. These forces will force the protein to fold along a narrow range of pathways, not towards a preconceived target, and not towards the “correct” configuration for each bond. The overall process would be a speedy one, i.e. it would reach the final, relatively stable structure in a relatively short time. This procedure, in effect, answers Levinthal’s original questions quoted in Section 2, and at the same time removes the crux of the difficulty from the general protein folding problem.

In the previous section we studied a simple stochastic process involving a change in one parameter, the angle of rotation Ø. In a real protein of, say, M amino acids, we have at least 2M internal rotational degrees of freedom. At each configuration of the protein there are forces exerted on each of the groups. These forces are derived from the potential of mean force, which is essentially the Gibbs energy landscape of the protein. It is almost universally recognized that the solvent-induced part of these forces is important. The main “unknown” in the protein folding process is not how and why the protein reaches its final 3-D structure in a short time, but what the strong forces are that act on the various groups of the protein and force the protein to change its configuration within a narrow range of pathways. In my view, once we have discovered the strong HØI forces exerted on the groups, we have not only dismissed the so-called Levinthal paradox, but we have also answered the question of how and why a protein folds and reaches its final 3-D structure in a relatively short time.

Thus, the HØI forces explain the folding process, and the HØI interactions also provide an explanation for the stability of the protein. As we have discussed recently [

This is as far as one can get in establishing the general principle of protein folding. The question that remains is how to implement the hydrophilic (HØI) forces in studying specific proteins. In this regard, there is no need to study the entire Gibbs energy landscape. We have seen in

In 1992 [

The rationale underlying this sequential reduction of the dimensionality of the space in which the protein “moves” is the following:

Suppose we start with the fully extended, unfolded conformation. Initially, we would expect the motion of the protein to be random until two or more hydrophilic groups are brought to such a distance that they exert a strong hydrophilic force on each other. This force reflects the existence of a steep gradient in the original multi-dimensional landscape. Once the two hydrophilic groups are brought close enough to form direct hydrogen bonds (as in an α-helix or β-sheet), some of the rotational angles ψ, Ø will be “locked” for a short time while the random motion about all other angles continues, but now in a lower-dimensional space. Of course, several direct HBs can form simultaneously, resulting in a further reduction in the dimensionality of the space in which the conformation of the protein moves. Some experimental evidence for the occurrence of water bridges connecting hydrophilic groups of the backbone of proteins has been reported [24,25].
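A rough count may make the dimensionality argument concrete. Assume, purely for illustration, that each of the 2M rotational angles is coarse-grained into k allowed values and that r of these angles are temporarily locked by direct hydrogen bonds; neither k nor r comes from the text above. The number of configurations still open to the random motion then shrinks as

$$\Omega_{\text{free}} = k^{2M}, \qquad \Omega_{\text{locked}} = k^{2M-r}, \qquad \frac{\Omega_{\text{locked}}}{\Omega_{\text{free}}} = k^{-r}.$$

For example, with k = 3, locking r = 10 angles reduces the space by a factor of 3^{10} = 59049, i.e. to about 1.7 × 10^{-5} of its original size, and each further locking event compounds the reduction.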

As correctly pointed out by Dill and Chan [

Unfortunately, the search for models of folding having a global-minimum Gibbs energy conformation continues. In recent publications, one can find claims that the funnel model solves Levinthal’s paradox [

A reader of an earlier version of the manuscript claimed that the distinction between the target-biased and the cause-biased motion of the protein is only semantic. I believe that the difference between the two approaches is profound and critical to the solution of the protein folding problem.

The motion of the protein is not “guided” or “speeded” towards a target. This is true in evolution theory and a fortiori true in protein folding. In the model described in this article, the motion is a result of the forces acting on the groups of the protein. The stronger the force, the faster the motion. Once the protein reaches a stable state, such that it stays there long enough to function, it will be in a local minimum in the Gibbs energy landscape. This minimum has nothing to do with the Second Law of thermodynamics, as so many have erroneously concluded [9,21].