Mutation Patterns in Lysostaphin

Lysostaphin is widely used in clinical settings against Staphylococcus aureus, but mutations in lysostaphin can abolish its killing activity. The difficulty in studying mutations in lysostaphin is the shortage of data, which may take many decades to accumulate, even though lysostaphin is important for clinical therapeutics and drug development. Rather than passively waiting for new data to accumulate, in this study 1) 23,442 mutations in 1408 proteins from a databank were used to determine whether the mutations in lysostaphin follow the general mutation trend obtained from the databank, 2) the amino-acid pair predictability was used to explore the underlying mechanism of lysostaphin mutations, and 3) the amino-acid distribution probability was used to associate mutations with dysfunction of lysostaphin. The results show that the mutations in lysostaphin follow the general trend of mutations in proteins; the underlying mechanism of mutations in lysostaphin is explainable from the viewpoint of randomness; and a mutation that increases the distribution probability has a larger chance of causing lysostaphin dysfunction. This study provides useful information for the future design of anti-S. aureus drugs and for enzyme engineering.


INTRODUCTION
Lysostaphin is widely used in clinical settings because it was found half a century ago to have a specific lytic action against Staphylococcus aureus [1]. Structurally and functionally, lysostaphin is a zinc metalloenzyme [2], which includes three enzymes: glycylglycine endopeptidase, endo-beta-N-acetylglucosaminidase and N-acetylmuramyl-L-alanine amidase. Glycylglycine endopeptidase specifically cleaves the glycine-glycine bonds, which are unique to the interpeptide cross-bridge of the S. aureus cell wall [3].
With time, the interaction between lysostaphin and S. aureus leads to two outcomes: mutations in S. aureus lead to drug resistance [4], whereas mutations in lysostaphin lead to its dysfunction. Indeed, previous studies have suggested that site-directed mutants of lysostaphin can lose their killing activity, i.e. the killing activity of glycylglycine endopeptidase against S. aureus [5].
Open Access J. Biomedical Science and Engineering
These findings imply that the therapeutic use of lysostaphin faces serious challenges, because lysostaphin may someday lose its efficacy owing to unpredictable mutations either in S. aureus or in lysostaphin itself. They also suggest that efforts should be directed toward studying the mutations in both S. aureus and lysostaphin. Technically, neither is easy to study: the structure and mechanisms of S. aureus are very complicated, which makes it difficult to identify the real forces and reasons behind its mutations, whereas lysostaphin has few documented mutations although its biological structure and function are simple. Nevertheless, it is preferable to investigate the mutations in lysostaphin, because its mutation pattern should contain fewer unexplainable elements than that of S. aureus.
A question is whether the mutations in lysostaphin follow patterns that could reveal an explainable mechanism behind the general trend in dysfunctional lysostaphin mutations. This question is meaningful not only for clinical therapeutics but also for drug development. Unfortunately, few mutations in lysostaphin, especially in glycylglycine endopeptidase, have been documented in the literature over the decades. Researchers therefore face a choice: either wait passively, perhaps for hundreds of years, until sufficient mutations have been collected, or conduct a study using the limited data available, which is unfashionable nowadays because researchers are accustomed to drawing conclusions from tens of thousands of data points.
However, the lack of data should be weighed against two facts: such a study can, at the very least, offer a clue for clinical therapeutics and drug development, and other mutation data can be added to the analysis. Therefore, we compare the lysostaphin mutations with the general mutation patterns obtained from almost all the protein mutations documented in the databank. If the handful of mutations in lysostaphin follow the general mutation patterns, then we would be in a position to address the question: what is the probability that a future mutation in lysostaphin will lead to its dysfunction?
On the other hand, the shortage of data suggests that we might not be able to use the standard techniques for investigating mutations, because they generally require a large amount of data. In this study, we use the computational mutation approach [6][7][8][9][10] because it is more suitable for our purpose. Thus the aim of this study is to use the computational mutation approach to investigate the possible mutation pattern in lysostaphin.
Still, there are another 1408 proteins with 23,442 missense point mutations in UniProt [13], ranging in length from 52 to 34,350 amino acids. These 1408 proteins and their 23,442 mutations should provide a general pattern against which to compare the mutations in lysostaphin.

Mutations in Terms of Predictable Proportion of Amino-Acid Pairs
Because there are 400 types of amino-acid pairs formed from the 20 kinds of amino acids, we use the amino-acid pair predictability to classify the amino-acid pairs in a lysostaphin as predictable or unpredictable [6][7][8][9][10]. For example, lysostaphin P10547 has 51 glutamic acids (E), 38 glycines (G), and 48 valines (V). According to the permutation, the amino-acid pair GV should appear 4 times (38/493 × 48/492 × 492 = 3.70); there are indeed four GVs in lysostaphin P10547, so the appearance of GV is predictable. By contrast, the amino-acid pair VE should appear 5 times (48/493 × 51/492 × 492 = 4.97), but it appears 30 times in lysostaphin P10547, so the appearance of VE is unpredictable. In this way, all amino-acid pairs in lysostaphin P10547 are classified, and its predictable proportion is 44.50%.
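The arithmetic behind the GV and VE examples can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the rule that a pair is "predictable" when the rounded predicted frequency equals the actual frequency is an assumption read from the two examples above.

```python
def predicted_pair_frequency(count_first: int, count_second: int, length: int) -> float:
    """Expected number of adjacent pairs under a random arrangement.

    Mirrors the arithmetic in the text:
    count_first/length * count_second/(length - 1) * (length - 1).
    """
    return count_first / length * count_second / (length - 1) * (length - 1)


def is_predictable(actual: int, predicted: float) -> bool:
    """Assumption: a pair is 'predictable' when the rounded predicted
    frequency equals the actual (observed) frequency."""
    return round(predicted) == actual


# Composition of lysostaphin P10547 as quoted in the text (493 residues).
LENGTH = 493
COUNTS = {"E": 51, "G": 38, "V": 48}

gv = predicted_pair_frequency(COUNTS["G"], COUNTS["V"], LENGTH)
ve = predicted_pair_frequency(COUNTS["V"], COUNTS["E"], LENGTH)

# Actual frequencies (4 for GV, 30 for VE) are the values given in the text.
print(f"GV predicted {gv:.2f}, predictable: {is_predictable(4, gv)}")
print(f"VE predicted {ve:.2f}, predictable: {is_predictable(30, ve)}")
```

With the quoted counts, GV is classified as predictable and VE as unpredictable, in agreement with the text.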
In this way, when a mutation occurs in a lysostaphin, we can easily determine whether it occurs in predictable or unpredictable amino-acid pairs. Consequently, we can estimate the change in the predictable proportion of amino-acid pairs. All of the data can be calculated on the website [14].
In general, a point missense mutation changes two amino-acid pairs into another two amino-acid pairs. For instance, a mutation at position 210 in lysostaphin O33599 substitutes histidine (H) with alanine (A) and thus inactivates the enzyme [12]. This mutation changes the two amino-acid pairs AH and HY to the pairs AA and AY, because the amino acid at position 209 is alanine (A) and that at position 211 is tyrosine (Y). For the two original amino-acid pairs, the actual and predicted frequencies were 3 and 2 for AH and 2 and 1 for HY before mutation (Row 4, Table 2), and were 2 and 2 for AH and 1 and 1 for HY after mutation (Row 17, Table 2). For the two mutated amino-acid pairs, the actual and predicted frequencies were 3 and 3 for AA and 2 and 3 for AY before mutation (Row 30, Table 2), and were 4 and 4 for AA and 3 and 3 for AY after mutation (Row 43, Table 2). Thus, we can estimate the change in frequency to find the effects of the mutation on these amino-acid pairs.
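As an illustration of how one substitution touches two pairs, the following sketch lists the original and mutated pairs for a given position. The function name and the three-residue fragment are hypothetical; the fragment merely stands in for positions 209 to 211 (A, H, Y) described above and is not the real O33599 sequence.

```python
def affected_pairs(sequence: str, position: int, new_residue: str):
    """Return (original_pairs, mutated_pairs) for a point substitution.

    `position` is 1-based, as in the text. A terminal residue belongs to
    only one pair, so a single pair is returned in that case.
    """
    i = position - 1
    if not 0 <= i < len(sequence):
        raise ValueError("position outside sequence")
    old_residue = sequence[i]
    original, mutated = [], []
    if i > 0:  # pair formed with the preceding residue
        original.append(sequence[i - 1] + old_residue)
        mutated.append(sequence[i - 1] + new_residue)
    if i < len(sequence) - 1:  # pair formed with the following residue
        original.append(old_residue + sequence[i + 1])
        mutated.append(new_residue + sequence[i + 1])
    return original, mutated


# Hypothetical fragment standing in for positions 209-211 (A, H, Y).
fragment = "AHY"
original_pairs, mutated_pairs = affected_pairs(fragment, 2, "A")
print(original_pairs, "->", mutated_pairs)
```

For the H to A substitution this yields AH and HY as the original pairs and AA and AY as the mutated pairs, as stated in the text.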

Mutation in Terms of Amino-Acid Distribution Probability
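To search for the probability that a future mutation in lysostaphin will lead to its dysfunction, we use the amino-acid distribution probability [9,10], where the position of an amino acid in a protein is viewed as a problem of the occupancy of subpopulations and partitions [15] with the following equation,

$$\frac{r!}{q_0!\, q_1! \cdots q_n!} \times \frac{r!}{r_1!\, r_2! \cdots r_n!} \times n^{-r}$$

where $r$ is the number of amino acids of a given type, $n$ is the number of partitions, $r_i$ is the number of amino acids in the $i$-th partition, and $q_k$ is the number of partitions containing $k$ amino acids.

A mutation at position 37 leads D to mutate to A, which is clinically dysfunctional. For D, the distribution probability is 0.3213 before mutation, and after mutation we have

$$\frac{6!}{2!\,3!\,0!\,1!\,0!\,0!\,0!} \times \frac{6!}{0!\,1!\,1!\,1!\,3!\,0!} \times 6^{-6} = \frac{720}{12} \times \frac{720}{6} \times \frac{1}{46656} = 0.1543$$

These two probabilities concern the original amino acid; similarly, we can also compute two probabilities for the mutated amino acid, which are 0.0621 and 0.1544 before and after mutation, and the effect of mutation is (0.1543 - 0.3213) + (0.1544 - 0.0621) = 0.0609, suggesting that this mutation increased the amino-acid distribution probability. In this way, we quantify the mutation numerically with respect to the original and mutated amino acids [16].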
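The value 0.1543 can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, the occupancy pattern (0, 1, 1, 1, 3, 0) is read from the factorial terms of the worked example for D after mutation, and the number of partitions is assumed to equal the number of residues, as it does in that example (six and six).

```python
from collections import Counter
from math import factorial, prod


def distribution_probability(occupancy):
    """Occupancy probability for residues spread over partitions.

    occupancy[i] is the number of residues in the i-th partition.
    Assumption: the number of partitions equals the number of residues
    (n = r), as in the worked example in the text.
    """
    r = sum(occupancy)
    n = len(occupancy)
    if r == 0 or n != r:
        raise ValueError("this sketch assumes n == r > 0")
    # q_k: how many partitions hold exactly k residues.
    partitions_per_count = Counter(occupancy)
    q_term = factorial(r) / prod(factorial(q) for q in partitions_per_count.values())
    # r_i: residues in each partition.
    r_term = factorial(r) / prod(factorial(k) for k in occupancy)
    return q_term * r_term / n ** r


# Occupancy read from the worked example for D after mutation.
p_after = distribution_probability([0, 1, 1, 1, 3, 0])
print(f"{p_after:.4f}")
```

Partition counts that do not occur contribute 0! = 1 and can be left out of the product, which is why only the observed counts are multiplied.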

Statistics
The Chi-square test was used to compare the number of amino-acid pairs, and the Mann-Whitney U-test was used to compare the difference between predicted and actual frequencies.

RESULTS
As there are not many mutations in lysostaphin, we necessarily compare its mutations against the general mutation patterns of all proteins with mutations in the databank. If the mutation pattern in lysostaphin follows this general pattern, then we could suggest that the mutations in lysostaphin have a mechanism similar to that in other proteins, and that studies on mutations in other proteins would shed light on mutations in lysostaphin. If this is not the case, we should consider that the mutations in lysostaphin have mechanisms different from those in other proteins, and the research would then follow different directions.
Mutations differ so much between proteins that we use the ratio of mutation position to protein length as an indicator for comparing the mutations in lysostaphin. Figure 1 and Figure 2 show this comparison. As shown by the arrows, there is some similarity between the general mutation patterns and the mutations in lysostaphin. This means that a likely mutation would be positioned around six tenths of the length of lysostaphin. At the least, these positions could be defined as so-called "hotspots" for mutations [17]. Other hotspots could be the positions around ratios of 0.56 to 0.57, which also have relatively more mutations in Figure 1. However, the comparison between Figure 1 and Figure 2 cannot explain the documented mutation positions beyond the ratio of 0.7 (arrows in Figure 1 and Figure 2), as the general mutation patterns have less intensive mutations around this region. This difference is suggestive: lysostaphin might already have mutated many times in this region, so we would expect future mutations in other regions. Alternatively, mutations in other regions of lysostaphin might have led to its complete dysfunction at a very early stage, after which lysostaphin was cleaved by proteases, so those mutations were not documented.
Following these general trends, we can look at what the computational mutation approach tells us. Table 1 shows the changes in the predictable proportion of amino-acid pairs before and after mutation. As can be seen, 10 out of 12 mutations increase the predictable proportion in mutant lysostaphin and 2 mutations decrease it, indicating that most mutant lysostaphins have a larger predictable proportion of amino-acid pairs (Chi-square test, P < 0.05).
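One way a 10 versus 2 split can reach P < 0.05 is a Chi-square goodness-of-fit test against an equal split. The sketch below illustrates that reading only; the null hypothesis of equal expected counts is an assumption, since the text does not state how the test was set up, and the function names are hypothetical.

```python
from math import erfc, sqrt


def chi_square_equal_split(observed):
    """Goodness-of-fit statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def p_value_one_df(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(statistic / 2))


# 10 mutations increase and 2 decrease the predictable proportion (Table 1).
counts = [10, 2]
statistic = chi_square_equal_split(counts)
p_value = p_value_one_df(statistic)
print(f"chi-square = {statistic:.3f}, P = {p_value:.4f}")
```

Under this assumed null hypothesis the statistic is 16/3 with one degree of freedom, which falls below the 0.05 threshold.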
Next, we analyze the changes in amino-acid pairs to find what effects the mutations have. Table 2 details the actual frequency, the predicted frequency and their difference for the original and mutated amino-acid pairs before and after mutation. Among the original pairs before mutation, at least one of the targeted amino-acid pairs is an unpredictable pair whose actual frequency is larger than its predicted one, so the frequency differences are negative (Rows 3-14, Table 2). After mutation, the original pairs have smaller frequency differences and become more predictable (Rows 16-27, Table 2).
For the mutated amino-acid pairs, 10 of 12 mutations involved at least one unpredictable pair whose actual frequency is smaller than its predicted one, so the frequency differences are more likely to be positive before mutation (Rows 29-40, Table 2). After mutation, the mutated pairs in general have smaller frequency differences and become more predictable (Rows 42-53, Table 2). Figure 3 illustrates the mutation effects on the targeted amino-acid pairs. As can be seen, the median frequency difference is -2 for the original pairs and 1.5 for the mutated ones before mutation. After mutation, the median frequency difference is zero for both original and mutated pairs (Mann-Whitney U-test, P < 0.01).
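The frequency difference for a single mutation can be illustrated with the values quoted earlier for the H to A substitution at position 210 of lysostaphin O33599. This is a minimal sketch with a hypothetical function name; the difference is taken as predicted minus actual frequency, summed over the two pairs, following the Σ(PF-AF) convention of Figure 3.

```python
def frequency_difference_sum(pairs):
    """Sum of (predicted - actual) over (actual, predicted) tuples."""
    return sum(predicted - actual for actual, predicted in pairs)


# (actual, predicted) frequencies quoted in the text for H210A in O33599.
original_before = [(3, 2), (2, 1)]  # AH, HY before mutation
original_after = [(2, 2), (1, 1)]   # AH, HY after mutation
mutated_before = [(3, 3), (2, 3)]   # AA, AY before mutation
mutated_after = [(4, 4), (3, 3)]    # AA, AY after mutation

for label, pairs in [
    ("original pairs, before", original_before),
    ("original pairs, after", original_after),
    ("mutated pairs, before", mutated_before),
    ("mutated pairs, after", mutated_after),
]:
    print(f"{label}: {frequency_difference_sum(pairs)}")
```

For this one mutation the original pairs move from -2 to 0 and the mutated pairs from 1 to 0, the same direction as the medians reported above.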
To find the probability that a mutation leads to dysfunction of lysostaphin, we use the amino-acid distribution probability to quantify the documented mutations in lysostaphin, of which 7 are documented for lysostaphin P10547 and 5 for lysostaphin O33599. Table 3 and Table 4 show that most mutations lead to an increase in amino-acid distribution probability after mutation. This observation is similar to the findings obtained using amino-acid distribution probability in other proteins [9,10].
Accordingly, we can conduct a cross-impact analysis to gain more insight into the mutations in lysostaphin [18]. Figure 4 shows the scheme based on cross-impact analysis for events with respect to mutations and dysfunction. At the level of amino-acid distribution probability, $P(2)$ and $P(\bar{2})$ are the decreased and increased probabilities induced by mutations; 2 and 10 mutations result in a decreased and an increased distribution probability, respectively. At the level of classification: 1) $P(1|\bar{2})$ is the impact probability (conditional probability) that a mutation activates the glycylglycine endopeptidase of lysostaphin under the condition of increased distribution probability, and one mutation has such an effect. 2) $P(\bar{1}|\bar{2})$ is the impact probability that the glycylglycine endopeptidase of lysostaphin becomes dysfunctional under the condition of increased distribution probability, and 9 mutations work in such a manner. 3) $P(1|2)$ is the impact probability that a mutation activates the glycylglycine endopeptidase of lysostaphin under the condition of decreased distribution probability, and no mutation plays such a role. 4) $P(\bar{1}|2)$ is the impact probability that the glycylglycine endopeptidase of lysostaphin becomes dysfunctional under the condition of decreased distribution probability, and 2 mutations fall into this category. At the level of combined events, we see the combined results with their frequencies. Table 5 lists the computed probabilities for the scheme in Figure 4, from which several interesting points can be found. 1) As $P(\bar{2})$ is larger than $P(2)$, a mutation has a 10/12 chance of increasing the distribution probability in mutant lysostaphin. 2) As $P(1|\bar{2})$ is far smaller than $P(\bar{1}|\bar{2})$, a mutation that increases the distribution probability has little chance of activating lysostaphin. 3) As $P(1|2)$ is remarkably smaller than $P(\bar{1}|2)$, a mutation that decreases the distribution probability has no chance of activating lysostaphin.

Figure 3. Sum of differences between predicted frequency and actual frequency Σ(PF-AF) in original and mutated pairs before and after mutation. The data are presented as median with interquartile range. * and ** indicate a statistical difference at the P < 0.05 and P < 0.01 level compared with the mutated pairs before mutation. # indicates a statistical difference at the P < 0.01 level compared with the original pairs after mutation.
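The probabilities in the cross-impact scheme follow directly from the mutation counts given in the text (10 mutations with increased distribution probability, of which 1 activates and 9 cause dysfunction; 2 with decreased probability, both causing dysfunction). The sketch below recomputes them with exact fractions. The variable and function names are hypothetical, and the values of Table 5 are not reproduced here.

```python
from fractions import Fraction

# Mutation counts as stated in the text for the cross-impact scheme.
increased = {"activated": 1, "dysfunctional": 9}
decreased = {"activated": 0, "dysfunctional": 2}
total = sum(increased.values()) + sum(decreased.values())


def conditional(counts):
    """Conditional probability of each outcome within one branch."""
    branch_total = sum(counts.values())
    return {outcome: Fraction(n, branch_total) for outcome, n in counts.items()}


p_increased = Fraction(sum(increased.values()), total)  # 10/12
p_decreased = Fraction(sum(decreased.values()), total)  # 2/12
given_increased = conditional(increased)
given_decreased = conditional(decreased)

# Combined event: distribution probability increases AND the enzyme is dysfunctional.
p_increased_and_dysfunctional = p_increased * given_increased["dysfunctional"]

print("P(increased) =", p_increased)
print("P(dysfunctional | increased) =", given_increased["dysfunctional"])
print("P(increased and dysfunctional) =", p_increased_and_dysfunctional)
```

Exact fractions are used so that the combined probabilities can be compared with the counts without rounding.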

DISCUSSION
In this study, we attempted to gain insight into the mutations in lysostaphin in three ways. First, we compared the mutations in lysostaphin against almost all the available protein mutations in the databank to determine whether the mutations in lysostaphin follow the general trend in mutations. Second, we used the amino-acid pair predictability to explore the possible underlying mechanism of the mutations in lysostaphin. Third, we used the amino-acid distribution probability to determine the probability of dysfunction of lysostaphin when a mutation occurs.
Here, the underlying mechanism of mutations in lysostaphin needs more elaboration. The results reveal the mutation pattern in lysostaphin based on the current data: the mutations are more likely to target the unpredictable amino-acid pairs and to reduce the difference between actual and predicted frequencies, leading to an increase in the predictable proportion of amino-acid pairs in mutant lysostaphin. This is meaningful because it is consistent with the mutation patterns obtained from other proteins by means of computational mutation approaches [6][7][8][9][10].
Hotspot sites [17,19,20,21] and function disruption [5,22,23] can explain why some amino acids are mutated more frequently than others. From the viewpoint of randomness, the predictable amino-acid pairs are those whose construction needs the least time and energy and has the maximal probability of occurrence. Natural parsimony demands that enzymes be constructed with the least energy and time in order to adapt during evolution, which would lead to mutations [6][7][8][9][10]. The lysostaphin mutations do diminish the difference between predicted and actual frequencies, and they generally lead to the dysfunction of lysostaphin.
On the other hand, the unpredictable pairs suggest that nature spends more time and energy to construct them deliberately, so the functional amino-acid pairs should have been deliberately evolved, with the difference between actual and predicted frequencies maintained. Interestingly, there is a mutation in lysostaphin O33599 that changes the asparagine at position 117 to alanine and is the only one that activates the enzyme [12]. Our study shows that this mutation enlarges the frequency difference of the mutated pairs from -2 before mutation (Row 29, Table 2) to -4 after mutation (Row 42, Table 2), and decreases the predictable proportion of amino-acid pairs in the mutant (Row 2, Table 1).
In conclusion, the significance of this study is that we obtain three pieces of information regarding the mutations in lysostaphin: 1) the mutations in lysostaphin follow the general trend of mutations in proteins, 2) the underlying mechanism of mutations in lysostaphin is explainable from the viewpoint of randomness, and 3) a mutation that increases the distribution probability has a larger chance of causing lysostaphin dysfunction. These results provide useful information for the future design of anti-S. aureus drugs and for enzyme engineering.