The pattern of co-existing posttranslational modifications: a case study

Posttranslational modifications are an important class of cellular activities involved in various biochemical processes, including signal transduction, gene/metabolite networks, and disease development. Multiple posttranslational modifications, with the same or different modification residues, can co-exist in the same protein, and this co-occurrence is critical to signalling networks in cells. Although some biological studies have observed this phenomenon, few bioinformatics studies have been carried out to understand its mechanism. Four data sets were downloaded from NCBI for this study. The joint probabilities of any two neighbouring posttranslational modification sites of different modification residues were analyzed. A Bayesian probabilistic network was derived for visualizing the relationship between a target modification and the contributing modifications as predictive factors.


INTRODUCTION
Posttranslational modifications (PTMs) are chemical processes that modify a protein's chemical or structural properties after translation of the protein has been completed. Posttranslational modifications are closely related to the signalling networks of molecules in cells. The modifications include the attachment of a chemical group, a change in the chemical structure of amino acids, or a change in protein structure. A modification can also alter the functions of a protein in both directions, adding or removing functions, for instance phosphorylation and dephosphorylation, or carboxylation and decarboxylation. Because of these changes, proteins carry different signals for functioning in cells. Posttranslational modifications are therefore the focus of many signal transduction network studies. For instance, properly folded and posttranslationally modified endoplasmic reticulum is related to stress and will lead to different pathological states [1]. Poly(ADP-ribosyl)ation is related to DNA repair and cell cycle checkpoint pathways as a unique signal for modulating protein function [2]. Posttranslational modifications mediate the multiple functions of human copper-transporting ATPases [3]. In the study of various chronic diseases, protein 3-nitrotyrosine (nitration) has been found to play an important role in pathological conditions [4]. Together with mutations and aberrant mRNA splicing, hyperphosphorylation leads to a number of neurodegenerative disorders [5]. S-Nitrosation has recently been found to have a function similar to that of phosphorylation and acetylation because of its association with various pathological cell reactions in signalling networks [6]. In cell-cycle control, differentiation, metabolism, stress response and programmed cell death, the FOXO subgroup of forkhead transcription factors has been found to be tightly controlled by phosphorylation, acetylation and ubiquitination [7].
In recent years, biological experiments have shown that the co-occurrence of posttranslational modifications is critical for many cellular functions. For instance, the cooperation of multiple posttranslational modifications such as phosphorylation, acetylation, and ubiquitination has been found to be a necessary condition for the stable transcriptional activity of p53 [8]. In studying the complex pattern of posttranslational modifications and its impact on cellular processes, it has been found that lysine acetylation, arginine/lysine methylation and serine/threonine phosphorylation work together cooperatively to regulate the high mobility group proteins [9]. In experiments with human cancer specimens, it has been found that the extent of acetylation, formylation and methylation is higher in cultured cells [10]. It has also been found that proteins with multiple posttranslational modifications may contribute to similar signalling functions [11]. In studying DNA repair, apoptosis and senescence, it has been found that the interplay between multiple protein modifications, including phosphorylation, ubiquitylation, acetylation and sumoylation, is critical for properly propagating DNA damage signals [12], and the interplay between methylation and acetylation has been found to be important for activating p53 in response to DNA damage signals [13]. Crosstalk between different posttranslational modifications has even been found [14]. In glycogen synthase kinase-3, it has been found that O-GlcNAcylation and O-phosphate interplay in cellular regulation [15]. The interplay has also been found in the steroid receptor coactivators [16]. In the study of transcriptional programming, interplay between posttranslational modifications has been found in H3 termini [17]. In histone, multiple arginine posttranslational modifications have been found that are critical for the development of some diseases. Also in histone, the interplay between sumoylation and either acetylation or ubiquitylation has been observed to contribute to complex functions of proteins [18]. A recent study has used a laboratory method to identify co-occurring posttranslational modifications [19]. A computational method was proposed to predict the interplay between phosphorylation sites and O-GlcNAc sites based on the peptides around modification residues [20,21]. The analysis was based on the prediction results from various PTM prediction tools and on peptide information only. Moreover, the method focused only on the competition mechanism between phosphorylation and O-GlcNAc modifications at the same residues. The patterns of co-occurrence of posttranslational modifications are so far unclear or have not yet emerged through large-scale studies. Bioinformatics studies will help reveal those patterns and will benefit many desirable cellular engineering processes, e.g. disease control and prevention based on handling signalling pathways subjectively. This study aims to analyse the patterns of co-occurrence of multiple posttranslational modifications and to visualize their relationship through a probabilistic analysis.

SciRes Copyright © 2009 JBiSE

APPROACH
The first step is to scan all the sequences in a data set to find all neighbouring modifications. We first use frequency as the joint probability to measure the quantitative property of co-occurrence of modifications. By analyzing the frequency of two modifications, the likelihood that a pair of modifications occurs can be quantified.
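The scanning step can be sketched as follows. This is a minimal illustration, assuming each sequence has already been reduced to a list of (position, modification) pairs and that "neighbouring" means adjacent in position order among the annotated sites. The function names and the toy data are hypothetical and are not taken from the original program.

```python
from collections import Counter

def neighbouring_pairs(sites):
    """Yield unordered pairs of modification types that are adjacent
    (in position order) among the annotated sites of one sequence."""
    ordered = sorted(sites)  # sort by residue position
    for (_, left), (_, right) in zip(ordered, ordered[1:]):
        yield tuple(sorted((left, right)))

def joint_frequencies(dataset):
    """Relative frequency of each neighbouring pair over a whole data set."""
    counts = Counter()
    for sites in dataset:
        counts.update(neighbouring_pairs(sites))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}

# Hypothetical toy data: two sequences, sites given as (position, type).
toy = [
    [(12, "HP"), (15, "HP"), (21, "HK")],
    [(40, "HK"), (44, "HP")],
]
freqs = joint_frequencies(toy)
```

In this toy case there are three neighbouring pairs in total, two of which combine a hydroxylysine with a hydroxyproline, so that pair receives a joint frequency of two thirds.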
However, the frequency analysis only shows how likely two modifications are to occur simultaneously in the same sequence. For instance, we may observe that the probability for hydroxyproline and hydroxylysine to occur simultaneously in the same sequence is 18.6%. But this does not indicate how strongly hydroxylysine depends on hydroxyproline. In other words, if we have observed a hydroxyproline in a sequence, how likely are we to find a hydroxylysine in the same sequence as the neighbouring modification? We first define the joint probability of two different modification residues as the frequency

$$P(X, Y) = \frac{n(X, Y)}{N},$$

where X and Y are two different modification residues, $n(X, Y)$ is the number of times X and Y are observed as neighbours, and $N$ is the total number of neighbouring pairs in the data set. The marginal probability $P(Y)$ is likewise defined as the frequency with which one modification residue occurs in the data set. Using the product rule of probability, we have

$$P(X, Y) = P(X \mid Y)\,P(Y) = P(Y \mid X)\,P(X).$$

Here $P(X \mid Y)$ reads as the conditional probability for X to occur given that Y has happened. Based on the above calculation, we have two conditional probabilities: the probability of observing X if Y has been observed, and the probability of observing Y if X has been observed. Based on the conditional probabilities and the marginal probabilities, we can use the Bayes rule to determine the posterior probabilities, which are commonly used for decision-making. The Bayes rule is defined as

$$P(X_i \mid Y) = \frac{P(Y \mid X_i)\,P(X_i)}{\sum_j P(Y \mid X_j)\,P(X_j)}.$$

Here we treat Y as a target modification, for instance a hydroxyproline residue, and $X_i$ is the ith potential contributing modification for the target modification.
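The product rule and the Bayes rule described above can be illustrated with a short sketch. It assumes the joint probabilities are stored as a symmetric table and that the marginal of a residue is taken as the sum of its row; the table layout, the function names and the numbers are all invented for illustration.

```python
def marginal(joint, y):
    """P(Y), taken here as the sum of the joint probabilities in Y's row (assumption)."""
    return sum(joint[y].values())

def conditional(joint, x, y):
    """P(X|Y) from the product rule P(X,Y) = P(X|Y) P(Y)."""
    p_y = marginal(joint, y)
    return joint[y].get(x, 0.0) / p_y if p_y else 0.0

def posterior(joint, target):
    """Bayes rule: P(X_i|Y) = P(Y|X_i) P(X_i) / sum_j P(Y|X_j) P(X_j)."""
    weights = {x: conditional(joint, target, x) * marginal(joint, x) for x in joint}
    norm = sum(weights.values())
    return {x: w / norm for x, w in weights.items()} if norm else {}

# Invented symmetric joint table for three residue types A, B and C.
joint = {
    "A": {"A": 0.50, "B": 0.20, "C": 0.00},
    "B": {"A": 0.20, "B": 0.05, "C": 0.05},
    "C": {"A": 0.00, "B": 0.05, "C": 0.20},
}
post_B = posterior(joint, "B")
```

Because P(Y|X_i) P(X_i) equals the joint probability P(X_i, Y), the posterior for a target reduces to the target's row of the joint table divided by its row sum, which is why `posterior(joint, "B")["A"]` agrees with `conditional(joint, "A", "B")`.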

DATA AND EXPERIMENTAL DESIGN
Two classes of posttranslational modifications are used for the study, i.e. hydroxylation and methylation. Each has two most common modification residues. Hydroxylation mainly occurs at a lysine or a proline residue, while methylation mainly occurs at a lysine or an arginine residue. Both have ample experimentally verified data for the study. Four keywords, hydroxyproline, hydroxylysine, methyllysine and methylarginine, were used to search the NCBI database and download sequences. All identical sequences were removed from the study. All three types of phosphorylation were grouped together and named phosphorylation. All amidation activities were also grouped into one type of modification. The various acetylation modifications were grouped together. Because there are only two poly(methylaminopropyl)lysine sites, they are treated as methyllysine. Two methyl-hydroxylysine residues are treated separately as one methyllysine and one hydroxylysine.
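The grouping and de-duplication rules above can be expressed as a small lookup, sketched below. The three phosphorylation names in `GROUPS` are assumed members (the text only states that three types were merged), and the function names are hypothetical.

```python
GROUPS = {
    # Assumed member names; the text does not list the three phosphorylation types.
    "phosphoserine": "phosphorylation",
    "phosphothreonine": "phosphorylation",
    "phosphotyrosine": "phosphorylation",
    # Stated in the text: these sites are treated as methyllysine.
    "poly(methylaminopropyl)lysine": "methyllysine",
}

def normalise(label):
    """Map a raw annotation label to the modification type used in the analysis."""
    key = "".join(label.lower().split())  # ignore case and whitespace
    return GROUPS.get(key, key)

def drop_identical(sequences):
    """Keep the first copy of each distinct sequence string."""
    seen, unique = set(), []
    for seq in sequences:
        if seq not in seen:
            seen.add(seq)
            unique.append(seq)
    return unique
```

Labels that are not in the lookup pass through unchanged, so residue types that are not grouped keep their own category.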
Table 1 shows the statistics of these four data sets. There are 10, 17, 10, and 8 different modification residues in the hydroxylysine, hydroxyproline, methylarginine, and methyllysine data sets, respectively.
Here, a modification residue means a specific posttranslational modification activity at residues in proteins; for instance, a hydroxyproline residue means a proline which can be hydroxylated and has been confirmed in experiments. The percentages of sites per sequence are 15.3, 7.1, 7.3, and 5.8 for the hydroxylysine, hydroxyproline, methylarginine, and methyllysine data sets, respectively. The hydroxyproline data set has roughly twice as many modification residues as the other three data sets. The details of multiple modifications are listed in Table 2. The abbreviations are given in Table 3.
Table 4 shows the distribution of neighbouring PTMs of different modification residues. It can be seen that at least 25% (and up to 35%) of neighbouring PTMs are of different modification residues.

Among these neighbouring PTMs of different modification residues, 33%, 68%, 78%, and 84% are separated by fewer than 10 residues for the methylarginine, methyllysine, hydroxylysine, and hydroxyproline data sets, respectively, as seen in Table 5. Because of this, two PTMs of different modification residues are likely to share a similar structure (at least an overlapping local structure) for binding. This suggests that the cooperative activities of PTMs of different modification residues are critical to cellular signalling/functioning.
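The distance statistic described above can be computed along the following lines. This is a sketch under the assumption that the distance between two sites is the difference of their residue positions; the function name, the `max_gap` parameter and the toy data are hypothetical.

```python
def close_pair_fraction(dataset, max_gap=10):
    """Fraction of neighbouring pairs of different types whose distance is below max_gap."""
    close = total = 0
    for sites in dataset:
        ordered = sorted(sites)  # sort by residue position
        for (p1, t1), (p2, t2) in zip(ordered, ordered[1:]):
            if t1 != t2:  # only pairs of different modification residues
                total += 1
                close += (p2 - p1) < max_gap
    return close / total if total else 0.0

# Hypothetical toy data: sites given as (position, type).
toy = [[(5, "HP"), (9, "HK"), (30, "HP")], [(2, "MK"), (4, "MK")]]
fraction = close_pair_fraction(toy)
```

In the toy data, two neighbouring pairs have different types and one of them is closer than 10 residues, so the fraction is 0.5.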
Based on the sequences, we wrote a program to search for all the posttranslational modification residues in the four data sets. The sites must carry the annotations </site_type="modified">, </experiment="experimental evidence, …">, and </note_type=X>. Here X can be of various types, for instance 4-hydroxyproline, 3-hydroxyproline, 5-hydroxyproline, etc. For each of the four posttranslational modifications, we find all the posttranslational modification sites involved. The sequences in different data sets are analyzed separately, although there are some overlaps among them.
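A filter of this kind might look like the sketch below. The qualifier names follow the notation quoted above, but the surrounding layout (a `Site` key with a position, followed by `/key="value"` lines) is an assumption, and real downloaded records may be laid out differently. The function name and the example text are hypothetical.

```python
import re

def modified_sites(feature_text):
    """Return (position, modification) for sites flagged as modified
    and supported by experimental evidence."""
    sites = []
    # Assumed layout: each feature starts with the key 'Site' and a position.
    for block in re.split(r"\n(?=\s*Site\s)", feature_text):
        head = re.match(r"\s*Site\s+(\d+)", block)
        if not head:
            continue
        qualifiers = dict(re.findall(r'/(\w+)="([^"]*)"', block))
        if (qualifiers.get("site_type") == "modified"
                and qualifiers.get("experiment", "").startswith("experimental evidence")
                and "note_type" in qualifiers):
            sites.append((int(head.group(1)), qualifiers["note_type"]))
    return sites

# Hypothetical annotation excerpt in the assumed layout.
example = '''Site 25
    /site_type="modified"
    /experiment="experimental evidence, no additional details recorded"
    /note_type="4-hydroxyproline"
Site 31
    /site_type="binding"
    /note_type="substrate"
'''
found = modified_sites(example)
```

Only the first site passes all three checks, so `found` holds a single entry for position 25.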

RESULTS
Table 6 shows the frequencies as the joint probabilities of nine types of modifications for the hydroxylysine data set. The proteolytic type is removed as there is only one such site in the data set. The highest joint probability is 60.2%, for two hydroxyproline residues to be neighbours. However, the joint probability for two hydroxylysine residues to be neighbours is only 6.81%. The co-occurrence probability for these two types of hydroxylation is 18.8%. The last two probabilities indicate that, for every hydroxylysine residue, the probability of having a hydroxyproline as the neighbour is about three times higher than the probability of having another hydroxylysine residue as the neighbour. The joint probabilities for a hydroxylysine to have an amidation, allysine, phosphorylation, bromotryptophan, hydroxyphenylalanine, or hydroxyarginine residue as the neighbour are 0.55%, 0.55%, 0.18%, 0.18%, 0.74%, and 0.18%, respectively. This means that, except for modifications of the same type, hydroxylysine has a high correlation with allysine, amidation, and hydroxyphenylalanine modifications.

Table 6 (values in %)
      HP     HK     AM     AK     PH     MK     BR     HF     HR
HP    60.2   18.8   0      0      0      0      0.18   0      0
HK    18.8   6.81   0.55   0.55   0.18   0      0.18   0.74   0.18
AM    0      0.55   0      0      0      0      0      0      0
AK    0      0.55   0      0      0      0      0      0      0
PH    0      0.18   0      0      4.05   5.52   0      0      0
MK    0      0      0      0      5.52   1.47   0      0      0
BR    0.18   0.18   0      0      0      0      0      0      0.18
HF    0      0.74   0      0      0      0      0      0.37   0
HR    0      0.18   0      0      0      0      0.18   0      0
Figure 1 visualises the probabilistic relationship among different types of modifications in the hydroxylysine data set using the posterior probabilities, where all posterior probabilities less than 10% are omitted for simplicity. The network demonstrates that hydroxylysine depends only on hydroxyproline (24%). However, hydroxylysine has a great impact on five modification residues: allysine (100%), amidation (100%), hydroxyphenylalanine (67%), hydroxyarginine (50%), and bromotryptophan (33%). Phosphorylation and methyllysine modification residues are independent of the hydroxylysine block. They are correlated with each other.
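As a check on how the network values relate to Table 6, the sketch below recomputes several of them from the joint probabilities. It assumes that each posterior is the joint probability divided by the row sum of the conditioning residue. This reading reproduces the percentages quoted above but is not stated explicitly in the text, and the helper name `given` is hypothetical.

```python
LABELS = ["HP", "HK", "AM", "AK", "PH", "MK", "BR", "HF", "HR"]
# Joint probabilities (%) for the hydroxylysine data set, as listed in Table 6.
TABLE6 = [
    [60.2, 18.8, 0, 0, 0, 0, 0.18, 0, 0],
    [18.8, 6.81, 0.55, 0.55, 0.18, 0, 0.18, 0.74, 0.18],
    [0, 0.55, 0, 0, 0, 0, 0, 0, 0],
    [0, 0.55, 0, 0, 0, 0, 0, 0, 0],
    [0, 0.18, 0, 0, 4.05, 5.52, 0, 0, 0],
    [0, 0, 0, 0, 5.52, 1.47, 0, 0, 0],
    [0.18, 0.18, 0, 0, 0, 0, 0, 0, 0.18],
    [0, 0.74, 0, 0, 0, 0, 0, 0.37, 0],
    [0, 0.18, 0, 0, 0, 0, 0.18, 0, 0],
]

def given(x, y):
    """P(X|Y) in percent: joint value divided by the row sum of Y (assumed reading)."""
    row = TABLE6[LABELS.index(y)]
    return 100 * row[LABELS.index(x)] / sum(row)

for other in ["HP", "AM", "AK", "HF", "HR", "BR"]:
    print(f"P(HK|{other}) = {given('HK', other):.0f}%")
```

Under this assumption the loop prints 24%, 100%, 100%, 67%, 50% and 33%, in agreement with the values reported for the network.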
Table 7 shows the joint probabilities for the hydroxyproline data set. It can be seen that the probability for two hydroxyproline residues to be neighbours is 57.7%. The likelihood for a hydroxyproline to have a hydroxylysine as the neighbour is 8.46%. However, the likelihood for a hydroxyproline to have a hydroxyphenylalanine as the neighbour is 18.2%. This means that a hydroxyproline is more likely to co-occur with a hydroxyphenylalanine than with a hydroxylysine. The other two important modifications for hydroxyproline are carboxylation and amidation. The probability for a hydroxyproline residue and an amidation residue to be neighbours is 4.06%, and the probability for a hydroxyproline residue and a carboxylation residue to be neighbours is 1.91%. The other co-occurring modifications with joint probabilities larger than 0.2% are acetylation and bromotryptophan.
The probabilistic relationship among different modifications shown in Figure 2 is built for the hydroxyproline data set using the posterior probabilities. All posterior probabilities less than 10% are not shown. In the probabilistic network, it can be seen that the main contributing modification for hydroxyproline is hydroxyphenylalanine (20%). However, hydroxyproline has contributed to 11 other modification residues. For instance, the posterior probability P(HP|AC) is 100%, meaning that whenever we have found an acetylation residue, it is certain that there is a hydroxyproline residue nearby. The posterior probability P(HP|HP) is 63%, while P(HK|HP)=9%, P(AK|HP)=1%, P(AM|HP)=4%, P(AC|HP)=1%, P(CA|HP)=2%, and P(BR|HP)=1%.
Table 8 shows the frequencies of modifications in the methylarginine data set. The likelihood for two methylarginine residues to co-occur as neighbours is 40.3%. Interestingly, the co-occurrence probability between methylarginine and methyllysine residues (as neighbours) is 0%. This is unexpected, as the two dominant methylation modifications were thought to be highly correlated. The contributing modifications to methylarginine residues are phosphorylation (11.1%), acetylation (2.78%), methylhistidine (1.39%), and citrulline (1.39%).
Shown in Figure 3 is the probabilistic relationship among the modifications derived from the methylarginine data set. The network shows that the main contributing modification to methylarginine is phosphorylation (19%), i.e. P(PH|MR)=19%. For any observed methylarginine residue in a protein, the probability of observing a phosphorylation residue is 19%. Meanwhile, a methylarginine residue can be an important prerequisite for other modification residues. For instance, the probability of observing a methylarginine residue if a methylhistidine residue has been observed is 100%, and the probability that a methylarginine residue is near an observed acetylation residue is 33%.
Table 9 shows the frequencies for the methyllysine data set. The probability for two methyllysine residues to be neighbours is 28.3%, which is not dominantly high. Interestingly, we have found that the contributing modifications for methyllysine are acetylation (15.11%) and phosphorylation (10.61%). It is again difficult to find evidence that the two methylation modifications are highly correlated.
Figure 4 shows the probabilistic network as the relationship among the modifications in the methyllysine data set. Here, the main contributing modifications to methyllysine are phosphorylation (22%) and acetylation (31%). The methyllysine residue has a high correlation with methylarginine, methylhistidine, and methylalanine. All have posterior probabilities of 100%.

CONCLUSION
This paper has studied the co-occurrence pattern of two types of posttranslational modifications with four modification residues. The study aims to reveal how posttranslational modifications are correlated with each other, i.e. how one posttranslational modification contributes to the others. It has been found that the hydroxylysine and hydroxyproline residues are not the most mutually dependent modification residues, and neither are the methylarginine and methyllysine residues. We have found that the hydroxylysine residues depend on the hydroxyproline residues with a posterior probability of 24% and that the hydroxyproline residues are the only major contributing modification residue for the hydroxylysine residues. However, we have found that the hydroxyproline residues hardly depend on the hydroxylysine residues. Instead, the hydroxyphenylalanine residues are the contributing modification residue to the hydroxyproline residues, with a posterior probability of 20%. Among the methylarginine and methyllysine residues, we have found that the phosphorylation residues are the main contributor to both of these modification residues.
In addition, the acetylation residues are needed for the methyllysine residues as well. Surprisingly, the two different methylation residues also do not rely on each other. Although the study is limited to two modification classes with four modification residues, it is expected that this method can be generalized to the discovery of a wide range of multiple posttranslational modification patterns.

Figure 1. The probabilistic network as the relationship among modification residues in the hydroxylysine data set. The values represent the posterior probabilities. The arcs indicate the directions. For instance, the arc from HK to AM with the number 100 means P(HK|AM)=100%. In other words, it is almost certain that a hydroxylysine residue is observed near an observed amidation residue.

Figure 2. The probabilistic network as the relationship among modifications in the hydroxyproline data set. The values represent the posterior probabilities.

Figure 3. The probabilistic network as the relationship among modifications in the methylarginine data set. The values represent the posterior probabilities.

Figure 4. The probabilistic network as the relationship among modifications in the methyllysine data set. The values represent the posterior probabilities.

X and Y are two different modification residues. We then define the marginal probability as the frequency for one modification residue to occur in a data set.

Table 1. The statistics of four data sets

Table 2. The details of modifications in four data sets

Table 3. The abbreviations

Table 4. The distribution of neighbouring PTMs of different modification residues

Table 6. The joint probability as frequency of co-occurred modifications for the hydroxylysine data set

Table 7. The joint probability (larger than 0.2%) as frequency of co-occurred modifications for the hydroxyproline data set

Table 8. The joint probability as frequency of co-occurred modifications for the methylarginine data set

Table 9. The joint probability as frequency of co-occurred modifications for the methyllysine data set