Systematic Analysis of Post-Translational Modifications for Increased Longevity of Biotherapeutic Proteins
1. Introduction
Protein-based therapeutics is a rapidly growing field that is gaining unprecedented clinical traction worldwide. Currently, over 100 types of therapeutic proteins are approved for use in the European Union and the United States, and the industry surpassed $100 billion in sales in 2010 [1]. These figures were reached just 40 years after the first recombinant protein therapeutic, human insulin, was produced in the 1970s [1]. Therapeutic proteins can be used to treat enzymatic deficiencies, modulate signaling pathways, or assist in delivering other drugs. They give researchers great flexibility in drug design and allow for direct augmentation of protein levels, resulting in faster, more systemic effects than traditional pharmaceuticals [1]. However, there is still great room for growth. Currently, protein-based therapeutic agents (PPTs) suffer from short half-lives that impair their ability to reach their target tissue and cause lasting effects. Most PPTs have a propensity to aggregate irreversibly during storage, transportation, or injection. Aggregated proteins often change conformation, reducing or modifying their biological activity and creating greater risks of harmful immune responses [2]. Proteins and aggregates are also often rapidly degraded or cleared from the circulatory system through the ubiquitin-proteasome system or glomerular filtration in the kidney [3]. These factors result in a low PPT half-life in circulation and, subsequently, a shorter duration of action; in other words, aggregation and protein half-life are inversely related. Consequently, protein-based therapeutic treatments usually require higher doses than their traditional counterparts, which can lead to blood toxicity and immunogenicity concerns, and generally have lower effectiveness [3].
Our goal is to increase the half-life and longevity of these proteins by decreasing their propensity to aggregate with strategically chosen PTMs to make PPTs a more viable, effective option for drug delivery.
Several different strategies have been explored to increase the success rate of PPTs. For example, post-translational modifications (PTMs) of therapeutic proteins, including PEGylation (covalent attachment of polyethylene glycol to decrease renal clearance) and hyperglycosylation (to mimic endogenous proteins and evade the immune system), have been developed [4] [5]. A diagram demonstrating PTMs in proteins is shown in Figure 1.
Figure 1. Diagram demonstrating post-translational modifications (PTMs) in proteins and their change in structure & function [5].
There are also PLGA microsphere and lipid delivery strategies that envelop the protein to protect it and increase effectiveness. In addition, new research is being conducted on biodegradable polymers and polypeptides as alternatives to PEGylation, but these require more research to be fully understood and put into use [6]. Similar to PTMs, one study determined the effects of adding a lipidated amino acid group onto the protein as a fatty acid chain. The research determined that attaching fatty acids to proteins increased their binding to human serum albumin in vivo. This was confirmed in mice when GLP1(HepoK), the protein with the fatty acid, showed stronger binding to human serum albumin than GLP1(WT), the protein without it, while still stimulating the GLP1 receptor in cells, showing that the original function of the protein was not impaired [7]. Another strategy that has been recently studied takes advantage of the FcRn-mediated recycling mechanism to increase longevity rather than modifying the protein itself. Fc fusion proteins have successfully extended the half-lives of therapeutic proteins by using the bivalency of the molecule to improve bioactivity and by exploiting the natural FcRn recycling pathway. In this study, genetic fusion was also used to modify the extracellular region of specific proteins, influencing binding with FcRn and increasing longevity as a result [8].
Among the strategies above, PTMs have been the most common and widespread method of furthering PPT efficiency and effectiveness. Specifically, PTMs refer to the addition of chemical groups or small molecules to the amino acids of proteins to modify their structure, function, or some other property [9]. Commonly used PTMs are not limited to just PEGylation and glycosylation; they also include methylation, phosphorylation, acetylation, carboxylation, and others [9]. PTMs happen in the body through enzymatic cleavage and attachments, and similar enzymes are also leveraged in large-scale modification in the protein therapeutics industry.
Although there is significant research on the ability of certain PTMs, such as phosphorylation, acetylation, and methylation, to affect protein behavior in the bloodstream, the results cannot be generalized across different proteins. For example, phosphorylation at the S96 residue of polyQ androgen receptor proteins by CDK2 has been found to promote aggregation and subsequent degradation, while phosphorylation of polyQ huntingtin proteins by Akt has been found to do the opposite [10]. Similarly, methylation has been shown in several studies to both activate and inactivate degradation-mediating regions called degrons in different proteins [10]. Clearly, there is wide variability and ambiguity in the effects of PTMs on proteins, as each protein has its own unique structure and function; some modifications or delivery strategies listed above are beneficial for some proteins but not others. To address this, our study explores the effects of three common PTMs on each of three common PPTs, as explained in further detail in the next section. In addition, research has been done on specific PTMs and methods of addition, but not specifically on their effects on aggregation and longevity in varying PPTs, which our study seeks to measure. How much does each modification affect the propensity of each protein to aggregate during storage or injection? How well is biological function preserved with these modifications? Does each modification confer a net advantage to a specific PPT? Answers to these questions can lead to individualized information on aggregation, functional preservation, and longevity for different therapeutic proteins and can be used to analyze trends between PTM methods. They can also help prevent life-threatening side effects of in vivo immune responses to aggregation.
We hypothesized that PTMs would change the longevity of a protein by affecting its propensity to aggregate, and that PTMs would also affect biological function and intermolecular interactions by altering protein structure.
Our research will be relevant for pharmaceutical companies developing new protein-based drugs and therapies, and for consumers seeking safe treatments. It can provide comprehensive recommendations and insights into optimal combinations of PPTs and PTMs to make currently existing drugs more effective and lower the number of doses that patients need to take in a period of time. This reduces the overall cost of biotherapeutic proteins and also brings experimental proteins that were previously not used due to low half-life or high aggregation to the market, making them available for pharmaceutical applications while simultaneously putting consumers and healthcare providers at ease.
2. Methodology
2.1. Featured Proteins
2.1.1. Insulin & Insulin Receptor
Insulin is a peptide hormone that stimulates uptake of glucose by cells to reduce blood sugar levels and is commonly given to patients with type 1 or type 2 diabetes to replenish low levels of naturally produced insulin in the body [11]. Insulin is a prominent therapeutic protein. However, synthetic insulin has been shown to aggregate at sites of repeated injection as well as during production, transportation, and storage [12], highlighting a need to engineer a mechanism that can reduce insulin aggregation. For our study, we used a crystallized human insulin model [13] (PDB ID: 3I40) (Appendix A).
Human insulin binds to insulin receptors present in the membranes of liver and muscle cells. We modeled the receptor-ligand interaction using the first three domains of an insulin receptor expressed through C. griseus [14] (PDB ID: 2HR7).
2.1.2. Erythropoietin (EPO) & EPO Receptor
Erythropoietin is a glycoprotein hormone that stimulates production of red blood cells (erythropoiesis). Recombinant human erythropoietin (rhEPO) is often administered to patients suffering from anemia or those with low hematocrit due to infection or kidney disease [15]. A previous study found that aggregation of rhEPO results in conformational changes that greatly affect its activity and function [16]. By exploring the effects of PTMs on harmful aggregation of rhEPO, we hope to increase its efficacy. We used a human EPO model expressed through E. coli in our study [17] (PDB ID: 1BUY) (Appendix A).
EPO and rhEPO bind to receptors located on erythroid progenitors in the bone marrow [15]. We modeled this receptor using the extracellular domain of the human EPO receptor [18] (PDB ID: 1ERN).
2.1.3. Human Growth Hormone (HGH) & HGH Receptor
Human growth hormone is a peptide hormone that first received approval by the FDA to treat GH deficiency in children in 1985 [19]. Previous studies conducted by Fradkin et al. found that aggregation of growth hormone existing in commercial formulas of the drug stimulated increased levels of immunogenicity in mouse models [20]. To avoid potential side effects of this increased immune response, such as decreased therapeutic half-life and anaphylaxis in the patient [21], methods to reduce aggregation must be found. We used a wild-type HGH model in our study [22] (PDB ID: 1HGU) (Appendix A).
To model the receptor-ligand interaction between HGH and its receptor, we used chain B of an HGH-receptor complex [23] (PDB ID: 1A22), which corresponded to just the receptor domain of the modeled complex.
2.2. Featured Post-Translational Modifications
2.2.1. Phosphorylation
Protein phosphorylation refers to the reversible attachment of a phosphate group to the side chain of an amino acid; the group is added by a protein kinase (attachment) and removed by a phosphatase (detachment). The side chains of serine, threonine, and tyrosine are the most commonly phosphorylated. Phosphorylation can alter the function, structure, and stability of a protein [24], making it an intriguing PTM to analyze in this study.
2.2.2. Acetylation
Protein acetylation is a type of acylation, which is the addition of an acyl group to a protein’s amino acid residues by acyltransferases (attachment) and deacylases (detachment) [25]. Acetylation commonly occurs at serine and lysine residues, and has been implicated in the regulation of protein stability, localization, activity, and affinity for DNA-binding [25]. Because of the ubiquitous functions of acetylation, many metabolic enzymes are tightly regulated with the addition or removal of acetyl groups [26]. The many abilities of acetylation also make it a valid PTM to analyze in this study.
2.2.3. Methylation
Protein methylation, catalyzed by methyltransferase (attachment) and demethylase (detachment), is the addition of a methyl group to amino acid residues of a protein, especially lysines and arginines [27]. Methylation can affect the function and conformation of molecules, making it ideal for modulating cell signaling and DNA repair pathways [27]. The implications of methylation on the characteristics of proteins prompted us to analyze it in this study.
2.3. Materials
We extracted PDB (Protein Data Bank) format files for insulin, EPO, and HGH from RCSB.org, an open-access database containing numerous protein structures and their relevant conformations [28] [29]. We filtered searches for proteins classified as hormones originating from Homo sapiens and identified the most recently uploaded files, allowing us to get the most accurate representation of hormones in vivo. In addition, the common receptors for each of these proteins present in the human body were also obtained from the database. In cases where RCSB contained protein-receptor complexes, only the specific chain corresponding to the receptor itself was used.
2.4. Applying Post-Translational Modifications
We used an open-source server called Vienna-PTM 2.0 to apply post-translational modifications to the protein files we downloaded from RCSB.org. Vienna-PTM allows users to apply modifications to certain amino acids in uploaded PDB files [30]-[32]. Users can obtain new PDB files and force field parameters for a modified protein that can be used in molecular dynamics simulation packages such as GROMACS.
For our study, we selected two amino acids that are commonly modified for each group, and modified all instances of those amino acids in each protein to the most neutrally-charged relevant modification in Vienna-PTM to gauge the full effect of these PTMs. The amino acids modified in each group were chosen based on previously conducted studies (see section 2.2) and are detailed in Table 1.
Table 1. Specific modifications for each PTM.

| Group | Modification 1 | Modification 2 |
|---|---|---|
| Phosphorylated | Serine → Phosphoserine (−1) | Threonine → Phosphothreonine (−1) |
| Acetylated | Serine → Serine-O-acetylglucosamine, N-acetyllysine | Lysine → N-acetyllysine |
| Methylated | Lysine → Methyllysine (0) | Arginine → Omega-N-methylarginine (0) |

After modifying all specified amino acids, we exported and downloaded a new PDB file with the modified protein for analysis. This process was done for all experimentally modified insulin, EPO, and HGH files (See Appendix A).
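Because every instance of the chosen residues is modified, it can be useful to know how many sites each PTM group touches in a given structure. The following is a minimal sketch of such a count, assuming standard fixed-column PDB `ATOM` records; it is not part of Vienna-PTM or of our pipeline. The helper names (`make_atom_line`, `count_target_residues`) and the five-residue demo structure are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Residues modified in each group (three-letter codes), following Table 1.
TARGETS = {
    "Phosphorylated": {"SER", "THR"},
    "Acetylated": {"SER", "LYS"},
    "Methylated": {"LYS", "ARG"},
}


def make_atom_line(serial, atom, res_name, chain, res_seq):
    """Build the first 26 columns of a PDB ATOM record (hypothetical demo helper)."""
    return f"ATOM  {serial:>5} {atom:<4} {res_name:>3} {chain}{res_seq:>4}"


def count_target_residues(pdb_text):
    """Count residues (not atoms) that each PTM group would modify."""
    seen = set()
    counts = Counter()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        res_name = line[17:20].strip()   # columns 18-20: residue name
        chain_id = line[21]              # column 22: chain identifier
        res_seq = line[22:27].strip()    # columns 23-27: residue number + insertion code
        key = (chain_id, res_seq)
        if key in seen:                  # count each residue once, not once per atom
            continue
        seen.add(key)
        counts[res_name] += 1
    return {group: sum(counts[r] for r in residues)
            for group, residues in TARGETS.items()}


# Hypothetical five-residue structure used only to exercise the function.
demo_pdb = "\n".join([
    make_atom_line(1, "N", "SER", "A", 1),
    make_atom_line(2, "CA", "SER", "A", 1),
    make_atom_line(3, "N", "LYS", "A", 2),
    make_atom_line(4, "N", "THR", "A", 3),
    make_atom_line(5, "N", "ARG", "A", 4),
    make_atom_line(6, "N", "GLY", "A", 5),
])
print(count_target_residues(demo_pdb))
```

In practice, `pdb_text` would be the contents of a downloaded file such as the insulin structure; modified residues written by other tools may appear under different residue names or as `HETATM` records, which this sketch does not handle.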
2.5. Characterizing Aggregation
We set out to characterize the propensity of these proteins to aggregate with each other by calculating the docking score (DS) and binding free energy (BFE) between two identical copies of the protein in question. To calculate these values, we used a web server called HawkDock, which specializes in structural prediction and analysis of protein-protein complexes using ATTRACT for global macromolecular docking and HawkRank for scoring [33]-[36]. It takes in one PDB file as a receptor and one PDB file as a ligand. It then outputs the ten most probable complexes, ranked by their docking scores. Docking scores are commonly used by scoring functions to represent ligand binding affinities. A more negative DS generally correlates with a stronger intermolecular interaction [37]. In the context of our study, we seek to maximize DS and bring it closer to zero, since we are aiming for weaker interactions between proteins and therefore less aggregation.
For our project, we entered the PDB file of the protein under analysis into both the receptor and ligand fields of HawkDock. This way, we were able to simulate interactions between identical proteins administered into the bloodstream through a single injection. The average of the ten complexes was taken in order to equally represent the variability in binding that may occur in the natural world.
We then ran an MM/GBSA analysis, which is available on HawkDock, on the modeled complexes to obtain BFE values for each corresponding model in kJ/mol [38]-[40]. Binding free energy is a better-established and more practical measure of binding affinity; it quantifies the free energy difference between the bound and unbound states of a complex, often in kJ/mol of complex [41]. As with DS, more negative values mean the bound state of a complex is preferred; therefore, we aim to maximize BFE. Additionally, MM/GBSA analysis is a commonly used method for calculating ligand binding affinities computationally, since it is computationally inexpensive and agrees well with experimental data [42].
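The averaging and comparison steps described above can be sketched as follows. This is an illustrative sketch, not the exact procedure we ran: the function names are hypothetical, and the per-model values in `placeholder_models` are invented placeholders, since individual model scores are not listed in this paper. The insulin averages are the ones reported in Table 2.

```python
from statistics import mean


def summarize_run(models):
    """Average DS and BFE over (docking_score, bfe) pairs from one docking run."""
    avg_ds = mean(ds for ds, _ in models)
    avg_bfe = mean(bfe for _, bfe in models)
    return avg_ds, avg_bfe


def change_from_baseline(modified, baseline):
    """Return (delta DS, delta BFE); positive deltas mean weaker self-association."""
    return tuple(m - b for m, b in zip(modified, baseline))


# Placeholder per-model outputs (NOT measured data), only to show the averaging step.
placeholder_models = [(-2400.0, -30.0), (-2350.0, -28.0), (-2300.0, -25.0)]
avg_ds, avg_bfe = summarize_run(placeholder_models)
print(f"placeholder averages: DS={avg_ds:.2f}, BFE={avg_bfe:.2f}")

# Averaged aggregation values for insulin as reported in Table 2: (DS, BFE in kJ/mol).
insulin_normal = (-2327.37, -26.78)
insulin_phosphorylated = (-2064.41, -12.20)
delta_ds, delta_bfe = change_from_baseline(insulin_phosphorylated, insulin_normal)
print(f"insulin, phosphorylated vs. normal: dDS={delta_ds:.2f}, dBFE={delta_bfe:.2f}")
```

A positive change in both quantities, as for phosphorylated insulin here, corresponds to weaker predicted self-association and therefore less predicted aggregation.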
2.6. Analyzing Continuities in Biological Function
In order to minimize loss of function for these proteins from the PTMs, we analyzed the changes in modified proteins’ affinities for their native receptors.
We uploaded the respective receptor in the receptor field of HawkDock and the normal or modified protein we were analyzing in the ligand field, and recorded the top model. The top model, which is the most probable, is most likely to be the true receptor-ligand configuration in vivo. This assumption was also supported by the fact that the top model had a much lower docking score and binding energy than subsequent, lower-ranked models. Using this methodology, we were able to add an additional factor to our study’s consideration by weighing the change in biological function when making recommendations for most optimal PTMs.
Our overall method design consolidates all of these data points and is summarized in Figure 2.
Figure 2. Experimental design flowchart diagram.
3. Results and Data Analysis
3.1. Summary of Results
Since it is difficult to compare improvements in overall protein function (taking into account both aggregation and receptor affinity) without a numerical formula, we will mainly summarize improvements in the reduction of aggregation and comment on significant differences in receptor affinity (functional continuity). For a comparative, numerical analysis between modifications taking both aggregation and reception into account, refer to Section 3.3.
The average DS and BFE values for the top 10 models outputted by HawkDock for aggregation are summarized in Table 2, Figure 3(a), and Figure 3(c), and the DS and BFE values for the top (most optimal) model outputted by HawkDock for receptor-ligand interactions are summarized in Table 3, Figure 3(b), and Figure 3(d).
Table 2. Average docking scores and binding free energy values for the top 10 aggregation models outputted by HawkDock.

| AGG | Normal (DS) | Normal (BFE) | P (DS) | P (BFE) | A (DS) | A (BFE) | M (DS) | M (BFE) |
|---|---|---|---|---|---|---|---|---|
| Insulin | −2327.37 | −26.78 | −2064.41 | −12.20 | −2176.57 | −18.59 | −2097.45 | −17.65 |
| EPO | −3790.09 | −23.23 | −3412.15 | −11.41 | −4193.08 | −30.24 | −684.33 | 32.86 |
| HGH | −4086.31 | −26.12 | −3588.18 | −10.39 | −3570.17 | 11.38 | −1053.03 | 46.70 |

P: phosphorylated; A: acetylated; M: methylated. BFE in kJ/mol. More positive is more beneficial.
Table 3. Docking scores and binding free energy values for the top reception model outputted by HawkDock.

| REC | Normal (DS) | Normal (BFE) | P (DS) | P (BFE) | A (DS) | A (BFE) | M (DS) | M (BFE) |
|---|---|---|---|---|---|---|---|---|
| Insulin | −4669.43 | −7.64 | −4475.47 | −18.64 | −4496.3 | −1.75 | −4533.07 | −28.34 |
| EPO | −6980.32 | −43.94 | −6905.43 | −17.52 | −5405.01 | −27.3 | −4888.91 | −9.58 |
| HGH | −4748.43 | −16.94 | −4250.95 | −12.12 | −4463.7 | 1.94 | −4248.69 | −13.25 |

P: phosphorylated; A: acetylated; M: methylated. BFE in kJ/mol. More negative is more beneficial.
Figure 3. (a) Average docking scores for aggregation. (b) Average docking scores for reception. (c) Average BFE values (in kJ/mol) for aggregation. (d) Average BFE values (in kJ/mol) for reception.
3.2. Statistical Significance between Control and Experimental Groups
To test for statistically significant differences between the normal (control) group and the experimental (phosphorylated, acetylated, methylated) groups for each protein, we used a single factor ANOVA test conducted through Google Sheets and the “XLMiner Analysis ToolPak” extension.
A single factor ANOVA (analysis of variance) test is a generalization of the two-sample t-test and calculates how much of the variance between data can be attributed to random error and how much to the factor effect [43]. It outputs an F-statistic and a P-value. In our study, we used the P-value at a significance level of 0.05 to indicate significance. A P-value below this level (p < 0.05) means the observed differences between group means are unlikely to be due to random chance alone and are instead likely due to our treatment. Although the mean DS and BFE values clearly differ between experimental groups at a glance, we applied this test to check whether our findings were significant and to identify which PTMs on which proteins had significant effects on aggregation affinity. We also ran an overall ANOVA test with all four groups for each protein to show that the three PTMs we explored caused significant results in general. Since all overall P-values were significant at α = 0.05, we rejected our null hypothesis. We did not run an ANOVA test on the receptor affinities since we only had one data point for each group. The P-values from the ANOVA analyses of aggregation affinity data between modified and unmodified proteins are shown in Table 4 and Table 5.
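For readers who want to see what the single factor ANOVA computes, the F-statistic can be sketched in a few lines of standard-library Python. This is an illustrative sketch rather than the Google Sheets procedure we used; the function name is hypothetical, and the two groups of docking scores are placeholder values, not our per-model data.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a single factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation explained by the factor (differences between group means).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual variation (spread of values inside each group).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder docking scores (NOT measured data) for a control and one PTM group.
normal_ds = [-2400.0, -2350.0, -2300.0, -2250.0]
modified_ds = [-2100.0, -2050.0, -2000.0, -1950.0]

f_stat, df_b, df_w = one_way_anova_f(normal_ds, modified_ds)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The P-value is the upper-tail probability of the F distribution with these degrees of freedom at the computed F. The standard library has no F distribution, so in practice a spreadsheet tool or a function such as `scipy.stats.f_oneway` (if SciPy is available) would return both the statistic and the P-value.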
Table 4. P-values for ANOVA tests on docking scores.

| DS | Normal x P | Normal x A | Normal x M | Overall |
|---|---|---|---|---|
| Insulin | 0.0121* | 0.0813 | 0.0641 | 0.0350* |
| EPO | 0.0545 | 0.0194* | 0.0000* | 0.0000* |
| HGH | 0.0349* | 0.0474* | 0.0000* | 0.0000* |

*Statistically significant (p < 0.05).
Table 5. P-values for ANOVA tests on binding free energies.

| BFE | Normal x P | Normal x A | Normal x M | Overall |
|---|---|---|---|---|
| Insulin | 0.0028* | 0.0861 | 0.0739 | 0.0151* |
| EPO | 0.0610 | 0.1793* | 0.0000* | 0.0000* |
| HGH | 0.0263* | 0.0000* | 0.0000* | 0.0000* |

*Statistically significant (p < 0.05).
3.3. Creation of an Index to Assess Protein Viability
In order to quantitatively and specifically measure the improvement or deterioration of each PTM treatment, we created a protein viability index equation by modifying a sigmoid function using the base aggregation and reception values from the normal protein, and the aggregation and reception values from the modified protein:
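$$I = \frac{2}{1 + e^{-x}} - 1, \qquad x = w_{agg}\,\frac{V_{agg} - B_{agg}}{\left|B_{agg}\right|} - w_{rec}\,\frac{V_{rec} - B_{rec}}{\left|B_{rec}\right|} \tag{1}$$
where $V$ is the DS or BFE of the modified protein, $B$ is the corresponding value of the normal protein, the subscripts agg and rec denote aggregation and reception, and $w_{agg} = 0.25$ and $w_{rec} = 0.75$ are the weights (form taken from the source code in Appendix B).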
To create the index, we first established a baseline for each protein: the DS and BFE of the unmodified protein for both aggregation and reception (the DS and BFE indices are calculated in separate programs, each using its own base values for each protein). The post-translationally modified protein’s average DS or BFE values for aggregation and reception were then compared against these base values. Taking the difference between the experimental value and the unmodified base value and dividing by the magnitude of the base value normalizes the experimental value for that specific protein, which allows further transformations to be applied to the data. The sign of the normalized reception term is flipped so that stronger receptor binding (a more negative value) counts as an improvement. A weight of 0.25 was used for the aggregation aspect, and 0.75 was used for the reception aspect. This 1:3 ratio of aggregation to reception reflects the greater importance of the reception values: a protein’s original function must be preserved while its longevity is increased. A change in the function of the protein as a whole makes longevity irrelevant in this setting, so function must be prioritized in the index. The ratio can vary depending on the relative importance of the two factors; this could be studied further using wet-lab experiments that measure more accurately how PTMs change the function of a protein.
After the normalized and weighted values were calculated, they were passed in Python to a sigmoid-shaped logistic function modified to output values from −1 to 1 (see Appendix B). The logistic function maps its input onto a specified range around a midpoint, where its slope is steepest. It is commonly used in data analysis. Here, the point of inflection was 0, representing no change, and the limits of the left and right sides of the function approached −1 and 1, respectively. A positive output represented an increase in the overall performance of the modified protein given the weighting of aggregation and reception, and a negative output represented a decrease in the performance of the modified protein. The normal insulin, EPO, and HGH proteins are automatically assigned a value of 0 on the index as they are the baseline models. The values provided by this index are used as a quantitative measure of the change in DS and BFE caused by the PTM and allow us to objectively and numerically compare differently modified proteins (see Appendix B for index source code). The index values are shown in Table 6, Table 7, and Figure 4.
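As a cross-check of the index, the calculation can be wrapped in a single function and applied to the BFE values in Tables 2 and 3. The sketch below follows the logic of the Appendix B script but is a restructured illustration, not the original files; the function and variable names are our own for this example. Run on the rounded table values, it agrees with the BFE index values in Table 7 to within rounding.

```python
import math

# BFE values (kJ/mol) as (aggregation, reception), copied from Tables 2 and 3.
BFE = {
    "Insulin": {"Normal": (-26.78, -7.64), "P": (-12.20, -18.64),
                "A": (-18.59, -1.75), "M": (-17.65, -28.34)},
    "EPO": {"Normal": (-23.23, -43.94), "P": (-11.41, -17.52),
            "A": (-30.24, -27.3), "M": (32.86, -9.58)},
    "HGH": {"Normal": (-26.12, -16.94), "P": (-10.39, -12.12),
            "A": (11.38, 1.94), "M": (46.70, -13.25)},
}


def viability_index(value_agg, value_rec, base_agg, base_rec, weight_agg=0.25):
    """Weighted, normalized change from baseline, squashed into (-1, 1)."""
    # Less negative aggregation values are better, so the sign is kept.
    norm_agg = (value_agg - base_agg) / abs(base_agg)
    # More negative reception values are better, so the sign is flipped.
    norm_rec = -(value_rec - base_rec) / abs(base_rec)
    combined = weight_agg * norm_agg + (1 - weight_agg) * norm_rec
    return 2.0 / (1.0 + math.exp(-combined)) - 1.0


def index_table(data):
    """Compute the index for every modified form of every protein."""
    table = {}
    for protein, forms in data.items():
        base_agg, base_rec = forms["Normal"]
        table[protein] = {
            ptm: viability_index(agg, rec, base_agg, base_rec)
            for ptm, (agg, rec) in forms.items() if ptm != "Normal"
        }
    return table


bfe_index = index_table(BFE)
for protein, row in bfe_index.items():
    print(protein, {ptm: round(score, 4) for ptm, score in row.items()})
```

Passing the baseline values as the modified values returns exactly 0, which is why the normal proteins sit at 0 on the index.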
4. Discussion
In general, our results show that PTMs do have a significant effect on the aggregation and longevity of therapeutic proteins, with all overall P-values < 0.05 and some overall P-values < 0.0001.
Table 6. Docking score index values for PTMs on each protein.

| DS | P | A | M |
|---|---|---|---|
| Insulin | −0.0015 | −0.0058 | 0.0014* |
| EPO | 0.0084* | −0.0976 | −0.0099 |
| HGH | −0.0240 | −0.0067 | 0.0533* |

*Positive, beneficial change (value > 0).
Table 7. Binding free energy index values for PTMs on each protein.

| BFE | P | A | M |
|---|---|---|---|
| Insulin | 0.5427* | −0.2457 | 0.7851* |
| EPO | −0.1605 | −0.1778 | 0.0085* |
| HGH | −0.0314 | −0.2341 | 0.2606* |

*Positive, beneficial change (value > 0).
Figure 4. An index of 0 signifies no change and serves as a baseline for comparison. (a) Docking score index values. (b) Binding free energy index values.
For insulin, phosphorylation had the most beneficial effect on DS and BFE for aggregation (significant with p < 0.05 and p < 0.01, respectively). For receptor affinity, however, it resulted in a slight increase in DS while BFE decreased, leaving phosphorylation’s effect on functional continuity ambiguous. On the index, phosphorylation had a slightly negative score for DS (−0.0015) but a substantially positive score for BFE (0.5427), ranking it second behind methylation, which had positive scores for both DS and BFE (0.0014 and 0.7851). Surprisingly, methylation reduced aggregation less than phosphorylation did, yet it received the highest DS and BFE index scores once receptor affinity was incorporated. Acetylation was the least beneficial and resulted in negative index scores for both DS and BFE, signaling that it may actually have detrimental effects on insulin function and half-life, although its differences in aggregation were not statistically significant.
For EPO, methylation conferred a significant increase in DS and BFE (p < 0.0001 for both), demonstrating that it had the greatest effect in reducing aggregation and, by extension, increasing half-life. In fact, BFE actually became positive (32.86 kJ/mol). It did, however, also have the most negative effect on receptor affinity. Therefore, index scores for methylation were close to 0, with one positive and one negative. Phosphorylation was the next best for aggregation and also had one positive and one negative index, putting it in similar standing with methylation in terms of overall benefit. Acetylation had the worst indexes, with the most negative DS score and a negative BFE score (its change in aggregation DS was significant with p < 0.05). Both phosphorylation and acetylation conferred more moderate effects on aggregation and receptor affinity than methylation.
Finally, with HGH, methylation once again had the most beneficial effect on aggregation by increasing DS and BFE (p < 0.0001 for both). It also resulted in just slight decreases in receptor affinity, giving it positive index scores for both DS and BFE and making it the most beneficial PTM. Phosphorylation and acetylation had less drastic results than methylation for aggregation, with nearly identical DS values (−3588.18 vs. −3570.17). However, phosphorylation had a much lower aggregation BFE than acetylation (−10.392 kJ/mol vs. 11.381 kJ/mol), and results from both phosphorylation and acetylation were significant (p < 0.05 for both). On the index, phosphorylation ranks higher than acetylation since it has a far less negative score for BFE. Most modifications resulted in little change to receptor affinity. The only notable difference is that acetylation resulted in a positive BFE for receptor affinity (1.94 kJ/mol), which, since more negative values indicate stronger binding, suggests that it weakens receptor-ligand interactions.
Interestingly, even though we cannot generalize due to the small scope of this experiment, acetylation seemed to consistently confer the least benefit to therapeutic proteins; in fact, all six of its index scores were negative. This may be because acetylation neutralizes the positive charge of lysine residues, which may have made the net charge of the proteins more negative and therefore promoted polar or charged interactions and subsequent aggregation. Additionally, methylation greatly reduced aggregation (by increasing DS and BFE) in EPO and HGH. We infer that this may be due to the nonpolar properties of the methyl group interfering with aggregation. In general, phosphorylation seemed to have moderate effects. The varying beneficial and detrimental effects of phosphorylation and methylation in particular between the three proteins support Lee et al.’s findings regarding the ambiguity and variability in the effects of PTMs on both related and unrelated protein groups [10].
For some proteins, there was a discrepancy between DS and BFE (i.e., DS increased but BFE decreased). This is still a valid result because the definitions of DS and BFE differ slightly: the DS takes into account the structure of the complex and the intermolecular interactions, while the BFE specifically reflects the free energy difference between the bound and unbound states of the complex. In other words, DS incorporates BFE into its calculation and is a more generalized value describing the protein interactions.
5. Conclusions
In this study, we analyzed the docking scores and free energy values of post-translationally modified proteins bound to themselves and to their specific receptors. Using this information, we determined which post-translational modifications were best for insulin, erythropoietin, and human growth hormone by comparing the change in aggregation and reception values using a normalized index that we created. We found methylation to be the most beneficial for insulin and HGH, and both phosphorylation and methylation to be somewhat optimal for EPO. Post-translational modifications have been a common field of research in recent times; however, existing research has not explicitly compared the changes in protein-protein interaction between different PTMs on different proteins. This study, although done on a smaller scale, allowed the impacts of PTMs to be quantified and systematically compared, showing that different PTMs have different effects on each protein.
Some future directions of this research include expanding the scope of the study to include more commonly used therapeutic proteins, such as clotting factor XIII and interferons, to explore the effects of PTMs on more proteins’ longevity. We can also explore and analyze more PTMs beyond the three used in this study; for example, carboxylation, glycosylation, and PEGylation are other commonly used PTMs in the industry. Combined with future research dealing with a larger pool of proteins and PTMs, as well as more sophisticated wet-lab tests of the relationship between aggregation and reception to obtain better index weights, this information can be used to create a large-scale database of PTM impacts on different protein biotherapeutics. A resource of such scale would be valuable in pharmaceutical and medical fields for modifying existing drug treatments and improving drug development.
Appendix A
Protein Structure Images
Note: Protein structures shown below come from the RCSB.org database (Figures A1(a)-(c)) and the HawkDock software visual output (Figures A1(d)-(e)).
Figure A1. (a) Insulin. (b) Erythropoietin. (c) Human Growth Hormone. (d) Insulin Aggregation Complex. (e) Insulin-Receptor Binding Complex.
Appendix B
Source Code for Index Calculations
Note: A different file was used for each protein and for DS/BFE which had the baseline values unique to each. The file shown below is hghindex.py, specifically calculating BFE.
```python
import numpy as np


def sigmoid(x, k=1, c=0):
    # Maps data onto a scale from -1 to 1 using a rescaled sigmoid function
    return (2 * (1 / (1 + np.exp(-k * (x - c))))) - 1


# Inputted values from experimental groups (variable)
# The following shows data from methylated HGH BFE (see Tables 2 and 3)
value_agg = 46.695
value_rec = -13.25

# Baseline values from normal group (constant)
# The following shows the baseline values from normal HGH BFE
base_agg = -26.124
base_rec = -16.94

# Weighting of aggregation vs reception
# Shows a 1:3 ratio between importance of aggregation vs reception
weight_agg = 0.25
weight_rec = 1 - weight_agg


def normalize(value, base):
    # Normalizes data to a constant scale using baseline values
    return (value - base) / abs(base)


norm_agg = normalize(value_agg, base_agg)
# Sign is flipped because a more negative reception value is beneficial
norm_rec = -1 * normalize(value_rec, base_rec)

protein_combined = weight_agg * norm_agg + weight_rec * norm_rec

# Outputs a final score by inputting normalized values into the sigmoid function
final_score = sigmoid(protein_combined)
print(final_score)
```
NOTES
*Joint First Authors (In Alphabetical Order).