RimJ-Catalyzed Sequence-Specific Protein N-Terminal Acetylation in Escherichia coli
1. Introduction
The initial stage of protein maturation involves processing of the N-terminal end of the newly synthesized polypeptide chain in both prokaryotes and eukaryotes [1]. In prokaryotes, the initiating N-formylmethionine (fMet1) residue is deformylated by peptide deformylase in most proteins, and the resulting methionine (Met1) residue is then cleaved by methionine aminopeptidases (MAP) in a significant fraction of total proteins [2]. In a few endogenous bacterial proteins, the N-terminal residues exposed by Met1 cleavage are subsequently acetylated by the corresponding N-acetyltransferases (NATs) [3]-[5]. In Escherichia coli, the ribosomal proteins S18, S5 and L7/L12 are Nα-acetylated by their corresponding NATs, RimI, RimJ, and RimL, respectively [6] [7]. Elongation factor Tu and the chaperone SecB are also Nα-acetylated in E. coli, but their Nα-acetylation mechanism remains to be identified. The functional relevance of the Nα-acetyl group in these Nα-acetylated endogenous proteins is unclear since no phenotypic difference was observed in bacterial cells in which the NAT genes were deleted. In contrast, in eukaryotes Nα-acetylation is prevalent and plays a significant role in the stability, activity and targeting of certain proteins [8]-[16].
Most eukaryotic proteins are not acetylated when ectopically expressed in E. coli. However, partial or complete Nα-acetylation has been reported for several recombinant proteins including the stathmin-like domains (SLDs), interferons, thymosin α (Tα), and tissue inhibitors of metalloproteinases (TIMPs) [17]-[22]. Interestingly, either the presence or the absence of the Nα-acetyl group is critical for the activities of some of these biologically important proteins. For example, Tα is active when Nα-acetylated but TIMPs are inactive when Nα-acetylated [12] [23].
Although the NATs for most other Nα-acetylated recombinant proteins remain unidentified, it was reported that RimJ catalyzes Nα-acetylation of the Tα1 fusion proteins in E. coli [24]. We recently reported that the Z-domain could be fully Nα-acetylated in the presence of highly expressed RimJ [25]. A survey of the reported Nα-acetylated endogenous and ectopic proteins indicates that Nα-acetylation in E. coli occurs exclusively at alanine, serine, threonine or cysteine residues. However, it is not clear whether this narrow sequence specificity reflects the limited number of literature examples or the sequence selectivity of specific NATs in E. coli. Moreover, the amino acid sequence requirement of Nα-acetylation has never been systematically investigated in E. coli.
The present study used the Z-domain as a model protein to establish the sequence dependence of RimJ-mediated Nα-acetylation in E. coli. The Z-domain variants differing by the second or third amino acid residue were expressed and analyzed by mass spectrometry, in which a 131-Da mass decrease and a 42-Da mass increase indicated Met1 cleavage and Nα-acetylation, respectively.
2. Materials and Methods
2.1. General
XL1-Blue E. coli cells were used for cloning and maintaining plasmids. BL21(DE3) E. coli cells were used for protein expression. pET-Z encodes the Z-domain with a C-terminal hexa-histidine tag under control of the T7 promoter. pACYCDuet-RimJ encodes RimJ under control of the T7 promoter. Taq DNA polymerase (NEB) was used for polymerase chain reaction (PCR).
2.2. Site-Directed Saturation Mutagenesis of the Z-Domain Gene
The genes for the Z-domain variants containing all possible 20 amino acids in position 2 or 3 were generated by PCR of pET-Z with the primer ZR (5′-GCAGCCGGATCTCAGTGGTGG-3′) and one of the oligonucleotide primers shown in Table 1. Each of these primers contains either a degenerate or a specific codon sequence (in boldface) at the designated position as well as an NdeI restriction site (underlined). The degenerate codon sequences include NNK, NAK, HTT and VAA, where N is any of the four nucleotides; K is G or T; H is A, C or T; and V is A, C or G. Each PCR product was inserted between the NdeI and XhoI sites of pET-Z and transformed into XL1-Blue E. coli cells. Plasmids were isolated from single colonies and sequenced.
Table 1. Sequence of oligonucleotide primers that were used to generate the Z-domain gene variants.
2.3. Expression and Purification of the Z Domain Variants
BL21(DE3) E. coli cells, co-transformed with each pET-Z variant and pACYCDuet-RimJ, were grown overnight at 37 °C in the Instant TB auto-induction medium (Novagen) containing 100 μg/mL carbenicillin and 50 μg/mL chloramphenicol. Cells were harvested by centrifugation and lysed by sonication in lysis buffer (50 mM sodium phosphate buffer with 300 mM NaCl, pH 8.0). The Z-domain in the cell lysate was purified with Ni-NTA metal affinity resin (Novagen) under native conditions. Each purified protein was concentrated by ultrafiltration with a YM-3 Microcon centrifugal filter device (Millipore).
2.4. Mass Spectrometry
The Z-domain variants were analyzed on an Agilent 6224 Accurate-Mass TOF LC/MS system equipped with an electrospray ionization (ESI) source. Multiply charged protein ions were deconvoluted using the MassHunter Workstation software. The deconvoluted mass spectra of the Z-domain variants differing by the amino acid residue in position 2 or 3 are shown in Figure 1 and Figure 2, respectively.
2.5. Theoretical Mass Calculations
The theoretical mass of each Z-domain variant containing one of 20 amino acids in position 2 or 3 was calculated based on its amino acid sequence: MJOVDNKINKEQQNAFYEILHLPNLNEEQRDAFIQSLKDDPSQSANLLAEAKKLNDAQAPKGSHHHHHH, where J is one of 20 amino acids and O is S (position 2 variants), or J is T and O is one of 20 amino acids (position 3 variants). The calculated average masses for the different N-terminal end forms of each Z-domain variant are listed in Table 2 and Table 3.
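The calculation described above can be sketched in a few lines of Python. This is an illustration only, not the authors' actual procedure: it assumes commonly tabulated average residue masses, one water per chain, and a net gain of C2H2O on acetylation. The names `average_mass` and `n_terminal_forms` are hypothetical.

```python
# Average residue masses (Da) of the 20 standard amino acids (commonly tabulated values).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153   # one H2O per chain (free N- and C-termini)
ACETYL = 42.0367  # net mass gain on N-alpha-acetylation (C2H2O)

# Residues 4 onward of the Z-domain construct, copied from the sequence in Section 2.5.
REST = "VDNKINKEQQNAFYEILHLPNLNEEQRDAFIQSLKDDPSQSANLLAEAKKLNDAQAPKGSHHHHHH"


def average_mass(seq):
    """Average mass (Da) of an unmodified linear peptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def n_terminal_forms(pos2, pos3):
    """Masses of the four N-terminal forms for a given residue 2 / residue 3 pair."""
    full = "M" + pos2 + pos3 + REST
    with_met = average_mass(full)
    without_met = average_mass(full[1:])  # Met1 removed
    return {
        "Met-Z": with_met,
        "Z": without_met,
        "Ac-Met-Z": with_met + ACETYL,
        "Ac-Z": without_met + ACETYL,
    }


if __name__ == "__main__":
    for form, mass in n_terminal_forms("T", "S").items():
        print(f"{form:9s} {mass:10.2f}")
```

The differences between forms reproduce the diagnostic shifts used in this study: about 131 Da between the Met-Z and Z forms and about 42 Da between an acetylated form and its unacetylated counterpart.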
3. Results and Discussion
To construct the genes for the Z-domain variants containing 20 different amino acids in position 2 or 3, the corresponding codon in the pET-Z plasmid was replaced with the nucleotide sequence NNK, where N is any of the four nucleotides and K is G or T. According to the standard genetic code, NNK covers all 20 amino acids. About half of the possible 20 variants for each position were obtained by this initial site-directed saturation mutagenesis. The remaining Z-domain gene variants, which are encoded by less frequent codons, were obtained by replacing the corresponding codon with more targeted degenerate nucleotide sequences (NAK and HTT for the second codon and VAA for the third codon, where H is A, C, or T and V is A, C, or G) or with the codon for a specific amino acid (i.e., TGG for Trp; ATG for Met).
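To make the coverage of each degenerate codon concrete, the following minimal Python sketch expands a degenerate codon into its individual codons and translates them with the standard genetic code. The function name `encoded_residues` is hypothetical; only the nucleotide codes used in this study (N, K, H, V) are included.

```python
from itertools import product

# Nucleotide ambiguity codes used in this study.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "H": "ACT", "V": "ACG"}

# Standard genetic code in the conventional TCAG ordering ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def encoded_residues(degenerate_codon):
    """Sorted list of residues (and '*' for stop) encoded by a degenerate codon."""
    codons = ("".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon)))
    return sorted({CODON_TABLE[c] for c in codons})


if __name__ == "__main__":
    for codon in ("NNK", "NAK", "HTT", "VAA"):
        print(codon, "".join(encoded_residues(codon)))
```

Under the standard code, NNK yields all 20 amino acids plus the TAG stop codon, NAK yields Asp, Glu, His, Lys, Asn, Gln and Tyr (plus TAG), HTT yields Phe, Ile and Leu, and VAA yields Glu, Lys and Gln.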
Each Z-domain variant was individually expressed as a C-terminal His-tag fusion in BL21(DE3) E. coli cells co-transformed with its corresponding pET-Z mutant plasmid and pACYCDuet-RimJ. In the presence of the latter plasmid, from which RimJ is overexpressed, the Z-domain could be Nα-acetylated [25]. The transformed cells were grown in Instant TB, an auto-induction medium in which protein expression is induced by lactose as soon as glucose is depleted. The overexpressed Z-domain variants were purified from the cell lysate by immobilized metal ion affinity chromatography (IMAC). Each of the purified Z-domain variants differing by the second or third amino acid residue was analyzed by electrospray ionization mass spectrometry (ESI-MS) and subsequent deconvolution of multiply charged protein ions, as shown in Figure 1 and Figure 2, respectively. For all variants, the observed masses matched the calculated values to within one Da (Table 2 and Table 3).
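Matching an observed deconvoluted mass to an N-terminal form within a 1-Da window can be sketched as follows. This is a hedged illustration, not the software workflow used in the study: the function name `assign_form` is hypothetical, and the masses in the example are made-up placeholders rather than values from Table 2 or Table 3.

```python
def assign_form(observed_mass, theoretical, tolerance=1.0):
    """Return the form whose theoretical mass is closest to the observed mass,
    or None if no form lies within the tolerance (in Da)."""
    name, mass = min(theoretical.items(), key=lambda kv: abs(kv[1] - observed_mass))
    return name if abs(mass - observed_mass) < tolerance else None


if __name__ == "__main__":
    # Placeholder masses (Da) built from an arbitrary base of 7000.00;
    # they only illustrate the ~131-Da and ~42-Da spacing between forms.
    theoretical = {
        "Z": 7000.00,
        "Ac-Z": 7042.04,
        "Met-Z": 7131.19,
        "Ac-Met-Z": 7173.23,
    }
    for peak in (7000.3, 7042.5, 7020.0):
        print(peak, assign_form(peak, theoretical))
```

Peaks that fall outside the window for every form, such as adducts or other modifications, are returned as `None` and need separate interpretation.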
The nature of the second amino acid had a significant impact on the removal of the Met1 residue and on Nα-acetylation. The Met1 residue of the Z-domain was cleaved only when the second amino acid was glycine, alanine, proline, serine, threonine, cysteine, or valine (Figure 3(a)). Of these seven variants, all but the Val-2 variant showed complete cleavage of the Met1 residue. For all the other amino acid variants, no cleavage of the Met1 residue was observed, as the Z-domain with the Met1 residue (the Met-Z form) was mainly detected. These results confirm the sequence specificity of the E. coli MAP, which was previously determined by N-terminal sequencing of expressed methionyl-tRNA synthetase mutants differing only by the second amino acid residue and by computer analysis of the known E. coli protein N-terminal sequences [26].
A substantial amount of the Nα-acetylated Z-domain without Met1 (the Ac-Z form) was detected along with the unacetylated Z-domain without Met1 (the Z form) when the second amino acid was serine, threonine or valine, each of which is one of the amino acid residues that permit Met1 removal in E. coli. When the second amino acid was glycine, alanine, or proline, only minor Nα-acetylation was observed, as the Z form was mainly detected along with a trace amount of the Ac-Z form despite complete Met1 removal. This suggests that the RimJ-catalyzed Nα-acetylation occurs only subsequent to the Met1 cleavage. Notably, only minor acetylation was detected for the Ala-2 variant although the ribosomal protein S5, the natural substrate of RimJ, is acetylated at its N-terminal alanine residue. This is probably due to the further influence of the amino acid residues next to the N-terminal residue (see below).
Figure 1. Deconvoluted mass spectra of the Z-domain variants differing by the amino acid residue in position 2. The Met-Z and Z forms are the Z-domain variants with and without the initiating Met residue, respectively. The Ac-Met-Z and Ac-Z forms are the Nα-acetylated Z-domain variants with and without the initiating Met residue, respectively.
Figure 2. Deconvoluted mass spectra of the Z-domain variants differing by the amino acid residue in position 3. The Met-Z and Z forms are the Z-domain variants with and without the initiating Met residue, respectively. The Ac-Met-Z and Ac-Z forms are the Nα-acetylated Z-domain variants with and without the initiating Met residue, respectively.
Figure 3. Distribution of different N-terminal end forms for the Z-domain variants containing 20 different amino acids: (a) In the second position; (b) In the third position. The Met-Z and Z forms are the Z-domain variants with and without the initiating Met residue, respectively. The Ac-Met-Z and Ac-Z forms are the Nα-acetylated Z-domain variants with and without the initiating Met residue, respectively. The percent ratio of the peak intensity of each different N-terminal end form to the peak intensity sum of all four forms is shown in the bar graphs for each Z-domain variant.
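The percent ratios plotted in Figure 3 are defined as the peak intensity of each form divided by the intensity sum of all four forms. A minimal Python sketch of that normalization is given below; the function name `percent_distribution` is hypothetical and the intensities in the example are arbitrary placeholders, not measured data.

```python
FORMS = ("Z", "Ac-Z", "Met-Z", "Ac-Met-Z")


def percent_distribution(intensities):
    """Percent of each N-terminal form relative to the summed intensity of all four.

    Forms missing from the input are treated as undetected (intensity 0).
    """
    values = {form: float(intensities.get(form, 0.0)) for form in FORMS}
    total = sum(values.values())
    if total <= 0:
        raise ValueError("no signal detected for any N-terminal form")
    return {form: 100.0 * v / total for form, v in values.items()}


if __name__ == "__main__":
    # Arbitrary placeholder peak intensities for one variant.
    example = {"Z": 3.0e5, "Ac-Z": 6.0e5, "Met-Z": 1.0e5}
    for form, pct in percent_distribution(example).items():
        print(f"{form:9s} {pct:5.1f} %")
```

Because the ratios rest on deconvoluted peak intensities, they are best read as relative estimates of the distribution within one sample rather than absolute quantities.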
For the Val-2 variant, in which the Met1 residue was incompletely removed, no acetylation was observed when Met1 remained but Nα-acetylation was detected when Met1 was removed. When the second amino acid was cysteine, Met1 was completely removed, but the resulting N-terminal cysteine residue was not acetylated. Instead, the Z-domain forms with +72 and +28 mass shifts were detected, indicating the presence of thiazolidine adducts probably formed during protein expression within the cells by a non-enzymatic reaction of the N-terminal cysteine residue with pyruvate and subsequent decarboxylation, respectively [27]. It seems unlikely that the lack of Nα-acetylation in the Cys-2 mutant is due to this chemical modification since no increase in Nα-acetylation was observed even when the protein was expressed under conditions that minimize pyruvate formation [27]. However, it should be noted that Nα-acetylation of the cysteine residue was previously observed for recombinant human TIMPs and interferons in E. coli [12] [18]. Only trace to minor amounts of Nα-acetylation were observed for all other variants with the intact Met1 residue.
These results suggest that RimJ specifically acetylates the N-terminal serine, threonine or valine residue, subsequent to the Met1 cleavage by the E. coli MAP. In E. coli, Nα-acetylation of the serine residue was observed for a few endogenous proteins including the ribosomal protein L12, EF-Tu and the chaperone SecB as well as recombinant human Tα fusion proteins [4] [5] [7] [19]. The ribosomal proteins S18 and S5, the natural substrates of RimI and RimJ, respectively, are acetylated at the N-terminal alanine residue [6]. The recombinant eglin c was acetylated at the N-terminal threonine residue [20]. The preference of the Rim proteins for small N-terminal amino acid residues in Nα-acetylation in E. coli is similar to that of NatA, one of three NATs in eukaryotes [28]. Similar N-terminal sequence specificity for Nα-acetylation was also observed in archaea [29].
Table 2. Observed masses and their relative intensities of the different N-terminal end forms for each Z-domain variant differing by the second amino acid residue.
Note: (a) DNA sequencing of the expression plasmid revealed a single nucleotide substitution which resulted in an additional amino acid substitution (Leu54Pro); (b) DNA sequencing of the expression plasmid revealed a single nucleotide substitution which resulted in an additional amino acid substitution (Ile19Thr); (c) Z-domain without the Met1 residue; (d) Nα-acetylated Z-domain without the Met1 residue; (e) Z-domain with the Met1 residue; (f) Nα-acetylated Z-domain with the Met1 residue; (g) 2-Methylthiazolidine derivative of the Z-domain; (h) 2-Methyl-2-thiazolidinecarboxylate derivative of the Z-domain.
Met1 removal was little affected by the third amino acid residue in the Z-domain. As shown in Figure 3(b), Met1 was completely removed for all but the Pro-3 variant. When the third amino acid was proline, the Met-Z form was mainly detected. This was probably because the structural distortion caused by the proline residue in the third position hindered the E. coli MAP from recognizing the Thr-2 residue for the Met1 cleavage.
The extent of the RimJ-catalyzed Nα-acetylation varied significantly with the nature of the amino acid residue in position 3 (Figure 3(b)). For the Gly-3 variant, the Z form was mainly detected along with a trace amount of the Ac-Z form. When the third amino acid was proline, a trace amount of the Ac-Z form was also detected along with the major Met-Z form. When the third amino acid was alanine, serine or threonine, a substantial amount of the Ac-Z form was observed although the Z form was the major product. Almost equal amounts of the Z and Ac-Z forms were detected for the Z-domain variants containing cysteine, glutamine, histidine, or tyrosine in position 3. When the third residue was leucine, isoleucine, asparagine, phenylalanine or tryptophan, the Ac-Z form was the major product along with a minor Z form. For the Met-3 and Val-3 variants, the Ac-Z form was almost exclusively detected. These results suggest that an uncharged, preferably hydrophobic, residue, including bulky aromatic amino acids, in the penultimate position stimulates the RimJ-mediated Nα-acetylation. This pattern is consistent with our previous observation that the Z-domain variant containing p-benzoyl-L-phenylalanine at position 3 showed complete Nα-acetylation. An aromatic amino acid was found next to the Nα-acetylated residue for a few endogenous proteins in eukaryotes as well as for the recombinant human interferon γ expressed in E. coli [18].
Table 3. Observed masses and their relative intensities of the different N-terminal end forms for each Z-domain variant differing by the third amino acid residue.
Note: (a) Z-domain without the Met1 residue; (b) Nα-acetylated Z-domain without the Met1 residue; (c) Z-domain with the Met1 residue; (d) Nα-acetylated Z-domain with the Met1 residue; (e) The Na+ adduct of the Z-domain without the Met1 residue.
A dramatic difference in Nα-acetylation was observed when a charged residue was in the third position. Whereas very minor Nα-acetylation was evident for the Lys-3 or Arg-3 variants, substantial or almost complete Nα-acetylation was observed for the Glu-3 or Asp-3 variants, respectively. These results suggest that RimJ disfavors a positively charged residue but favors a negatively charged one in the penultimate position. Indeed, many recombinant proteins that were Nα-acetylated in E. coli have an acidic residue in position 3: aspartate in interferon A, SLDs and Tα, and glutamate in eglin c [19]-[21]. This preference for a negatively charged penultimate residue in Nα-acetylation was also observed for a number of eukaryotic proteins including those with the intact Met1 residue [28].
4. Conclusion
The Met1 residue of the Z-domain was removed only when the second amino acid was glycine, alanine, proline, serine, threonine, cysteine, or valine, consistent with the reported sequence specificity of the E. coli MAP. The RimJ-catalyzed N-terminal acetylation occurred only subsequent to the Met1 cleavage and mainly at N-terminal serine, threonine, or valine residues. The N-terminal acetylation of the Z-domain was significantly decreased by glycine, proline, arginine or lysine in the penultimate position, but was enhanced by hydrophobic or negatively charged residues in the same position. Practically, this study offers a basis to predict or control Met1 cleavage and Nα-acetylation of recombinant proteins in E. coli, especially when these N-terminal end modifications significantly affect protein stability or activity.
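As a rough illustration of how these observations might be used for prediction, the qualitative trends reported for the Z-domain can be encoded as a simple lookup. This sketch rests on strong assumptions: the position 2 data were collected with Ser in position 3 and the position 3 data with Thr in position 2, so combining them is an extrapolation; RimJ was overexpressed; and other proteins may behave differently. The function name `predict_n_terminus` and the three-level labels are hypothetical.

```python
# Residue 2 identities that allowed Met1 removal in the Z-domain (Val only partially).
MAP_PERMISSIVE = set("GAPSTCV")
# N-terminal residues (after Met1 removal) that were substantially acetylated.
RIMJ_N_TERMINAL = set("STV")
# Residue 3 identities grouped by the extent of acetylation observed with Thr-2.
POS3_LOW = set("GPKR")
POS3_HIGH = set("LINFWMVDE")  # all remaining residues gave intermediate levels


def predict_n_terminus(pos2, pos3):
    """Qualitative guess of Met1 removal and N-alpha-acetylation for a Z-domain-like
    N-terminus Met-pos2-pos3, based only on the trends reported in this study."""
    # Pro-3 largely blocked Met1 removal even with Thr-2.
    met1_cleaved = pos2 in MAP_PERMISSIVE and pos3 != "P"
    if not met1_cleaved or pos2 not in RIMJ_N_TERMINAL:
        level = "low"
    elif pos3 in POS3_LOW:
        level = "low"
    elif pos3 in POS3_HIGH:
        level = "high"
    else:
        level = "intermediate"
    return {"met1_cleaved": met1_cleaved, "acetylation": level}


if __name__ == "__main__":
    for pair in (("T", "V"), ("T", "K"), ("A", "S"), ("L", "S")):
        print(pair, predict_n_terminus(*pair))
```

Such a table can help choose N-terminal residues when acetylation is wanted or unwanted, but any prediction should be checked by mass spectrometry for the protein of interest.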
Acknowledgements
This study was supported by the TCU-RCAF, SGA and Andrews Institute.