Successful Expression and Purification of DPPD Using a Codon-Optimized Synthetic Gene

DPPD (Rv0061) is a difficult-to-express protein of Mycobacterium tuberculosis that elicits strong and specific delayed-type hypersensitivity reactions in humans infected with M. tuberculosis. DPPD is therefore a candidate molecule for improving the specificity of the tuberculin skin test, which is widely used as an aid in the diagnosis of tuberculosis. However, a limitation of our initial studies was that the DPPD used to perform the skin tests was engineered as a fusion molecule with another Mycobacterium protein. This approach was used because DPPD could not be expressed, either as a single molecule or as a fusion protein, in a variety of commercially available expression systems. Here, we report the production and purification of rDPPD using a synthetic gene engineered to carry E. coli codon bias. The gene was cloned into the pET14b expression vector, which was then used to transform Rosetta 2(DE3)pLysS or BL21(DE3)pLysS host cells. The recombinant protein was over-expressed after induction with IPTG and was readily purified at yields of 5-10 mg/l of bacterial broth culture. The identity of the purified protein as DPPD was confirmed by mass spectrometry sequencing analysis. Moreover, purified rDPPD stimulated peripheral blood mononuclear cells of PPD-positive blood donors to produce high levels of IFN-γ, confirming that this molecule is biologically active. Because the DPPD gene is restricted to the tuberculosis-complex organisms of the genus Mycobacterium, this highly purified molecule should be useful for identifying individuals sensitized with tubercle bacilli.


INTRODUCTION
Over-expression of recombinant proteins in Escherichia coli host cells followed by purification on affinity resins has become a routine and useful procedure for producing a variety of antigenic molecules. However, not infrequently and for reasons that are not entirely understood, bacterial host cells fail to properly express recombinant proteins encoded by genes cloned into a number of different expression vectors. Several explanations have been proposed for this puzzle of difficult-to-express proteins. These include protein toxicity to E. coli, plasmid or protein instability, inefficient transcription or translation, inefficient post-translational modification, and the presence in the cloned gene of codons that are rarely or never used by E. coli host cells [1,2].
A typical example of a difficult-to-express protein is DPPD, a Mycobacterium tuberculosis molecule that we described several years ago [3,4]. The name DPPD is the single-letter code of the first four N-terminal amino acids of the mature protein (aspartic acid, proline, proline, aspartic acid). DPPD is a small protein of 84 amino acids and a major component of the complex protein mixture in purified protein derivative (PPD), the antigenic preparation used in the skin test performed in humans and cattle for the diagnosis of tuberculosis [5][6][7][8][9]. DPPD is therefore an interesting candidate for use as a purified antigen in the diagnosis of tuberculosis. However, expression of recombinant DPPD in E. coli host cells, either as a single molecule or as a fusion protein, consistently failed with an exhaustive list of commercially available expression systems for this host. Moreover, rDPPD could not be expressed in other cell systems such as the yeast Pichia pastoris and Streptomyces lividans. In addition, attempts to produce a synthetic DPPD polypeptide through several protein manufacturing services were also unsuccessful. Partial success in expressing rDPPD as a single molecule was achieved using Mycobacterium smegmatis host cells transformed with the plasmid pSMT3 [10]. Unfortunately, the yield of purified protein obtained with this system was extremely low (~500 μg/liter of bacterial culture medium), which precluded production of the protein for biological or clinical studies. However, because in this system both the host cells and the cloned gene belong to the same bacterial genus (Mycobacterium), these studies suggested that preferential codon usage shared by the two species might have facilitated the expression of rDPPD in M. smegmatis.
In the present work we tested this possibility. A synthetic DPPD gene with a sequence optimized for E. coli preferential codon usage was obtained and cloned into pET14b. Transformation of E. coli host cells with this plasmid followed by induction with IPTG resulted in consistent and successful over-expression of rDPPD, which could be purified by affinity chromatography (Ni-NTA resin) with yields of 5-10 mg of purified recombinant protein per liter of bacterial culture.

DPPD Codon Optimization and Gene Subcloning
Codon optimization for maximum expression in E. coli was performed by iteratively sampling from a codon usage table to find a low-free-energy solution, which typically results in reduced mRNA secondary structure. The optimized DPPD sequence was synthesized by Blue Heron (https://www.blueheronbio.com) using a proprietary technology that the supplier states yields 100% sequence accuracy. Nde I and Bam HI restriction sites were incorporated at the 5' and 3' ends of the DPPD gene sequence, respectively. The synthesized DNA was cloned into pUCminusMCS (Blue Heron) and sequenced, which confirmed the designed optimized sequence. The DPPD synthetic gene was then subcloned into the pET14b expression vector (Novagen EMD Chemicals, Gibbstown, NJ) using the two restriction enzymes. Successful subclones were confirmed by PCR using Taq Plus DNA polymerase (Invitrogen) and by DNA sequencing.
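The sampling strategy described above (drawing codons from a usage table and keeping the candidate with the least predicted mRNA secondary structure) can be sketched as follows. This is a minimal illustration under stated assumptions, not Blue Heron's proprietary tool: the codon frequencies are hypothetical placeholders, the protein fragment is hypothetical apart from its DPPD N-terminus, and a simple count of complementary 6-mers stands in for the free-energy calculation a real RNA folding program would perform. Candidates containing an internal Nde I (CATATG) or Bam HI (GGATCC) site are rejected, because those sites are reserved for the cloning ends.

```python
import random

# HYPOTHETICAL, truncated codon-usage table (relative frequency of each codon
# per amino acid). Illustrative values only; a real run would use a curated
# E. coli table covering all 20 amino acids.
CODON_USAGE = {
    "M": {"ATG": 1.0},
    "D": {"GAT": 0.63, "GAC": 0.37},
    "P": {"CCG": 0.52, "CCA": 0.19, "CCT": 0.16, "CCC": 0.13},
    "A": {"GCG": 0.36, "GCC": 0.27, "GCA": 0.21, "GCT": 0.16},
    "S": {"AGC": 0.28, "TCT": 0.15, "TCC": 0.15, "AGT": 0.15, "TCG": 0.15, "TCA": 0.12},
    "*": {"TAA": 0.64, "TGA": 0.29, "TAG": 0.07},
}
CODON_TO_AA = {codon: aa for aa, table in CODON_USAGE.items() for codon in table}

# Nde I and Bam HI sites are reserved for the cloning ends, so they must not
# occur inside the coding sequence.
FORBIDDEN_SITES = ("CATATG", "GGATCC")


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hairpin_proxy(seq: str, k: int = 6) -> int:
    """Crude stand-in for mRNA folding free energy: the number of distinct
    k-mers whose reverse complement also occurs in the sequence (potential
    stem-forming pairs). Lower is better."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sum(1 for km in kmers if reverse_complement(km) in kmers)


def translate(cds: str) -> str:
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def sample_cds(protein: str, rng: random.Random) -> str:
    """Draw one codon per residue, weighted by codon usage."""
    codons = []
    for aa in protein:
        table = CODON_USAGE[aa]
        codons.append(rng.choices(list(table), weights=list(table.values()))[0])
    return "".join(codons)


def optimize(protein: str, n_iter: int = 500, seed: int = 0):
    """Keep the sampled sequence with the lowest proxy score that contains
    no forbidden restriction site."""
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(n_iter):
        cds = sample_cds(protein, rng)
        if any(site in cds for site in FORBIDDEN_SITES):
            continue  # reject candidates with an internal cloning site
        score = hairpin_proxy(cds)
        if score < best_score:
            best, best_score = cds, score
    return best, best_score


# Hypothetical fragment: start Met, the DPPD N-terminus, made-up residues, stop.
best, score = optimize("MDPPDSAPA*")
print(best, score, translate(best))
```

With Met (ATG) immediately followed by Asp and then Pro (CCN), choosing GAT for that Asp would create ATGGATCC, which contains a Bam HI site, so the filter forces GAC at codon 2. Replacing `hairpin_proxy` with a genuine folding-energy calculation would bring the sketch closer to the method described above.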

rDPPD Protein Expression and Purification
Plasmid pET14b harboring the optimized DPPD gene was used to transform Escherichia coli BL21(DE3)pLysS or Rosetta 2(DE3)pLysS (Novagen) expression hosts. IPTG-induced E. coli cultures (100 ml LB) were processed under native or denaturing conditions, followed by affinity chromatography on a nickel-nitrilotriacetic acid (Ni-NTA) agarose matrix (Qiagen, Valencia, CA). Purified rDPPD was analyzed by SDS-PAGE, and protein concentration was determined by the BCA assay (Pierce Thermo Scientific, Waltham, MA).
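As a rough guide to what the 5-10 mg per liter yields reported in this work mean at the 100 ml culture scale used here, the arithmetic can be sketched as below. The molar figure uses the deduced mass of mature DPPD (9,022.9 Da) and ignores the residues added by the His tag, so it is an approximation; the function name and defaults are illustrative assumptions.

```python
# Deduced mass of mature DPPD without the His tag (g/mol); the tagged
# recombinant protein is somewhat heavier, so molar values are approximate.
MATURE_DPPD_MW = 9022.9


def yield_summary(mg_per_liter, culture_ml=100.0, mw_g_per_mol=MATURE_DPPD_MW):
    """Convert a volumetric yield into mass per culture and molar yield."""
    mg_per_culture = mg_per_liter * culture_ml / 1000.0
    # (mg/L) / (g/mol) gives mmol/L; multiply by 1000 for umol/L
    umol_per_liter = mg_per_liter / mw_g_per_mol * 1000.0
    return mg_per_culture, umol_per_liter


for mg_l in (5.0, 10.0):
    per_culture, umol = yield_summary(mg_l)
    print(f"{mg_l:.0f} mg/L -> {per_culture:.1f} mg per 100 ml culture, ~{umol:.2f} umol/L")
```

At the stated range this corresponds to roughly 0.5-1 mg of purified protein per 100 ml culture.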

Western Blot
Purified recombinant DPPD protein was electrophoresed on SDS-PAGE (4%-20% gradient, BioRad, Hercules, CA) under reducing conditions and transferred to a PVDF membrane (Immobilon-P, Millipore, Billerica, MA). The blot was blocked for one hour at room temperature with Tris-HCl buffer pH 7.4 containing 0.1% Tween 20 and 1% BSA, and subsequently incubated with HRP-labeled anti-His tag monoclonal antibody (Invitrogen, Carlsbad, CA). Bound conjugate was detected using the LumiGlo chemiluminescent system (Cell Signaling Technology, Danvers, MA) and visualized by exposing the membrane to autoradiography film (Kodak BioMax, Rochester, NY).

Human Interferon-Gamma Response
Blood samples (10 ml) were collected with informed consent from six healthy, confirmed PPD-positive donors. Human peripheral blood mononuclear cells (PBMC) were isolated by density gradient centrifugation (Histopaque, Sigma-Aldrich, St. Louis, MO). PBMC (2 × 10⁵ cells) in complete RPMI supplemented with 10% human AB serum were incubated (37 °C, 5% CO₂) with different concentrations of rDPPD (0.4 to 10 μg/ml) or PPD (2.5 to 10 μg/ml); PHA (1/500) or medium alone was included as positive and negative controls, respectively. Supernatants were collected after 72 h of culture and assayed for IFN-γ content by sandwich ELISA using commercial capture and detection mAb pairs (BD Biosciences, Rockville, MD).
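In a sandwich ELISA, IFN-γ concentrations in the supernatants are read off a standard curve of known IFN-γ amounts. The sketch below shows one simple way to do this, linear interpolation in log-log space between the two bracketing standards. It is an illustrative assumption rather than the authors' stated analysis (four-parameter logistic fits are also widely used), and the standard concentrations and optical densities are hypothetical placeholders, not data from this study.

```python
import math
from bisect import bisect_left

# HYPOTHETICAL IFN-γ standard curve: (concentration in pg/ml, blank-corrected OD).
# Illustrative numbers only, not data from this study.
STANDARDS = [(15.6, 0.05), (31.3, 0.09), (62.5, 0.17), (125.0, 0.32),
             (250.0, 0.60), (500.0, 1.05), (1000.0, 1.80)]


def interpolate_conc(od, standards=STANDARDS):
    """Read a sample concentration off the standard curve by linear
    interpolation in log-log space between the two bracketing standards.
    Assumes OD increases monotonically with concentration."""
    pts = sorted(standards, key=lambda p: p[1])
    ods = [y for _, y in pts]
    if not ods[0] <= od <= ods[-1]:
        raise ValueError("OD outside the standard range; dilute and re-assay")
    i = bisect_left(ods, od)
    if ods[i] == od:
        return pts[i][0]  # sample OD coincides with a standard
    (c0, y0), (c1, y1) = pts[i - 1], pts[i]
    t = math.log(od / y0) / math.log(y1 / y0)  # fractional position in log(OD)
    return c0 * (c1 / c0) ** t                 # same fraction in log(concentration)


print(round(interpolate_conc(0.45)))
```

Samples whose OD falls outside the standard range are rejected rather than extrapolated, which mirrors the usual practice of diluting and re-assaying such supernatants.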

Codon Optimization
The M. tuberculosis DPPD full-length gene (Rv0061) encodes a typical secreted protein, which includes a signal peptide sequence of 39 amino acids followed by the signal peptidase recognition sequence Ala-Ser-Ala (Figure 1). Because our original studies [3,4,10] showed that M. tuberculosis produces only the mature form of the protein, we designed the codon-optimized gene to encode only this form of the molecule. To allow translation initiation, an ATG codon was added to the 5' end of the gene sequence.

Protein Expression and Purification
The DNA fragment containing the synthetic optimized DPPD sequence was excised from pUCminusMCS and subcloned into the pET-14b vector. This expression vector contains a histidine tag (His-tag) sequence upstream of the Nde I cloning site, thus generating a recombinant protein with six His residues at the N-terminus to facilitate its purification by affinity binding to a Ni-NTA agarose matrix. The expression vector was used to transform Rosetta 2(DE3)pLysS or BL21(DE3)pLysS host cells, which were then induced with IPTG. Recombinant protein expression was assessed by SDS-PAGE with Coomassie blue staining and is illustrated in Figure 3A (lane 1, non-induced culture; lane 2, IPTG-induced culture). Over-expressed protein bands of MW ~12 kDa and ~25 kDa can be seen in the lane corresponding to the IPTG-induced culture. No such strongly stained bands are seen in the lane corresponding to the non-induced culture (lane 1).
The recombinant protein was purified by affinity chromatography, either as soluble protein from BL21(DE3)pLysS or from inclusion bodies in Rosetta 2(DE3)pLysS. Yields ranged from 5 to 10 mg of purified protein per liter of induced culture. Figure 3 (lane 3B) shows the purified recombinant protein. Two major groups of bands are seen at positions matching the over-expressed protein in the induced culture (Figure 3A, lane 2). The lower bands (MW ~8-15 kDa) are within the range of the predicted MW of the native mature form of DPPD (9.02 kDa). This migration pattern is commonly observed with proline-rich proteins [11][12][13][14] and is consistent with our former observations for both native and recombinant molecules [4,10]. In addition, three other bands of higher MW (~20-30 kDa) are also clearly seen. At first glance, these bands might be assumed to be contaminants. However, because mature DPPD contains four cysteine residues, the extra bands could instead be polymers (or aggregates) of the single molecule. Although the gel was run under reducing conditions, reduction may have been incomplete, resulting in several bands of the same molecule being displayed in the gel.
To test this hypothesis, we performed Western blot analysis under reducing and non-reducing conditions. Because rDPPD was engineered to contain a tag of six histidines, the molecule was identified using an anti-His tag IgG mAb. Figure 4 confirms that this monoclonal antibody recognized the over-expressed protein. Under reducing conditions (Figure 4, lane R), both groups of molecules (~8-15 kDa and ~20-30 kDa) were strongly reactive with the anti-His tag mAb, substantiating that these bands are indeed recombinant DPPD. In addition, the Western blot performed under non-reducing conditions (Figure 4, lane NR) clearly shows that DPPD is present as a variety of polymeric molecules. Interestingly, no bands with MW smaller than 15 kDa were seen under these conditions. These results suggest that the molecule polymerizes extensively through its cysteine residues and that no monomeric forms are present in the purified preparation. Finally, no bands were seen in blots probed with a control IgG antibody (not shown).
To confirm that the purified recombinant protein was indeed DPPD, it was analyzed by mass spectrometry sequencing. The recombinant protein was run on a 4%-20% gradient gel, and individual bands seen in the Coomassie blue-stained gel were excised and subjected to in-gel tryptic digestion followed by LC-MS/MS analysis for identification. rDPPD was identified in these bands with high confidence (XCORR: 4.7594) through a single tryptic peptide present within the amino acid sequence of DPPD. This peptide had 100% identity with residues 19-35 of the deduced amino acid sequence of the DPPD gene (Figure 5).
Taken together, these results show that rDPPD can be over-expressed in Rosetta 2 or BL21 host cells transformed with pET-14b containing the DPPD gene with a sequence optimized for E. coli.
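The mass-spectrometric identification described above depends on matching an observed peptide to peptides predicted from a tryptic digest of the candidate protein. A minimal sketch of that in silico step is shown below: it applies the commonly used trypsin rule (cleave C-terminal to K or R, but not when the next residue is P), allows no missed cleavages, and reports where a peptide maps in 1-based residue numbering. The example sequence is hypothetical, not DPPD, and the spectrum matching and XCORR scoring performed by the search software are not modeled.

```python
import re


def tryptic_digest(protein, min_len=1):
    """In silico trypsin digest: split after every K or R that is not
    followed by P. No missed cleavages are allowed."""
    # Zero-width split point: preceded by K/R, not followed by P (Python 3.7+).
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in peptides if len(p) >= min_len]  # drops empty trailing piece


def locate(peptide, protein):
    """Return the 1-based, inclusive residue range of a peptide, or None."""
    start = protein.find(peptide)
    if start < 0:
        return None
    return start + 1, start + len(peptide)


# HYPOTHETICAL example sequence for illustration only.
EXAMPLE = "MGSKLLPRAVEKPQWRDDK"
for pep in tryptic_digest(EXAMPLE):
    print(pep, locate(pep, EXAMPLE))
```

Note how the K in "AVEKPQWR" is not cleaved because it precedes a proline; for a proline-rich protein such as DPPD this rule noticeably shapes which peptides can be observed.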

Biological Validation of Purified rDPPD
One important requirement for validating a microbial molecule as a diagnostic tool or vaccine candidate is that the immune system of a host sensitized to or infected with the microbe from which the protein derives recognizes that molecule. Our former studies using rDPPD expressed as a fusion protein confirmed that guinea pigs and humans sensitized to or infected with M. tuberculosis develop a specific T cell response to DPPD. It was therefore important to verify that the rDPPD molecule expressed and purified in the present work would also be recognized by T cells from M. tuberculosis-sensitized individuals. To test this, PBMC were obtained from six healthy volunteers known to respond to purified protein derivative (PPD) of tuberculin, the antigen used in the human skin test for the diagnosis of tuberculosis. Recognition of rDPPD was measured as antigen-induced IFN-γ production by the donors' PBMC. As shown in Figure 6, rDPPD stimulated the PBMC of all six tuberculin-sensitive donors to produce high levels of IFN-γ. No response was observed with PBMC from three PPD non-responder controls (not shown). Given the limited number of individuals analyzed, these results do not yet constitute a clinical validation of rDPPD as a tool for the diagnosis of tuberculosis. However, because rDPPD is readily recognized by PBMC of tuberculosis-sensitized individuals, the results indicate that the newly expressed and purified recombinant molecule is biologically active.

DISCUSSION
In early studies we showed that a recombinant rRa12-DPPD fusion molecule elicited delayed-type hypersensitivity in humans comparable to that elicited by the standard PPD antigen [3]. Unfortunately, this fusion protein contains a 14 kDa M. tuberculosis polypeptide which, in contrast to DPPD, is broadly distributed across the genus Mycobacterium. Although we succeeded in generating a purified recombinant molecule, the fusion introduced an undesired property: the fusion protein is no longer specific for M. tuberculosis. However, this "homemade" fusion protein expression system [15] was the only procedure we found that successfully produced rDPPD. A variety of commercially available systems that use E. coli as host cells consistently failed to express rDPPD. These included various vectors such as pET, pQE30 (Qiagen), pThioHis (Invitrogen), and pGEX-2T (GE Healthcare, Piscataway, New Jersey). E. coli host cells tested included BL21(DE3), BL21(DE3)pLysS, and Rosetta 2(DE3)pLysS (Novagen). In addition, rDPPD could not be expressed in other host cells such as the yeast Pichia pastoris or Streptomyces lividans (unpublished observations).
However, as mentioned before, low levels of rDPPD could be expressed and purified using the Mycobacterium expression vector pSMT3 and Mycobacterium smegmatis as host cells [10]. This observation led us to hypothesize that shared preferential codon usage could have facilitated the synthesis of an M. tuberculosis protein in a Mycobacterium host cell. Here, we tested this possibility using a standard expression vector (pET14b) and standard E. coli host cells (Rosetta 2(DE3)pLysS or BL21(DE3)pLysS), a well-known system designed for high-level protein production. Advances in robust, automated synthesis of large DNA molecules allowed us to produce a DPPD gene with a sequence designed to match the preferential codon usage of E. coli instead of that of Mycobacterium. The E. coli-optimized DPPD gene was synthesized by Blue Heron Biotechnology (Seattle, WA). Blue Heron uses a proprietary codon utilization databank, gene assembly instruments, and other technologies to accurately and rapidly engineer and assemble oligonucleotides into full-length constructs. Using BL21 or Rosetta 2 E. coli transformed with pET14b containing the optimized DPPD gene, we were able to express and purify rDPPD at yields not previously achieved. Approximately 5-10 mg of purified recombinant protein was consistently obtained per liter of bacterial culture.
The deduced MW of the mature DPPD molecule (9,022.9 Da) does not agree with the molecular mass of ~12 kDa of the major band of the purified molecule estimated by SDS-PAGE. However, the same migration pattern was observed with the native molecule when we first discovered DPPD [4]. This abnormal gel migration has been described for several proteins that have an unusually high proline content and a low isoelectric point caused by a high content of aspartic and glutamic acid residues [12][13][14]. In general, classical SDS-PAGE often overestimates the molecular weight of a protein whose proline content exceeds 10% [11]. The general consensus is that this altered migration is caused by amino acid composition rather than by post-translational modification [14]. DPPD meets all of these predictive criteria. Of the 84 amino acids that compose the mature form of the molecule, 16 (19%) are prolines, 7 (8.3%) are aspartic acid and 3 (3.6%) are glutamic acid. Moreover, the theoretical pI of the mature DPPD protein is 4.2, as calculated with the ExPASy ProtParam tool of the Swiss Institute of Bioinformatics (http://au.expasy.org/tools/protparam.html). These molecular characteristics of mature rDPPD are therefore consistent with the migration pattern observed by SDS-PAGE. Importantly, the 12 kDa band seen on SDS-PAGE was confirmed to be DPPD by mass spectrometry.
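The composition figures quoted above can be checked with a few lines of arithmetic. The residue counts, the ~12 kDa apparent mass, the 9.02 kDa deduced mass and the >10% proline rule of thumb [11] all come from the text; the script only recomputes the percentages and the ratio of apparent to deduced mass. The ratio is a simple illustration and ignores the extra residues contributed by the His tag in the recombinant construct.

```python
# Values taken from the text above.
MATURE_LENGTH = 84                      # residues in mature DPPD
RESIDUE_COUNTS = {"Pro": 16, "Asp": 7, "Glu": 3}
PROLINE_THRESHOLD_PCT = 10.0            # SDS-PAGE tends to overestimate MW above this [11]
DEDUCED_MW_KDA = 9.0229
APPARENT_MW_KDA = 12.0


def percent(count, total=MATURE_LENGTH):
    return 100.0 * count / total


pct = {res: round(percent(n), 1) for res, n in RESIDUE_COUNTS.items()}
acidic_pct = round(percent(RESIDUE_COUNTS["Asp"] + RESIDUE_COUNTS["Glu"]), 1)
proline_rich = percent(RESIDUE_COUNTS["Pro"]) > PROLINE_THRESHOLD_PCT
overestimate = APPARENT_MW_KDA / DEDUCED_MW_KDA

print(pct)                    # {'Pro': 19.0, 'Asp': 8.3, 'Glu': 3.6}
print(acidic_pct)             # 11.9
print(proline_rich)           # True
print(f"{overestimate:.2f}")  # 1.33
```

The proline content is nearly twice the 10% threshold, and the apparent mass is about one third higher than the deduced mass.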
Also noteworthy is the confirmation that purified recombinant DPPD was readily recognized by PBMC obtained from tuberculosis-sensitized healthy individuals. These observations point to this single, defined molecule as a potential reagent for the tuberculin skin test. Alternatively, rDPPD could be tested as a component of the recently developed whole-blood IFN-γ release assays for the diagnosis of tuberculosis. As mentioned before, this molecule is restricted to members of the M. tuberculosis complex, which makes it an attractive specific antigen for test development.
Finally, it is important to emphasize that the procedure described in this manuscript for obtaining workable yields of rDPPD per liter of bacterial culture uses conventional, standard procedures for the production of recombinant proteins. Therefore, if rDPPD proves useful for the diagnosis of tuberculosis in future experiments, there should be no major hurdles to scaling up its production under GLP or GMP conditions for clinical use.

ABBREVIATIONS
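DPPD, a difficult-to-express protein of Mycobacterium tuberculosis whose name derives from the single-letter code of its first four N-terminal amino acids (Asp-Pro-Pro-Asp).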

CONFLICT OF INTEREST
None of the authors has any financial conflict of interest.

Figure 2 depicts the optimized DPPD gene sequence compared with that of M. tuberculosis.

Figure 1. Deduced amino acid sequence of the protein encoded by Mycobacterium tuberculosis Rv0061. The deduced sequence of the 112-residue protein shows that the full-length gene encodes a typical secretory protein, which includes a signal peptide sequence (red letters), the signal peptidase recognition sequence ASA (green, bold, italic letters) and the mature protein sequence (DPPD, black bold letters).

Figure 2. DPPD DNA sequence with E. coli-optimized codon usage. Codon optimization for protein expression was done using the Blue Heron Expression Optimization tool, which minimizes mRNA secondary structure to reduce translational impediments. The optimized DPPD sequence is displayed in bold (upper sequence). Note that Nde I and Bam HI restriction sites (underlined) were incorporated at the 5' and 3' ends of the optimized sequence, respectively. For comparison, the Rv0061 gene sequence (encoding the mature form of the protein only) is also shown (lower sequence).

Figure 3. Expression of recombinant DPPD in E. coli. Recombinant DPPD was expressed in E. coli with an amino-terminal six-His tag and purified by affinity chromatography using Ni-NTA agarose matrix. Coomassie blue-stained SDS/4-20% polyacrylamide gradient gel of 5 μg of purified recombinant DPPD (lane 1). MWM, molecular weight markers; numbers on the left are the MWs of the markers in kDa.

Figure 4. Western blot analyses of recombinant DPPD expressed by E. coli. Purified rDPPD was subjected to SDS-PAGE (4%-20% gradient gel) under reducing (R) and non-reducing (NR) conditions. Protein was transferred to a nitrocellulose membrane, and the recombinant molecule was detected using a mouse anti-His-tag (C-terminus) monoclonal antibody. Reactivity was detected with peroxidase-labeled goat anti-mouse immunoglobulin and developed using ECL chemiluminescent reagent (Western blot detection system, Amersham Biologicals, Uppsala, Sweden). Numbers on the left side indicate the molecular weights of the markers.

Figure 5. DPPD peptide sequence identified by mass spectrometry in purified recombinant protein. The peptide sequence and its position within the parent protein (DPPD) are highlighted in red/bold/underline. The trypsin cleavage sites (C-terminal to the R and K residues) that generated the peptide are illustrated above the molecule.

Figure 6. Recognition of purified recombinant DPPD by human PBMC. IFN-γ production by PBMC from PPD-positive healthy donors following stimulation for 72 h with medium, rDPPD (10 μg/ml) or PPD (10 μg/ml) was measured in the culture supernatants by sandwich ELISA. Error bars represent the SD of the mean of triplicate cultures.