Development of Split-Protein Systems: From Binary to Ternary System

Tens of thousands of protein-protein interactions (PPIs) have been found in human cells, and many of these macromolecular partnerships can determine cell growth and death. There is therefore a need for methods that catalogue these macromolecules by detecting their interactions, modifications, and cellular locations. Such methods help scientists compare a diseased cellular state with its normal state and identify potential therapies to intervene. One technology, called split-protein reassembly or protein-fragment complementation, has been developed over the last two decades. It relies on appropriate fragmentation of a reporter protein; refolding of the reporter is detected through its restored function, which confirms the interaction of interest. The approach has been established in cell-free systems, E. coli, yeast, mammalian cells, plants, and live animals. Herein, I present the development of fluorescence- and bioluminescence-based split-protein biosensors in both binary and ternary systems. In addition, split-protein systems have been combined with the chemical inducer of dimerization (CID) strategy, which has been applied to identify enzyme inhibitors and to regulate the activity of protein kinases and phosphatases. Through the efforts of many laboratories around the world, a variety of split-protein systems have been developed for studying PPIs in vitro and in vivo, monitoring biological processes, and controlling the activity of enzymes of interest.


Introduction
In recent years, conditional split-protein reassembly has emerged as a method for the investigation of a variety of macromolecular interactions [1] [2] [3] [4].
To develop successful split-protein systems that conditionally assemble and report on the presence or absence of an interaction of interest, certain criteria have to be satisfied. Firstly, the parent protein or enzyme being fragmented should have an easily measurable output, such as fluorescence or luminescence, that is not significantly suppressed by other components in a cell or cell lysate. Secondly, the split-protein fragments by themselves should not have any activity prior to reassembly. Thirdly, except for some self-assembling systems (such as GFP1-10/GFP11) [5], in systems for detecting protein-protein interactions (PPIs) the interaction between the two reporter fragments should be negligible, that is, at least 100-fold weaker than the interaction being interrogated. For example, if the attached proteins being interrogated have a binding constant of 10 nM, then the split-protein fragments should have a binding constant of >1 µM, and their association should ideally be reversible. The first conditional split-protein reassembly system was established by Johnsson and Varshavsky in 1994 [6], where ubiquitin was split into two fragments and only regained its folded native structure when fused to two interacting protein domains. Since then many different proteins, mostly enzymes, have been engineered into split-protein reassembly systems, including the green fluorescent protein and its derivatives [5] [7] [8], dihydrofolate reductase [9], β-lactamase [10], firefly and other luciferases [11] [12] [13], tobacco etch virus protease [14], thymidine kinase [15], chorismate mutase [16], Cas9 [17], horseradish peroxidase [18], RNA polymerase [19], and aminoacyl tRNA synthetase [20]. Among these proteins, GFP and its variants, β-lactamase, and firefly luciferase have been utilized as the reporter proteins for developing a wide range of sensors [21]-[33].
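The third criterion can be checked with simple arithmetic. The following is a minimal sketch, assuming that "negligible" means the fragment-fragment dissociation constant is at least 100-fold higher than that of the interacting partners, as in the 10 nM versus >1 µM example above. The function names and the `min_fold` parameter are hypothetical and introduced only for illustration.

```python
def fold_weaker(kd_fragments_nM: float, kd_partners_nM: float) -> float:
    """Return how many times weaker the fragment-fragment affinity is
    than the affinity of the interacting partners (ratio of Kd values)."""
    if kd_fragments_nM <= 0 or kd_partners_nM <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_fragments_nM / kd_partners_nM


def fragment_affinity_ok(kd_fragments_nM: float,
                         kd_partners_nM: float,
                         min_fold: float = 100.0) -> bool:
    """Assumed rule of thumb: fragments should bind each other at least
    `min_fold` times more weakly than the partners under study."""
    return fold_weaker(kd_fragments_nM, kd_partners_nM) >= min_fold


# Partners with Kd = 10 nM; two hypothetical fragment pairs.
weak_pair = fragment_affinity_ok(1500.0, 10.0)   # fragments at 1.5 uM
tight_pair = fragment_affinity_ok(200.0, 10.0)   # fragments at 0.2 uM
```

In this sketch the 1.5 µM fragment pair passes the check and the 0.2 µM pair does not, since the latter is only 20-fold weaker than the interaction being measured and would be expected to contribute background reassembly.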
Herein, I first describe the development of split-protein systems that use fluorescence and bioluminescence as reporter signals, and then focus on their extension to ternary systems for detecting small molecules and macromolecules and for controlling the activity of enzymes of interest, such as protein kinases and phosphatases.

Method
Manuscripts were searched in Scopus with the terms "Split protein reassembly" and "Protein fragment complementary", which returned about 71 and 13,288 published papers, respectively. The results for "Protein fragment complementary" were further searched with "fluorescent protein" and "luciferase" to obtain 1132 and 373 manuscripts, respectively. The titles and abstracts of these results were read, and 165 and 80 manuscripts were finally selected for these two fields, respectively. The whole process is briefly described in Figure 1.

Split-Protein Systems Based on Fluorescent Protein
Green fluorescent protein (GFP) derived from Aequorea victoria consists of a β-barrel structure with a central α-helix. After folding to its native structure, three residues (S65-Y66-G67) in the central helix of GFP undergo autocatalysis under oxidative conditions to yield the fluorophore. GFP and its variants have been attached to different proteins, and protein-protein interactions have been measured by fluorescence resonance energy transfer [34]; however, this remains a demanding technique for routine applications [35] [36]. The first split-GFP was developed by dissecting GFP between residues 157 and 158 and fusing the two fragments to the heterodimeric leucine zippers Fos and Jun, respectively [7]. This was shown to be a conditional reassembly system, as native GFP was refolded in vitro and in E. coli only when Fos and Jun were present. This work was extended by Kerppola's group [8], which resulted in a split-YFP system, based on a yellow fluorescent GFP derivative, for the direct visualization of protein interactions in mammalian cells. Similarly, Furman extended this technique to many other Aequorea victoria GFP variants, including GFPuv, Cerulean, EGFP, and Venus [33].
Following the discovery by Ghosh et al., many labs have greatly expanded the designs of split fluorescent proteins and the applications of this reporter system, and the technology was given a unique name: bimolecular fluorescence complementation (BiFC). In most of these designs, reassembly of the fluorescent protein was shown to be irreversible under native conditions; that is, once the native structure of the fluorescent protein was conditionally reconstituted, it resisted dissociation. This kinetic trapping may be particularly helpful for detecting low-abundance or low-affinity complexes in vivo and in vitro [37] [38], but it is a major problem because it can lead to misinterpretation of PPIs, so careful control experiments with known mutations that prevent binding need to be carried out routinely [34] [39] [40]. Moreover, many laboratories are interested in disrupting PPIs with peptides and small molecules [10] [11] [13], which is difficult to accomplish and interpret in systems that are not under thermodynamic control. To address this problem, two labs have made progress with two specific systems. Tchekanda et al. developed a reversible BiFC system based on the engineered D. radiodurans infrared fluorescent protein IFP1.4, which uses biliverdin as its chromophore [41]. They demonstrated the reversibility of this probe in vitro and applied the system in yeast and mammalian cells. To et al. engineered a reversible green split-protein reporter named uPPO (UnaG-based protein-protein interaction reporter), in which the refolded UnaG protein incorporates bilirubin as the chromophore in mammalian cells [42]. Although both systems require the incorporation of a chromophore, they provide a methodology for designing reversible BiFC systems, which have great potential for studying the dynamics of PPIs with low background signals.
In addition to the reversibility issue, there are other limitations in the application of BiFC systems. First, chromophore formation during refolding requires a long maturation time, which causes an unavoidable delay between the PPI and the signal readout. Second, nonspecific self-reassembly of the two fragments can increase the background signal. Moreover, poor solubility of the split fragments and their fusion proteins can also hamper the application of this technology. To solve these problems, Cabantous et al. first designed a GFP1-10/GFP11 system [5] based on their engineered superfolder GFP (sfGFP), which carries 11 mutations compared with avGFP and has high folding stability in solution [43]. This system has been used to detect the solubility of proteins of interest through self-reassembly of the two fragments [44]. It was then widely used as a sensor for detecting the activity of proteases [45] [46] [47], protein kinases and phosphatases [48], and peptides or cargoes that penetrate cells [49] [50] [51]. Furthermore, Cabantous et al. modified this system to create a tripartite split-GFP system consisting of three fragments (GFP1-9, GFP10, and GFP11) [52]. Compared with the previous GFP system, the fragments chosen for fusion to the protein binding partners (GFP10 and GFP11) perturb the proteins of interest much less and avoid aggregation problems. More importantly, this system minimizes the background signal from self-assembly. This tripartite split-GFP system has not only been used to develop sensors for small GTPases and sortase [53] [54] but has also been found to promote protein crystallization [55] [56].

Split-Protein Systems Based on Luciferase
Following the development of split-GFP and its variants, firefly and other luciferases have also been developed as reporters in split-protein systems [11] [12] [13] (Table 1). Luciferases produce a luminescent signal by catalyzing the oxidation of small-molecule substrates, which requires no excitation light and gives a very low background signal. Moreover, dynamic studies of protein-protein interactions with split-luciferase systems in vitro and in vivo showed that split luciferase is a reversible and sensitive reporter at endogenous protein expression levels [11].
Owing to the advantages of bioluminescence-based split-protein systems, some labs have developed sensors for visualizing the dynamic process of PPIs during cell signaling. Paulmurugan et al. first inserted the estrogen receptor (ER) ligand binding domain into Renilla luciferase or firefly luciferase, which was used for imaging ligand-induced intramolecular folding in living mice [57]. Following this study, several biosensors have been developed to detect the effect of mutations in human ERα [58], to detect different ligand actions on ER simultaneously [59] [60], and to detect the dimerization of two types of ER (ERα and ERβ) [61]. In other fields, split-luciferase systems have also been used for investigating the Myc protein [ [69], and ErbB2/HER2/neu pathway [70].

Table 1.
Split protein reassembly with dihydrofolate reductase [9]
Split protein reassembly with β-lactamase [10] [23]
Split protein reassembly with tobacco etch virus protease [14]
Split protein reassembly with thymidine kinase [15]
Split protein reassembly with chorismate mutase [16]
Split protein reassembly with Cas9 [17]
Split protein reassembly with horseradish peroxidase [18]
Split protein reassembly with RNA polymerase [19]
Split protein reassembly with aminoacyl tRNA synthetase [20]
Split protein reassembly with kinase [89]
Split protein reassembly with phosphatase [90]
In addition to in vivo assays, a large number of in vitro cell-free expression systems have been developed for measuring a variety of macromolecular interactions with split-luciferase reporters, which offer advantages in both ease of detection and sensitivity [24]-[32]. In cell-free methods, the mRNA encoding each split-luciferase fragment fused to a designed protein binding domain of interest is obtained by in vitro transcription and, after purification, the fused split-luciferase fragments are translated from the mRNA in cell lysates in vitro. The translated split-protein fragments can reassemble into the native, active form, either through a PPI in binary systems or through addition of the target of the two fused protein binding domains in ternary systems.

Ternary Split-Protein Sensors: Beyond Binary Interactions
Although binary split-protein systems have been widely used for the study of PPIs in vitro and in vivo, ternary split-protein systems with high sensitivity for their targets have been systematically developed over the past decade (Figure 2). Whereas the binary system uses two interacting protein domains fused to the split-reporter fragments, the ternary system uses two protein binding domains with limited mutual interaction fused to the split-GFP or split-luciferase fragments. Upon addition of the target or analyte of interest, binding of both domains to the analyte brings the two reporter fragments into proximity, which enables refolding of the native protein and restores the activity used as the readout for the analyte.
Advances in Bioscience and Biotechnology

Sensors for Nucleic Acids and Its Modifications
Stains et al. designed a novel reporter in which two split-GFP fragments were fused to designed and natural Cys2-His2 zinc finger DNA binding domains [21]. In the presence of target DNA, the two GFP fragments were brought into close proximity and shown to refold and fluoresce only when specific adjacent DNA sites were available for both zinc finger domains. Later on, this system was extended by replacing the reporter protein with other GFP variants for simultaneous detection of multiple targets [33], or with β-lactamase [23] or firefly luciferase [24] for improved sensitivity. The ternary system was shown to be a general approach for detecting DNA modifications by modulation of the protein binding domain [22] [23] [30] [31]. In these systems, the zinc finger domains were replaced by alternate binding domains that target specific nucleic acid modifications. For instance, a methyl binding domain (MBD), which binds methylated cytosines, and Zif268, a natural zinc finger domain, have been used in the split-protein system for the detection of specific sites of DNA methylation, a known epigenetic modification [22]. Furthermore, the authors systematically investigated the differences among MBD family members in their ability to recognize methylated DNA [31].

Figure 2. Split-protein reassembly/protein-fragment complementation method in binary and ternary systems. The reporter protein is divided into two fragments, which can regain the function of the whole protein after reassembly under certain conditions. These two fragments are fused to the domains of interest (A and B). In the binary system, the interaction of these two domains brings the two reporter fragments close enough for the protein to refold (bottom left). In the ternary system, these two domains form a complex by binding a third component (C) in solution, which enables the reassembly of the reporter protein.
The cell-free split-luciferase system has also been adapted for designing turn-on sensors for UV- or oxidation-dependent DNA damage. In this design, oxoguanine glycosylase 1 (OGG1) or the damaged-DNA binding domain 2 (DDB2) was fused to the C-terminal fragment of firefly luciferase (CFluc), while MBD was fused to the N-terminal fragment of firefly luciferase (NFluc) [30]. Upon binding of each protein domain to its corresponding DNA modification site, the split firefly luciferase was shown to reassemble. This sensor provides a simple and sensitive approach for the rapid detection of chemical modifications of DNA exposed to different environmental insults.
Compared with the detection of DNA, few strategies have been developed for the detection of specific RNA sequences, owing to the limited availability of RNA recognition domains. To overcome this difficulty, three strategies based on split-protein reassembly have been developed [24] [27]. The earliest made use of the protein-target interaction directly, where Pumilio (Pum) RNA binding proteins confer specificity for binding a particular ssRNA sequence. To develop a general strategy for sequence-specific assembly on any user-defined ssRNA target, Argonaute (Ago), which binds the 2-nucleotide 3' overhangs of short double-stranded RNA (dsRNA), was used to recognize the target binding site. To enable binding by the Ago domain, dsRNA was prepared by adding complementary guide oligonucleotides that target the ssRNA sample in solution. As a third design, dsDNA hairpins were combined with ssRNA guides that were complementary to the ssRNA target. In this method, sequence-specific zinc finger domains with high affinity (Kd ~ low pM) were fused to the split-luciferase fragments, and a specific sensor could be designed flexibly by changing the complementary ssRNA sequence to match the target ssRNA of interest.

Sensor for Native Proteins
The direct and specific detection of native proteins in complex heterogeneous solutions remains a challenge. To address this, a general methodology based on the split-luciferase system was developed by combining single-chain antibodies or cellular receptor fragments that target specific native proteins of interest [28]. This method was used for the detection of vascular endothelial growth factor (VEGF), gp120, and human epidermal growth factor receptor-2 (HER2). The requirement for this strategy is that the two binding domains of the split-luciferase system bind the target protein simultaneously at different sites, which must be close enough to allow reassembly of the reporter protein. More importantly, as shown in the recognition of HER2, single-chain antibodies translated in the in vitro system could extend this technique to other applications with an entirely antibody-based recognition system.

Sensor for Protein Modification
After proteins are translated on the ribosome, they undergo various post-translational modifications, which play important roles in protein folding, localization, and function. One such modification is the attachment of a polymer, poly(ADP-ribose) or PAR, which is involved in the DNA damage repair process [71]. Furman et al. designed a split-luciferase system for detecting this post-translational modification by fusing the two split-luciferase fragments to PBZ domains derived from Aprataxin and PNK-like factor (APLF) [72], which have been shown to bind PAR with strong affinity [73]. This split-protein reporter has been applied both for monitoring temporal changes of poly(ADP-ribosyl)ation in mammalian cells and for detecting in vitro the activity of poly(ADP-ribose) glycohydrolase, which degrades PAR in cells.

Chemically Induced Dimerization Based Split-Luciferase Systems
Since the discovery of rapamycin-induced heterodimerization of FKBP12 (FK506 binding protein) with FRB (FKBP12 rapamycin binding protein) [74], a methodology now called the chemical inducer of dimerization (CID) strategy has been developed, in which two non-interacting fusion proteins with affinity for a ligand or hybrid ligand are brought together in the presence of that ligand to perform a specific function [75]. CID has proved powerful, and small ligands have been used for controlling cellular localization [76]  inducer of dimerization, Jun-staurosporine was designed based on the fact that the Fos/Jun peptide interaction has high specificity and affinity and staurosporine is a pan-kinase inhibitor. In the presence of the inducer, Jun-staurosporine, the two split-luciferase fragments were found to reassemble and regain enzyme function. Because this assembly is reversible, the ternary complex could be disrupted by the addition of small-molecule kinase inhibitors that bind the protein kinase at the same site as staurosporine.

Figure 3. Coiled-coil enabled CID system for profiling inhibitors of protein kinases [26]. Split firefly luciferase fragments were fused to a protein kinase and to one coiled-coil peptide (Fos). Upon addition of the inducer Jun-staurosporine, the firefly luciferase could reassemble, and this reassembly could be perturbed by new inhibitors binding the ATP-binding site of the protein kinase.
The decrease in luminescence caused by the small-molecule inhibitor was found to be dose-dependent, and the observed signal loss followed a characteristic sigmoidal curve, from which an IC50 of the inhibitor could be measured; the values were found to be similar to those measured by traditional radioactivity-based kinase assays.
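The dose-response analysis described above can be illustrated with a short sketch. It assumes that the luminescence follows a four-parameter logistic (Hill) inhibition curve and estimates the IC50 by log-linear interpolation at the half-maximal signal; in practice a nonlinear least-squares fit of the full curve is more common. The function names, the synthetic concentrations, and the 100 nM "true" IC50 are hypothetical and are not taken from the cited studies.

```python
import math


def hill_signal(conc, ic50, top=1.0, bottom=0.0, slope=1.0):
    """Luminescence predicted by a four-parameter logistic inhibition curve.
    `top` and `bottom` are the plateaus without and with saturating inhibitor."""
    if conc <= 0:
        return top
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** slope)


def estimate_ic50(concs, signals):
    """Estimate IC50 by log-linear interpolation at the half-maximal signal.

    Assumes positive, ascending concentrations, a signal that decreases with
    dose, and first/last points that approximate the top and bottom plateaus.
    """
    if len(concs) != len(signals) or len(concs) < 2:
        raise ValueError("need two or more matched concentration/signal points")
    half = (signals[0] + signals[-1]) / 2.0
    for i in range(len(concs) - 1):
        s_lo, s_hi = signals[i], signals[i + 1]
        if s_lo >= half >= s_hi and s_lo != s_hi:
            # Fraction of the way from the lower to the higher dose (log scale).
            frac = (s_lo - half) / (s_lo - s_hi)
            log_lo, log_hi = math.log10(concs[i]), math.log10(concs[i + 1])
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    raise ValueError("signal does not cross the half-maximal level")


# Synthetic, noise-free titration (nM) generated from an assumed IC50 of 100 nM.
concs = [0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0, 100000.0]
signals = [hill_signal(c, ic50=100.0) for c in concs]
estimated = estimate_ic50(concs, signals)
```

With this noise-free synthetic titration the interpolated value recovers the assumed IC50 closely; with real luminescence data the estimate depends on how well the lowest and highest doses sample the two plateaus.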

Chemically Induced Dimerization Based Split-Kinase and Split-Phosphatase Systems
Post-translational modifications control the temporal and location-specific activity of most proteins. Among these chemical modifications, phosphorylation and dephosphorylation play important roles in the regulation of a diversity of cellular events [86]. Over 500 human protein kinases and 147 protein phosphatases are key players in both intra- and extracellular processes [87] [88]. Recently, CID technology has successfully achieved the reassembly of split kinases [89] and split tyrosine phosphatases [90] with FKBP and FRB, where the catalytic activity of the kinase or phosphatase could be modulated with rapamycin. In addition, Camacho-Soto et al. confirmed that the rapamycin-gated FKBP/FRB heterodimer could be replaced by two plant-hormone-based CID systems [90], which provides a route to small-molecule activation of specific split phosphatases or split kinases without perturbing cellular signaling in mammalian cell systems.

Outlook and Conclusion
With the continued development of split-protein reassembly, more biological activities and biomolecular processes are likely to become accessible to monitoring in vitro and in vivo. A variety of applications based on split-protein reassembly already exist, and the long-term potential of this methodology is exciting and is attracting increasing attention. Researchers can now detect binary and ternary partnerships and rapidly interrogate small molecules that perturb them. The expanding repertoire of split proteins supplies a means for temporal control over the outcome of thousands of biomolecular processes. This technology has already enabled explicit control over the outcome of specific signal transduction pathways, such as those governed by protein kinases and phosphatases. Finally, it can be envisioned that the general split-protein ternary approach may provide new therapeutic and imaging methods through the conditional reassembly of a split toxic protein in specific cells.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.