In silico structural analysis of parasporin 2 protein sequences of non-toxic Bacillus thuringiensis

The unusual and remarkable property of parasporin 2 from non-insecticidal Bacillus thuringiensis is that it specifically recognizes and selectively targets human leukemic cell lines. The 37-kDa inactive nascent protein is proteolytically cleaved to the 30-kDa active form, losing both the N-terminal and the C-terminal segments. Accumulated cytological and biochemical observations on parasporin-2 imply that the protein is a pore-forming toxin. To test this hypothesis, in silico analysis was performed using homology modeling. The resulting model of parasporin 2 is unusually elongated and mainly comprises long β-strands aligned with its long axis. It is similar to aerolysin-type β-pore-forming toxins, which strongly reinforces the pore-forming hypothesis. The molecule can be divided into three domains. Domain 1, comprising a small β-sheet sandwiched by short α-helices, is probably the target-binding module. The two other domains are both β-sandwiches and are thought to be involved in oligomerization and pore formation. Domain 2 has a putative channel-forming β-hairpin characteristic of aerolysin-type toxins. The surface of the protein has an extensive track of exposed side chains of serine and threonine residues. The track might orient the molecule on the cell membrane when domain 1 binds to the target, until oligomerization and pore formation are initiated. The β-hairpin has such a tight structure that it seems unlikely to reform as postulated in a recent model of pore formation developed for aerolysin-type toxins. Sequence analysis of parasporin 2 (Accession no: BAD35170) indicated two different domains, namely an aerolysin toxin domain and a Clostridium toxin domain, based on searches of different databases (CDD and Pfam). The sequence showed close similarity to the available PDB template (PDB id: 2ZTB) of parasporin, which has cytocidal activity against MOLT-4, HL60 and Jurkat cell lines.
Based on the PSI-BLAST analysis, 3D structures of the domains were predicted using the SWISS-MODEL server. The accuracy of the predicted 3D structures of the different domains of the parasporin protein was further validated by Ramachandran plot and PROCHECK (G-value). The structure is dominated by β-strands (67%, S1-12), most of which are remarkably extensive, running all or most of the longer axis of the molecule. This study helped to elucidate the 3D structure of parasporin 2 (Acc. No. BAD35170), which might make it possible to probe its specific mechanism of action further. Although similarity is observed in the domain architecture, the regions of the domains vary even within the same group of parasporin 2. Docking of this model structure and the experimental structure with specific receptors of cancer cells will help to explore the mechanism of parasporin 2 action and also provide information about its evolutionary relationship with toxic Cry proteins.


BACKGROUND
Since the incidence of new cancer patients is increasing annually due to altered food habits and lifestyles, efforts are being made worldwide to identify new molecular markers and therapeutic agents for diagnosis and treatment. Existing chemotherapeutics affect not only tumor cells but also normal cells. Hence, the search for compounds that specifically target cancer cells could overcome this problem [1].
At present, four genealogically different parasporins, parasporin-1 to parasporin-4, have been identified that act specifically against cancer cells [2]. This area of research is still under exploration, since very little literature is available on parasporin structure and mechanism of action. Parasporin-2 is known to interact with a GPI-anchored protein, and the cell death induced by parasporin-2 is non-apoptotic, although the apoptotic process occurs when the cell damage proceeds slowly.
Parasporin-2 increases the plasma membrane permeability of target cells: it binds to a detergent-resistant membrane domain, the so-called "lipid raft" of the plasma membrane, and then forms an SDS-resistant oligomer embedded in the membrane. The toxin binds GPI-proteins in the lipid raft and then appears to form an oligomer that can permeabilize the plasma membrane. Oligomers (> 200 kDa) of PS2Aa1 form in the plasma membrane, leading to pore formation and cell lysis. Oligomerization occurs in the presence of membrane proteins, lipid bilayer and cholesterol [3].
Only two experimentally determined structures (PDB: 2ZTB, 2D42) are available in the PDB to date. Hence, when X-ray diffraction or NMR structures are not available, alternative strategies are applied to develop theoretical models of protein structure, here for parasporin 2 (Accession no: BAD35170), aiming to bridge the structure-knowledge gap. Higher-resolution models, derived from relationships with better than about 30% sequence identity or refined from lower-resolution starting models, are very helpful in assigning detailed aspects of molecular function [4].
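The 30% identity guideline depends on how identity is counted. As a minimal sketch, assuming identity is defined as the number of matching residues divided by the number of aligned (non-gap) columns of a pairwise alignment, it could be computed as follows. The function name, the gap symbol and the toy sequences are illustrative assumptions and are not taken from the study; other tools may use a different denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned, non-gap columns of a pairwise alignment.

    Assumption: columns where either sequence has a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0  # columns where both sequences have a residue
    matches = 0   # columns where the residues are identical
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def above_identity_guideline(identity: float, threshold: float = 30.0) -> bool:
    """True when identity exceeds the (approximate) 30% guideline."""
    return identity > threshold


if __name__ == "__main__":
    # Toy alignment, purely illustrative (not parasporin sequences).
    identity = percent_identity("MKT-AYIAK", "MKTQAYLAK")
    print(f"identity = {identity:.1f}%, usable = {above_identity_guideline(identity)}")
```

With this definition, a target-template pair at 88% identity falls well above the guideline, while pairs below 26% fall under it.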

Domain identification
The complete sequence of the parasporin protein (Bacillus thuringiensis) available from NCBI to date revealed aerolysin toxin and clostridium toxin domains (Table 1). Domains present one of the most useful levels at which to understand protein function, and domain family-based analysis has had a profound impact on the study of individual proteins [5].

Template Identification by Fold-Recognition Servers
Two approaches were employed to identify potential templates: submitting a multiple sequence alignment (MSA) of all the parasporin sequences, and submitting each parasporin sequence individually. To identify a template structure for modeling the parasporin protein sequences, we used the comparative modeling approach (match of secondary structure elements, compatibility of residue-residue contacts, etc.). In the first approach, the MSA for the entire sequence from N- to C-terminus (Figure 1) was submitted to the FUGUE server; the template 2ZTB was identified [10] with a very high confidence level (Z-score for the top hit = 31.32; certain). The GeneSilico metaserver also identified the 2ZTB template with a reliable confidence level (3D-Jury score for the top hit = 133; reliable). Thus, both servers identified 2ZTB as the top hit.
In the second approach, the complete sequence of each parasporin was used separately as a query to search for homologs in the PDB database using BLAST and PSI-BLAST. The fold-recognition servers GeneSilico metaserver, FUGUE and SAM-T02 identified 2ZTB as a possible template only for one parasporin sequence (Acc. No. BAD35170) with a high level of confidence (Table 2). Although the scores reported by the individual threading methods were hardly significant, the consensus server Pcons5 [11] assigned a significant score (1.35) to the 2ZTB structure as a potential modeling template, as is evident in its sequence alignment (Figure 2).
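The idea of agreement between servers can be illustrated with a simple tally of top hits. The sketch below is a plain majority vote and is not the Pcons5 scoring scheme, which is not described here; the function name and the dictionary layout are assumptions made for illustration. The server names and the 2ZTB hit are those reported in the text.

```python
from collections import Counter
from typing import Dict, Tuple


def consensus_template(top_hits: Dict[str, str]) -> Tuple[str, float]:
    """Return the most frequently reported template and its share of votes.

    top_hits maps a server name to the PDB id of its top-ranked template.
    Assumption: each server contributes exactly one equally weighted vote.
    """
    if not top_hits:
        raise ValueError("at least one server result is required")
    counts = Counter(top_hits.values())
    template, votes = counts.most_common(1)[0]
    return template, votes / len(top_hits)


if __name__ == "__main__":
    hits = {"FUGUE": "2ZTB", "GeneSilico": "2ZTB", "SAM-T02": "2ZTB"}
    template, support = consensus_template(hits)
    print(f"consensus template: {template} (support {support:.0%})")
```

A real consensus method also weighs the structural similarity of the proposed models, so this tally should be read only as a first sanity check.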

Modeling 3-D Structure and Stereochemical Evaluation of the Predicted Models
The 3-D structure of parasporin at 2.38 Å resolution (PDB id: 2ZTB, A chain) is the main template for modeling only one parasporin sequence (Acc. No. BAD35170), with 88% identity, since all other parasporin sequences had less than 26% identity with their corresponding template (Figure 3). The final averaged and optimized model passed all the tests implemented in the stereochemistry-evaluating WHATCHECK suite [12][13][14][15] and in the VERIFY3D program, which uses contact potentials to assess whether the modeled amino acid residues occur in the environment typical for globular proteins with a hydrophobic core and solvent-exposed surface (Eisenberg et al., 1997). Moreover, reasonable energies are rarely observed for misfolded structures. Thus, the scores reported for our model by WHATCHECK (Z-score-4.1) and VERIFY3D (average score 0.3, no regions scored lower than 0) suggest that both its three-dimensional fold and the conformation of individual residues are reasonable. For the selected model, the value of the objective function, reported as the current energy, is in the same range as that obtained when the template is aligned with its own sequence.
On average, 99.1% of the residues are found in the allowed region of the Ramachandran map; PROCHECK considers a model to be very good if it has more than 90% of the residues in the most favored region. The inter-atomic distances are within the acceptable range. The Verify3D score is greater than zero for the entire model (residues 1 to 274). The models were also evaluated using the Colorado3D server, which allows the amino acid window size to be changed when calculating the overall score. Two window sizes, 5 and 21, were used to calculate the average Verify3D and ProsaII score per residue for the model [12,13]. The scores calculated using these two window sizes were found to be very similar.
The template and target models were rendered with the residues color-coded based on ProsaII and Verify3D scores. With ProsaII score-based coloring, most of the residues are green and yellow (i.e., average score) in both the target and template proteins. The Z-score of a model is a measure of compatibility between its sequence and structure, and the model Z-score should be comparable to that of the template. With Verify3D score-based coloring, even the template proteins have residues in red (i.e., bad score), although the number of such residues is higher in the targets.
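The effect of the window size on per-residue scores can be sketched with a centered moving average. This is a minimal illustration, assuming that the window is truncated at the chain termini; how Colorado3D, Verify3D or ProsaII treat the ends is not stated here, and the function names and the example scores are hypothetical.

```python
from typing import List, Sequence


def windowed_average(scores: Sequence[float], window: int) -> List[float]:
    """Centered moving average of per-residue scores.

    Assumption: near the termini the window is truncated to the residues
    that exist, so every residue still receives a value.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    averaged = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        segment = scores[lo:hi]
        averaged.append(sum(segment) / len(segment))
    return averaged


def residues_below(averaged: Sequence[float], cutoff: float = 0.0) -> List[int]:
    """1-based residue numbers whose averaged score is below the cutoff."""
    return [i + 1 for i, value in enumerate(averaged) if value < cutoff]


if __name__ == "__main__":
    raw = [0.4, 0.2, -0.3, 0.1, 0.5, 0.6, 0.3]  # made-up scores
    for size in (5, 21):
        smoothed = windowed_average(raw, size)
        print(size, [round(v, 2) for v in smoothed], residues_below(smoothed))
```

Comparing the output for window sizes 5 and 21 shows whether a low-scoring stretch is a local feature or persists after broader smoothing.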

CONCLUSIONS
The initial step in the cytocidal action of PS2Aa1 is the specific binding of this cytotoxin to a putative receptor located in the lipid rafts, followed by its oligomerization and pore formation in the plasma membrane. The secondary structures observed in the model (Accession No: BAD35170) are organized in virtually the same way as in the experimentally determined parasporin-2 protein (PDB id: 2ZTB). Even though 2ZTB and 2D42 both belong to the parasporin type 2 protein sequences, BAD35170 bears greater similarity to 2ZTB in its domain organization.

Databases
The amino acid sequences of the experimentally characterized parasporin sequences (Table 4) were retrieved from the protein sequence database at NCBI (http://www.ncbi.nlm.nih.gov). The 3-D structures of proteins were obtained from the Protein Data Bank [16]. The fold classification of proteins is from the SCOP database [17].

Servers
Protein sequence databases were searched using the PSI-BLAST [18] server at NCBI. PHYRE (successor of 3D-PSSM) [19], SAM-T02 [20] and the GeneSilico Metaserver [21] were used for fold recognition. Multiple sequence alignments were obtained using the CLUSTALW server [22]. Verify3D [13] and Colorado3D [23] were used to evaluate the models. All the servers were used with default values for the various parameters, except where mentioned otherwise.

Software and Hardware
SwissPDBviewer [24] was used for visualization and/or rendering. The stereochemical quality of the generated model was assessed using PROCHECK [10]. Default values were used for all the parameters, unless specified otherwise.

Template-Target Sequence Alignment
The parasporin sequences were submitted for structure prediction using the comparative modeling technique. The preliminary models were obtained using unrefined pairwise alignments reported by PSI-BLAST [25]. Energy minimization was carried out using GROMOS96 [26] until all inconsistencies in geometry were rectified and all the short contacts were relieved. The stereochemical and energetic properties of the modeling intermediates and of the final model were evaluated using WHATCHECK [27] and VERIFY3D [28]. Semi-automated and manual manipulations of protein structures and sequence-structure alignments were conducted using SWISS-PDB VIEWER [24]. All the servers provide an alignment of the submitted parasporin sequence (target) with the sequences of the potential hits (templates).

Validation of Predicted 3-D Structures
The stereochemical properties of the predicted 3-D structures were assessed by PROCHECK and the residue environments by Verify3D and Colorado3D. Regions identified by these servers as poorly modeled were improved by manual adjustment of the alignments and re-modeling.
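A Ramachandran analysis rests on the backbone dihedral angles phi and psi of each residue. The sketch below shows how these angles can be derived from atomic coordinates using the standard four-point dihedral formula; it is an illustration under stated assumptions and not the PROCHECK implementation. The helper names and the test coordinates are hypothetical, and the classification of angles into favored or allowed regions is deliberately left out because those region definitions are not given here.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("central bond has zero length")
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = _sub(b0, (k0 * b1[0], k0 * b1[1], k0 * b1[2]))
    w = _sub(b2, (k2 * b1[0], k2 * b1[1], k2 * b1[2]))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec, n: Vec, ca: Vec, c: Vec, n_next: Vec) -> Tuple[float, float]:
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Applying phi_psi to every internal residue of a model yields the points that a Ramachandran plot displays; terminal residues lack one of the two angles.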

Table 2.
Summary of the best template sequence profile generated at the end of the 3rd iteration of PSI-BLAST analysis using the parasporin protein sequence of B. thuringiensis as query. Threshold PSI-BLAST E-value = 0.001.

Table 3.
Structural superposition report of the model generated for BAD35170 with its corresponding template, 2ZTB.

Table 4.
List of parasporin sequences retrieved from NCBI database.