Comparative Analysis of Structure and Sequences of Oryza sativa Superoxide Dismutase

Superoxide dismutase (SOD) is one of the major classes of antioxidant enzymes that protect cellular and subcellular components against harmful reactive oxygen species (ROS). SODs play a pivotal role in scavenging highly reactive free oxygen radicals and in protecting cells from their toxic effects. Three types of SOD occur in Oryza sativa, classified by their metal content: Cu-Zn SOD, Mn SOD and Fe SOD. In the present study, the structure and phylogenetic relationships among Oryza sativa SODs were critically assessed. A sequence similarity search using local BLAST shows that Mn SODs and Fe SODs are more similar to each other than either is to the Cu-Zn SODs. The multiple alignment reveals seven totally conserved amino acids. Secondary structure prediction shows that Mn SODs and Fe SODs have similar helices, sheets, turns and coils, which differ from those of Cu-Zn SODs. The comparative analysis also shows greater resemblance between Fe SODs and Mn SODs in primary, secondary and tertiary structure. Taken together, the structure and sequence analyses reveal that Mn SOD and Fe SOD are closely related, whereas Cu-Zn SOD evolved independently.


Introduction
Protein sequence comparison is the most powerful tool for characterizing protein sequences because of the enormous amount of information preserved in protein domains throughout evolution. For many protein sequences, evolutionary history can be traced back 1-2 billion years. Sequence comparison is most effective for homologous proteins, which typically share common active sites or binding domains. Comparative study of protein structures enables the study of functional relationships between proteins and is of immense importance for homology searches and for threading methods in structure prediction. Multiple structure alignment of proteins is needed to group proteins into families, which enables subsequent analysis of evolutionary issues. SODs (superoxide dismutases) constitute the first line of defence against reactive oxygen species (ROS). SODs belong to a large and ubiquitous family of metalloenzymes in aerobic organisms [1]. The scavenging of superoxide radicals (O2•−) is achieved through an upstream enzyme, SOD, which catalyses the dismutation of superoxide to hydrogen peroxide (H2O2) and oxygen (2 O2•− + 2 H+ → H2O2 + O2). SODs are present in all aerobic organisms and in all subcellular compartments susceptible to oxidative stress [2]. SODs are classified according to the metal cofactor in the active site of the enzyme. A new type of SOD with nickel in the active centre (Ni SOD) has recently been described in Streptomyces [3].


The majority of research has focused on either medicinal or agricultural applications; however, rice (Oryza sativa) has become an important model system with immediate practical applications because of its economic and nutritional importance worldwide [4,5]. Rice has several advantages as a model plant: it has a relatively small genome (~430 Mb) that has been almost completely sequenced [6]. Abiotic stress is the major environmental constraint on rice production in non-irrigated rice areas [7]. SODs are metalloenzymes of aerobic organisms that play a crucial role in protecting rice against ROS [8].
The present investigation analyzes the sequences of Oryza sativa SODs using computational tools and techniques in order to understand their biological functions and evolutionary relationships. Understanding the sequence, structure and function relationships among SODs may in future allow the design of new proteins combining the desirable characteristics of SODs, with the aim of producing cultivars tolerant to reactive oxygen species (ROS).

Materials and Methods
In this study, bioinformatics tools, software and methods were used to compare the sequences of Oryza sativa SODs. These include sequence retrieval, local BLAST database creation and BLAST search, BioEdit, the ProtParam tool, T-COFFEE, MEGA phylogenetic tree construction, ANTHEPROT-SOPMA, SWISS-MODEL, YASARA, PROCHECK-COMP, Ramachandran plot analysis, Chimera and the PTF database.

Sequence Retrieval
Analyzing protein sequences and structures requires the amino acid sequences of the proteins of interest. Protein sequences can be searched in a variety of protein primary sequence databases, e.g. PIR, MIPS, SWISS-PROT, TrEMBL and NRL-3D (Attwood and Parry-Smith, 2002 [9]). The SWISS-PROT/TrEMBL database was used to retrieve the amino acid sequences of Oryza sativa SODs. SWISS-PROT is an annotated protein sequence database established in 1986 and maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI). The data in SWISS-PROT are derived from translations of DNA sequences in the EMBL nucleotide sequence database, adapted from the Protein Identification Resource (PIR) collection, extracted from the literature or directly submitted by researchers. TrEMBL, a computer-annotated supplement to SWISS-PROT, accompanies SWISS-PROT. The amino acid sequences of Oryza sativa SODs were retrieved from SWISS-PROT/TrEMBL, available at www.expasy.org.
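Retrieved entries are usually saved as FASTA text. The following minimal Python sketch, which is not part of the original workflow, reads such text into a dictionary keyed by accession. It assumes UniProt-style headers (`>sp|ACCESSION|NAME ...` or `>tr|...`). The accessions in the test data are placeholders, not real SOD entries.

```python
def parse_fasta(text):
    """Parse FASTA text into {accession: sequence}.

    Assumes UniProt-style headers ('>sp|ACCESSION|ENTRY_NAME description' or
    '>tr|...'); for any other header the first word after '>' is used as the key.
    """
    records = {}
    key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            if not words:
                raise ValueError("empty FASTA header")
            fields = words[0].split("|")
            # UniProt: db|accession|entry_name -> keep the accession
            key = fields[1] if len(fields) >= 3 else words[0]
            records[key] = []
        elif key is not None:
            records[key].append(line)  # sequence lines may be wrapped
    return {k: "".join(parts) for k, parts in records.items()}
```

The resulting dictionary can then feed any of the later analysis steps.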

Sequence/Primary Structure Analysis
The exact nature of the information encoded in primary structure is still unclear. Detailed folding studies have revealed ever more complexity, making it clear that the sequence-to-structure relationship is a very complicated problem. Using sequence analysis techniques, an attempt was made to identify similarities between a novel query sequence and database sequences whose structures and functions have been elucidated. Such comparisons are straightforward at high levels of sequence identity, where relationships are clear, but below 50% identity it becomes increasingly difficult to establish relationships reliably [8].

Local BLAST
A local BLAST database was created using BioEdit. A local BLAST (Basic Local Alignment Search Tool) search is often the most convenient method for detecting homology between a biological sequence and existing characterized sequences. BLAST looks for homology by searching for locally aligned regions of identity and similarity between a query sequence and the sequences in a local database. The local BLAST database was created from a file containing all the Oryza sativa SOD sequences in FASTA format. In BioEdit, "BLAST" was chosen from the "Accessory Application" menu and "Create local database" was then selected. The remaining steps were automatic, and the database was placed in the database folder of the BioEdit installation directory. This local BLAST database was used to determine the sequence similarity between Oryza sativa SODs: the sequence of each SOD was used as a query in a BLAST search against all Oryza sativa SODs in the local database.
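The following sketch makes the idea of a "locally aligned region" concrete. It implements Smith-Waterman local alignment in plain Python and reports percent identity over the aligned columns. It is an illustration only. BLAST itself uses substitution matrices such as BLOSUM62, affine gap penalties and heuristic seeding rather than this exhaustive dynamic programme, and the scores used here (match +2, mismatch -1, gap -2) are assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Assumed scoring: fixed match/mismatch values and a linear gap penalty
    (not BLOSUM62 or affine gaps as used by real BLAST).
    """
    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + pair_score(i, j),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical positions as a percentage of all alignment columns."""
    if not aln_a:
        return 0.0
    same = sum(x == y for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a)
```

Aligning each SOD query against every other sequence and tabulating `percent_identity` would give a rough, self-made analogue of the local BLAST similarity table. It would not reproduce BLAST's exact numbers.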

ProtParam
ProtParam is a tool that computes various physical and chemical parameters for a given protein stored in SWISS-PROT or TrEMBL, or for a user-entered sequence. The computed parameters include the molecular weight, theoretical isoelectric point (pI), amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY). The protein can be specified as a SWISS-PROT or TrEMBL accession number or ID, or as a raw sequence. If the accession number of a SWISS-PROT or TrEMBL entry is provided, an intermediary page is displayed that allows the user to select the portion of the sequence to be analyzed. The choices include mature chains, peptides and domains from the SWISS-PROT/TrEMBL feature table, as well as the possibility to enter start and end positions in two boxes. By default, the complete sequence is analyzed. The ProtParam tool was used to analyze the physicochemical properties of Oryza sativa SODs. Their amino acid sequences in FASTA format were submitted to ProtParam, available at www.expasy.org, and the resulting physicochemical properties were recorded.
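Several of the simpler ProtParam quantities can be reproduced directly from a sequence. The sketch below is an illustration, not ProtParam's own code. It computes the amino acid composition, the charged-residue counts, the number of sulphur atoms (one per Cys and one per Met) and the aliphatic index, using Ikai's formula: mole percentages of Ala + 2.9 × Val + 3.9 × (Ile + Leu). Molecular weight, pI, extinction coefficient and instability index need additional lookup tables and are left out.

```python
from collections import Counter


def protparam_subset(seq):
    """Compute a few ProtParam-style quantities for a protein sequence.

    Returns the composition (% of residues), counts of negatively (Asp + Glu)
    and positively (Arg + Lys) charged residues, the number of sulphur atoms
    (one in each Cys and Met side chain) and the aliphatic index (Ikai, 1980).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    composition = {aa: 100.0 * c / n for aa, c in sorted(counts.items())}
    aliphatic = (composition.get("A", 0.0)
                 + 2.9 * composition.get("V", 0.0)
                 + 3.9 * (composition.get("I", 0.0) + composition.get("L", 0.0)))
    return {
        "length": n,
        "composition_pct": composition,
        "negative_DE": counts["D"] + counts["E"],
        "positive_RK": counts["R"] + counts["K"],
        "sulphur_atoms": counts["C"] + counts["M"],
        "aliphatic_index": aliphatic,
    }
```

Values computed this way for the SOD sequences could be compared side by side, as in the physicochemical comparison discussed in the results.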

BioEdit
BioEdit is a biological sequence editor that runs on Windows and is intended to provide basic functions for protein and nucleic acid sequence editing, alignment, manipulation and analysis. Hydrophobic amino acids tend to occur in the interior of globular proteins, whereas hydrophilic residues are preferentially found at the surface. Hydrophobicity scales and related scales are frequently used for the prediction of antigenic epitopes. Mean hydrophobicity profiles are generated using the general method of Kyte and Doolittle, who compiled a set of "hydropathy scores" for the 20 amino acids from experimental data in the literature. A window of defined size is moved along the sequence, the hydropathy scores are summed over the window, and the average is taken for each position in the sequence. The mean hydrophobicity value is plotted for the middle residue of the window. Hydrophobic moment profiles plot the hydrophobic moment of segments of defined length along the sequence. For example, if the window size is 21 residues, the value plotted at a residue is the hydrophobic moment of the window extending 10 residues on either side of that residue. The hydrophobic moment is calculated according to the method of Eisenberg [10]:

$$\mu_H = \left\{\left[\sum_{n=1}^{N} H_n \sin(\delta n)\right]^2 + \left[\sum_{n=1}^{N} H_n \cos(\delta n)\right]^2\right\}^{1/2}$$

where μH is the hydrophobic moment, Hn is the hydrophobicity score of the residue at position n, δ = 100°, n is the position within the segment, and the sums run over a segment of N residues, the defined window length. BioEdit was used to generate the hydrophobicity plot of Oryza sativa SODs. The file containing the Oryza sativa SOD sequences was opened in the BioEdit alignment window, and the protein Kyte-Doolittle hydrophobicity plot was chosen from the sequence analysis menu, giving the hydrophobicity plot for Oryza sativa SODs.
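The two profiles described above can be sketched in Python as follows. The Kyte-Doolittle values are the published hydropathy scale. The default window of 9 residues is an assumption, since BioEdit lets the user choose the window. Using the Kyte-Doolittle scale inside `hydrophobic_moment` is also a simplification: Eisenberg's own work used a consensus hydrophobicity scale.

```python
import math

# Kyte-Doolittle hydropathy scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq, window=9):
    """Mean Kyte-Doolittle score of each window, keyed by its 1-based middle residue.

    The default window size of 9 is an assumed value.
    """
    if window % 2 == 0:
        raise ValueError("use an odd window so that it has a middle residue")
    half = window // 2
    scores = [KD[aa] for aa in seq.upper()]
    return {i + half + 1: sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)}


def hydrophobic_moment(segment, delta_deg=100.0, scale=KD):
    """Eisenberg hydrophobic moment of one segment (KD scale assumed here)."""
    s = c = 0.0
    for n, aa in enumerate(segment.upper(), start=1):
        angle = math.radians(delta_deg * n)
        s += scale[aa] * math.sin(angle)
        c += scale[aa] * math.cos(angle)
    return math.hypot(s, c)
```

For an 18-residue segment with identical scores, the moment vanishes, because 18 steps of 100° complete exactly five turns. Amphipathic helices, in which hydrophobic residues recur roughly every 3.6 residues, give large moments.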

T-COFFEE
Tree-based Consistency Objective Function for Alignment Evaluation (T-COFFEE) is a progressive method for multiple sequence alignment. Multiple alignments are essential prerequisites for further analyses of protein families, such as homology modeling or phylogenetic reconstruction. They can also simply be used to illustrate conserved and variable sites within a family. These alignments may further be used to derive profiles or hidden Markov models [11] for scouring databases for distantly related members of the family.
T-COFFEE combines signals from heterogeneous sources into a unique consensus multiple sequence alignment [12]. T-COFFEE has two main features. First, it provides a simple and flexible means of generating multiple alignments from heterogeneous data sources, which are provided to T-COFFEE as a library of pairwise alignments. Second, it uses an optimization method to find the multiple alignment that best fits the pairwise alignments in the input library.
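The library-based consistency idea can be illustrated with a simplified version of T-COFFEE's library extension. This sketch is an illustration under stated assumptions, not T-COFFEE's code. Each pairwise alignment contributes its residue pairs with a weight, assumed here to be something like percent identity. A residue pair x_i~y_j gains extra support min(W(x_i, z_k), W(z_k, y_j)) whenever an intermediate sequence z links the two residues through its residue k. The sequence names and weights in the test data are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def symmetric_library(primary):
    """Store every weighted residue pair in both directions.

    primary maps (seq_x, seq_y) -> {(i, j): weight}: residue i of seq_x was
    aligned with residue j of seq_y in a pairwise alignment of that weight.
    """
    lib = defaultdict(dict)
    for (x, y), pairs in primary.items():
        for (i, j), w in pairs.items():
            lib[(x, y)][(i, j)] = lib[(x, y)].get((i, j), 0.0) + w
            lib[(y, x)][(j, i)] = lib[(y, x)].get((j, i), 0.0) + w
    return lib


def extend_library(primary):
    """Direct weight plus triplet support through every intermediate sequence."""
    lib = symmetric_library(primary)
    seqs = {s for pair in lib for s in pair}
    ext = defaultdict(lambda: defaultdict(float))
    for pair, pairs in lib.items():
        for ij, w in pairs.items():
            ext[pair][ij] += w  # direct evidence
    for x, y in permutations(seqs, 2):
        for z in seqs - {x, y}:
            # Index z->y pairs by the residue k of z for quick lookup.
            by_k = defaultdict(list)
            for (k, j), w2 in lib.get((z, y), {}).items():
                by_k[k].append((j, w2))
            for (i, k), w1 in lib.get((x, z), {}).items():
                for j, w2 in by_k[k]:
                    ext[(x, y)][(i, j)] += min(w1, w2)  # consistency via z
    return ext
```

Residue pairs supported by several sequences are reinforced, while pairs seen in only one pairwise alignment keep their direct weight. An alignment optimized against the extended library therefore favours globally consistent matches.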

MEGA
Molecular Evolutionary Genetics Analysis (MEGA) can create multiple sequence alignments using ClustalW. The main use of this software is to estimate evolutionary distances and to build phylogenetic trees from multiple protein and nucleotide sequences. MEGA was used for the phylogenetic analysis of Oryza sativa SODs. The file containing the Oryza sativa SOD amino acid sequences in FASTA format was opened in MEGA and converted to a MEGA file. A multiple sequence alignment was then performed to create a MEGA alignment file, from which the phylogenetic tree of Oryza sativa SODs was constructed.
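The text does not state which tree-building method was selected in MEGA, which offers several, such as neighbour-joining, maximum likelihood and UPGMA. The sketch below is purely an illustration of distance-based tree building. It computes p-distances with pairwise gap deletion, can apply the Poisson correction, and clusters the sequences with UPGMA, returning a Newick string. It should not be read as the procedure used for Figure 1.

```python
import math
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing sites between two aligned sequences,
    skipping columns with a gap in either sequence (pairwise deletion)."""
    sites = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def poisson_distance(p):
    """Poisson-corrected amino acid distance d = -ln(1 - p), defined for p < 1."""
    return -math.log(1.0 - p)


def upgma(aligned, distance=p_distance):
    """Build a UPGMA tree (Newick string) from {name: aligned_sequence}."""
    d = {frozenset(pair): distance(aligned[pair[0]], aligned[pair[1]])
         for pair in combinations(aligned, 2)}
    # label -> (Newick subtree, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in aligned}
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        nx, sx, hx = clusters.pop(x)
        ny, sy, hy = clusters.pop(y)
        h = d[frozenset((x, y))] / 2.0
        label = f"({x},{y})"
        # Size-weighted average distance from the new cluster to the others.
        for z in clusters:
            d[frozenset((label, z))] = (sx * d[frozenset((x, z))]
                                        + sy * d[frozenset((y, z))]) / (sx + sy)
        clusters[label] = (f"({nx}:{h - hx:.3f},{ny}:{h - hy:.3f})", sx + sy, h)
    (newick, _, _), = clusters.values()
    return newick + ";"
```

Passing `distance=lambda a, b: poisson_distance(p_distance(a, b))` switches the tree to Poisson-corrected distances.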

Antheprot
A graphics programme was developed to calculate the secondary structure content of proteins from their circular dichroism spectra. All information concerning the analysis and results is given on a single screen, and the percentages of secondary structure and statistical parameters are provided. The secondary structure prediction method used in ANTHEPROT is SOPMA (Self-Optimized Prediction Method with Alignment), an improvement of the SOPM method. These methods are based on the homologue method. The improvement lies in the fact that SOPMA takes into account information from an alignment of sequences belonging to the same family. If there are no homologous sequences, the SOPMA prediction is the same as the SOPM prediction. The first step of SOPM is to build sub-databases of protein sequences and their known secondary structures, drawn from "DATABASE.DSSP", by 1) making binary comparisons of all protein sequences and 2) taking into account the prediction of the structural classes of proteins. The second step is to submit each protein of the sub-database to secondary structure prediction using a predictive algorithm based on sequence similarity. The third step is to iteratively determine the predictive parameters that optimize the prediction quality over the whole sub-database. The last step is to apply the final parameters to the query sequence. When a sequence is submitted to SOPMA from within ANTHEPROT, it is automatically sent to the NPS@ web server (http://pbil.ibcp.fr/NPSA) over the Internet. SOPMA can also predict the turn state, but accuracies are given only for three states (helix, sheet and coil). The Oryza sativa SOD sequences were submitted to ANTHEPROT-SOPMA for secondary structure prediction and analysis. The secondary structures of Oryza sativa SODs were displayed in the ANTHEPROT graphic viewer, and the secondary structure content statistics were obtained by selecting the "Details" menu in the graphic viewer.
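The content statistics shown under "Details" amount to counting states in the per-residue prediction. The sketch below recomputes these percentages. It assumes the prediction is exported as a string of SOPMA-style state letters (h helix, e extended strand, t turn, c coil); these letter codes are an assumption about the export format.

```python
from collections import Counter

# Assumed SOPMA-style state letters and their meanings.
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand",
               "t": "Beta turn", "c": "Random coil"}


def secondary_structure_content(prediction):
    """Percentage of residues assigned to each state in a prediction string."""
    states = prediction.lower()
    if not states:
        raise ValueError("empty prediction")
    unknown = set(states) - set(STATE_NAMES)
    if unknown:
        raise ValueError(f"unexpected state letters: {sorted(unknown)}")
    counts = Counter(states)
    return {STATE_NAMES[s]: 100.0 * counts[s] / len(states) for s in STATE_NAMES}
```

Comparing these dictionaries across the Cu-Zn, Mn and Fe SODs corresponds to the helix, sheet, turn and coil comparison reported in the abstract.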

Swiss-Model
Analyzing the functional properties of the protein of interest required knowledge of its three-dimensional structure. When a sequence is submitted to a modeling server, the structure of the protein is returned by email. SWISS-MODEL, available at http://www.expasy.org/swissmod/swiss-model.html, was used to predict the structures of Oryza sativa SODs. SWISS-MODEL is a server for automated comparative modeling of three-dimensional (3D) protein structures. It provides several levels of user interaction through its World Wide Web interface. In the "first approach mode", only the amino acid sequence of a protein is submitted to build a 3D model, and template selection, alignment and model building are performed completely automatically by the server. In the "alignment mode", the modeling process is based on a user-defined target-template alignment. Complex modeling tasks can be handled with the "project mode" using DeepView (Swiss-PdbViewer), an integrated sequence-to-structure workbench. All models are sent back by email with a detailed modeling report.

YASARA
The structures generated by SWISS-MODEL were visualized and analyzed. YASARA was used to visualize the tertiary structures of Oryza sativa SODs. YASARA is a molecular graphics, modeling and simulation programme with an intuitive user interface, photorealistic graphics and support for affordable shutter glasses, autostereoscopic displays and input devices. YASARA was used to visualize the tertiary structures, to superimpose the SODs and to identify the metal ion present in the active site of each Oryza sativa SOD isoform.
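In YASARA, the metal in each active site was identified visually. The sketch below offers a scriptable alternative. It reads ATOM/HETATM records using the fixed-column PDB layout and lists the protein residues with any atom within a distance cutoff of each metal ion. The 2.8 Å cutoff and the set of metal residue codes are assumptions chosen for illustration.

```python
import math
from collections import namedtuple

Atom = namedtuple("Atom", "record name res_name chain res_seq xyz")

# Assumed PDB residue codes for metal ions relevant to SODs.
METALS = {"MN", "FE", "FE2", "CU", "CU1", "ZN"}


def parse_pdb_atoms(lines):
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
        ))
    return atoms


def metal_site_residues(atoms, cutoff=2.8):
    """Map each metal ion to the protein residues with an atom within `cutoff` angstroms."""
    protein = [a for a in atoms if a.record == "ATOM"]
    sites = {}
    for m in (a for a in atoms if a.record == "HETATM" and a.res_name in METALS):
        near = {(a.res_name, a.chain, a.res_seq) for a in protein
                if math.dist(a.xyz, m.xyz) <= cutoff}
        sites[(m.res_name, m.chain, m.res_seq)] = sorted(near)
    return sites
```

Applied to a model file, for example `metal_site_residues(parse_pdb_atoms(open("model.pdb")))` with a hypothetical file name, this lists the coordinating residues of each metal. It only works if the model retains the metal as a HETATM record.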

PROCHECK-COMP
PROCHECK-COMP compares the residue-by-residue geometry of a set of closely related structures, such as separate members of a protein family, models of the same structure saved at different stages of refinement, or a homology model and the corresponding structure. It outputs a number of PostScript files showing the comparisons, including residue-by-residue Ramachandran plots and a comparison of the secondary structure elements in each PDB file. The programme is now distributed with PROCHECK. The PDB files were uploaded to the JCSG validation server at http://www.jcsg.org/scripts/prod/validation/sv3.cgi. The results were produced as PostScript files and viewed using GS viewer.
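At the heart of a residue-by-residue comparison is a per-residue difference in backbone torsions that respects the periodicity of angles: 170° and -170° are only 20° apart. The sketch below is a hypothetical illustration of that idea, not PROCHECK-COMP's algorithm, and its 30° threshold is arbitrary.

```python
def angle_difference(a, b):
    """Smallest signed difference a - b between two angles in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def compare_torsions(model_a, model_b, threshold=30.0):
    """Residue-by-residue comparison of two structures.

    model_a, model_b: {residue_number: (phi, psi)} in degrees.
    Returns residues present in both whose phi or psi differs by more than
    `threshold` degrees (an arbitrary, assumed cutoff), with |dphi| and |dpsi|.
    """
    flagged = {}
    for res in sorted(model_a.keys() & model_b.keys()):
        (phi1, psi1), (phi2, psi2) = model_a[res], model_b[res]
        dphi = abs(angle_difference(phi1, phi2))
        dpsi = abs(angle_difference(psi1, psi2))
        if dphi > threshold or dpsi > threshold:
            flagged[res] = (dphi, dpsi)
    return flagged
```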

The Ramachandran Plot
The Ramachandran plot, developed by Gopalasamudram Narayana Ramachandran [13], is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles of a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and these rotations are represented by the torsion angles φ and ψ, respectively. In the diagram, the white areas correspond to conformations in which atoms of the polypeptide come closer than the sum of their van der Waals radii, i.e. sterically disallowed conformations. The red regions correspond to conformations with no steric clashes, i.e. the allowed regions, namely the α-helical and β-sheet conformations. The yellow areas show the regions that become allowed if slightly shorter van der Waals radii are used in the calculations, i.e. the atoms are allowed to come a little closer together. This brings out an additional region, which corresponds to the left-handed α-helix. Disallowed regions generally involve steric hindrance between the side-chain Cβ methylene group and main-chain atoms. Glycine has no side chain and can therefore adopt φ and ψ angles in all four quadrants of the Ramachandran plot.
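The φ and ψ values in a Ramachandran plot are ordinary dihedral angles. φ is defined by C(i-1), N(i), Cα(i) and C(i), and ψ by N(i), Cα(i), C(i) and N(i+1). The following sketch computes them from backbone coordinates with a standard atan2-based formula. The input layout, one (N, Cα, C) triple per residue, is an assumption made for the example.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) of four points about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(t / length for t in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * t for t in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * t for t in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone):
    """backbone: list of (N, CA, C) coordinate triples in chain order.

    Returns [(phi, psi), ...]; the first phi and the last psi are undefined (None).
    """
    angles = []
    for k, (n, ca, c) in enumerate(backbone):
        phi = dihedral(backbone[k - 1][2], n, ca, c) if k > 0 else None
        psi = dihedral(n, ca, c, backbone[k + 1][0]) if k + 1 < len(backbone) else None
        angles.append((phi, psi))
    return angles
```

Plotting the resulting (φ, ψ) pairs as a scatter plot gives a basic Ramachandran plot. Region boundaries such as PROCHECK's favoured and allowed areas would have to be supplied separately.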

Chimera
Chimera was used to compare the 3D structures of the SODs by superimposing one structure onto another. All Oryza sativa SOD structures were opened, and structure comparison was chosen from the Tools menu. Finally, the aligned sequences of the superimposed SOD structures were viewed in the Chimera sequence alignment window.
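Superposition of this kind rests on finding the rigid rotation and translation that minimize the root-mean-square deviation (RMSD) between corresponding atoms. A common way to do this is the Kabsch algorithm, sketched below with NumPy. The sketch assumes that the residue correspondence, for example matched Cα atoms, is already known. Chimera's own structure comparison tools also establish that correspondence, which this sketch does not attempt.

```python
import numpy as np


def kabsch_superpose(P, Q):
    """Optimal rigid superposition of point set P onto Q (Kabsch algorithm).

    P, Q: (N, 3) arrays of corresponding coordinates, e.g. CA atoms of
    equivalent residues. Returns (rmsd, R, t) such that P @ R.T + t ~ Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - p_mean, Q - q_mean
    H = Pc.T @ Qc                      # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - p_mean @ R.T
    diff = P @ R.T + t - Q
    rmsd = float(np.sqrt((diff ** 2).sum() / len(P)))
    return rmsd, R, t
```

A low RMSD after superposition indicates highly similar structures, as reported for the Mn and Fe SODs. A high RMSD indicates a different fold, as reported for the Cu-Zn SODs.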

PTF Database
PTF is an automated protein function prediction server, which was available at http://dragon.bio.purdue.edu/pfp/pfp.html. The submitted sequence is queried with an iterative PSI-BLAST search against UniProt, and the hits are cross-referenced to the Gene Ontology annotation file. The results are then listed as the 10 most probable Gene Ontology annotations in each of the biological process, molecular function and cellular component categories. The Oryza sativa SOD sequences were submitted to the PTF server, and the predicted cellular locations and functions of the SODs were received by email.
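The ranking step can be pictured as accumulating evidence for each GO term across search hits. The sketch below is loosely modelled on that idea. Each hit adds -log10(E) + b to every GO term it carries, with b = 2 assumed here. It is not the server's exact scoring, which also treats the three GO categories separately. The hit identifiers in the test are hypothetical, while GO:0004784 (superoxide dismutase activity) and GO:0046872 (metal ion binding) are real GO identifiers.

```python
import math
from collections import defaultdict


def rank_go_terms(hits, b=2.0, top=10):
    """Rank GO terms by summed evidence from sequence-search hits.

    hits: iterable of (hit_id, e_value, go_terms), e.g. PSI-BLAST hits
    cross-referenced to GO annotations. Each hit contributes -log10(E) + b
    to each of its GO terms; b (assumed 2.0) keeps contributions positive
    for E-values up to 100.
    """
    scores = defaultdict(float)
    for _hit_id, evalue, go_terms in hits:
        evalue = max(evalue, 1e-300)        # avoid log(0) when E is reported as 0.0
        weight = -math.log10(evalue) + b
        for term in set(go_terms):
            scores[term] += weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]
```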

Results and Discussion
The comparison of Oryza sativa SODs showed substantial sequence similarities. This comparison covered sequence retrieval, local BLAST search, physicochemical property analysis, multiple sequence alignment, phylogenetic analysis, secondary and tertiary structure prediction and analysis, and prediction of cellular location and function. The local BLAST search showed that Oryza sativa Mn SODs share an average of 63.5% similarity with one another. This agrees with the findings of Bowler et al. [2], who reported that plant Mn SODs share about 65% sequence similarity with each other. A detailed study of the distribution of SODs at both the subcellular and the phylogenetic level showed that all three types of SOD co-exist only in plants. Comparison of the deduced amino acid sequences of the three types of SOD suggests that Mn SODs and Fe SODs are the more ancient types of SOD. These two enzymes most probably arose from the same ancestral enzyme, whereas Cu-Zn SODs evolved separately in eukaryotes. This view is also corroborated by the findings of Smith and Doolittle.
Multiple sequence alignment of Oryza sativa SODs reveals that seven amino acids (N-150, G-165, I-177, G-200, G-202, L-206 and L-236) are totally conserved, whereas alignment of the Mn SODs and Fe SODs alone showed a greater number of conserved amino acid residues. This agrees with the findings of Schäfer and Kardinahl [14] (2003). Multiple sequence alignment could not decisively differentiate between Mn SODs and Fe SODs, especially as the pattern and type of metal-binding residues are identical.
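Totally conserved positions can be found by scanning the alignment column by column. The sketch below reports gap-free columns in which every sequence carries the same residue. The text does not state whether positions such as N-150 are alignment columns or positions in a particular reference sequence, so this sketch simply reports 1-based alignment columns. The example alignment in the test is made up.

```python
def conserved_columns(alignment):
    """Return 1-based alignment columns where every sequence has the same residue.

    alignment: {name: aligned_sequence}, all of equal length; gap columns are
    never counted as conserved. Returns a list of (column, residue) pairs.
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for col in range(length):
        residues = {s[col] for s in seqs}
        if len(residues) == 1 and "-" not in residues:
            conserved.append((col + 1, residues.pop()))
    return conserved
```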
The phylogenetic tree (Figure 1) shows that the Cu-Zn SODs evolved independently of the other Oryza sativa SODs. The short distance between the branch points of the Mn SODs and Fe SODs suggests a common phylogenetic origin and, likely, frequent horizontal gene transfer during early evolution. Within the major SOD group, the Cu-Zn SODs descend from a common root. This agrees with the findings of Schäfer and Kardinahl [14] in Arabidopsis thaliana genome analysis. The phylogenetic tree pattern is consistent with the hypothesis proposed by Martin and Fridovich (1981). According to this hypothesis, Fe SODs are of great antiquity, and Cu-Zn SODs evolved independently of Mn SODs and Fe SODs. Cu-Zn SODs originated in eukaryotes, and the eukaryotic gene was later transferred into prokaryotes [15].
The results show that the Cu-Zn SODs and Fe SODs have similar physicochemical properties, such as amino acid content, theoretical isoelectric point, number of positively charged residues and number of sulphur atoms. The physicochemical properties of the Photobacterium leiognathi Cu-Zn SOD were very similar to those of comparable Cu-Zn SOD and Fe SOD enzymes from other sources, especially those of teleost fish [15]. The Kyte-Doolittle hydrophobicity plot of Oryza sativa SODs (Figure 2) displays their hydrophobic character. This plot may be useful in predicting membrane-spanning domains, potential antigenic sites and regions likely to be exposed on the protein surface. The Mn SODs and Fe SODs have a greater number of hydrophilic residues than the Cu-Zn SODs.
The tertiary structures of Oryza sativa Mn SODs and Fe SODs are highly similar. Both contain an α/β fold, which differs from the Greek key β-barrel of Cu-Zn SODs. Because the tertiary structure of a protein depends on its primary structure, this similarity is likely due to the greater sequence similarity between Mn SODs and Fe SODs. This agrees with the findings of Stallings et al. [16], who reported high similarity among three tertiary structures of Mn SODs and Fe SODs.
YASARA shows that the Mn SODs and Fe SODs are typically homodimers or homotetramers, with each monomer of about 200 residues binding one metal ion. The active sites of SODs are specific for their respective metal ions and for the superoxide anion. They exhibit a conserved structure consisting of a group of metal-binding residues enclosed by a shell of residues. Although both enzymes can bind either Mn or Fe, native SOD activity is governed principally by the corresponding metal ion.
According to the PTF predictions, Oryza sativa Mn SODs and Fe SODs are located in the mitochondria, whereas Cu-Zn SODs are located in the apoplast. The molecular function of Cu-Zn SODs also differs from that of Mn SODs and Fe SODs. This suggests that functional specialization among the SODs may be influenced by the cellular or tissue localization of the enzyme, an observation supported by Alscher et al. [17].
The Ramachandran plots of Oryza sativa SODs showed that the Mn SODs (Figure 3(b)) and Fe SODs (Figure 3(c)) had a greater number of residues in the favoured and allowed regions, whereas the Cu-Zn SODs (Figure 3(a)) had a large number of residues in disallowed regions. The molecular functions of Mn SODs and Fe SODs were found to be similar, i.e. superoxide dismutase activity. The function of a protein is largely determined by its tertiary structure. Superimposing the structures in Chimera shows that Mn SOD and Fe SOD have highly similar structures, whereas the structure of Cu-Zn SODs was found to be totally different from that of the other two SODs. The functions of SODs also depend on the metal atoms present in the active site. Mn SODs and Fe SODs contain only a single metal, whereas Cu-Zn SODs contain both Cu and Zn atoms, so their active sites act independently. The function of Cu-Zn SODs was also found to differ from that of the other two SODs.
The comparative analysis of Oryza sativa SODs shows great similarity between Fe SODs and Mn SODs in primary, secondary and tertiary structure. Comparison of the sequences and structures of Oryza sativa SODs reveals that the Mn SOD and Fe SOD groups are closely related, whereas the Cu-Zn SOD enzymes have apparently evolved independently. Mn SODs and Fe SODs have similar active sites. Understanding the sequence, structure and function relationships among SODs may in future allow the design of new proteins combining the desirable characteristics of SODs, with the aim of producing rice cultivars tolerant to reactive oxygen species.

Table 1. Sequence similarity between Oryza sativa SODs. (a) Local BLAST result for P28756 Cu-Zn SOD; (b) Local BLAST result for P28757 Cu-Zn SOD; (c) Local BLAST result for Q4TUB3 Cu-Zn SOD; (d) Local BLAST result for Q76MX3 Cu-Zn SOD; (e) Local BLAST result for Q43008 Mn SOD; (f) Local BLAST result for Q43121 Mn SOD; (g) Local BLAST result for Q43803 Mn SOD; (h) Local BLAST result for Q7GCN0 Mn SOD; (i) Local BLAST result for Q4VQ67 Fe SOD; (j) Local BLAST result for Q52WX4 Fe SOD; (k) Local BLAST result for Q5VSB7 Fe SOD; (l) Local BLAST result for Q9ZWM8 Fe SOD.

Figure 1. Phylogenetic tree of Oryza sativa SODs in rectangular form, obtained using MEGA. The tree shows the calculated evolutionary relationships of known Oryza sativa SODs. The length of the horizontal lines connecting one sequence to another is proportional to the estimated genetic distance between the sequences.

Figure 2. Kyte-Doolittle hydrophobicity plot for Oryza sativa SODs obtained with BioEdit. The plot has the amino acid sequence of Oryza sativa SODs on its X-axis and the degree of hydrophobicity on its Y-axis.

Figure 3. Ramachandran plots of Oryza sativa SODs: (a) Cu-Zn SODs; (b) Mn SODs; (c) Fe SODs. Each plot shows the amino acid residues present in the most favoured, allowed, generously allowed and disallowed regions of the Ramachandran plot.