Teaching Botany Using Bioinformatics Tools

Two laboratory activities are designed to reinforce several important concepts in the General Botany course, which is a required course for biology majors at Savannah State University (SSU). The first activity requires students to study the relationship between protein structure and function by observing the 3D structure of Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase), the enzyme that catalyzes the first step of the Calvin cycle of photosynthesis. This activity also helps students understand the mechanism of enzymatic action by examining the interaction of Rubisco with its cofactor, substrate, competitive inhibitor, and product. The second activity is designed to help students grasp the concept of plant evolution and phylogeny by analyzing the sequences of Rubisco collected from representative species and determining the evolutionary relationships of these species using bioinformatics tools. Through these two laboratory activities, several important topics are linked together, with Rubisco as a common theme, so that students develop a holistic and coherent view of plant sciences. Furthermore, students gain several important bioinformatics skills that they can apply in their future studies and careers.

General Botany is a required course for biology majors at SSU. The course covers a broad range of topics in plant sciences at the molecular, cellular, and organismic levels; it is accompanied by a laboratory section, which is linked to the lecture topics. In the past, the instructor used a standard lab manual to conduct the labs; each lab was designed to be completed in 1 hour and 50 minutes. These labs were mostly stand-alone exercises with no apparent connection between them.
To foster the interconnectedness among topics in biology education and develop students' bioinformatics skills, we designed two lab activities using Rubisco as a common theme. Rubisco is an essential enzyme of the Calvin cycle of photosynthesis, catalyzing the first step, which is also the rate-limiting step, of the cycle. It converts atmospheric carbon dioxide (CO2) into organic compounds. As one of the most important and abundant enzymes on Earth, Rubisco has been thoroughly studied and characterized. The enzyme is composed of 16 polypeptide subunits (or chains): eight copies of a large chain and eight copies of a small chain. The large chain is encoded by the rbcL gene in the chloroplast DNA, and the small chain is encoded by the rbcS gene in the nuclear DNA (Portis & Parry, 2007).
Rubisco needs two substrates: CO2 and ribulose-1,5-bisphosphate, or RuBP (a five-carbon compound). Once the two substrates are positioned in the active site of Rubisco, carbon dioxide is linked to RuBP to form two molecules of 3-phosphoglycerate (3PG or PGA), converting inorganic carbon dioxide into organic compounds in the process (Andersson, 2008).
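The carbon bookkeeping of this reaction can be written out as a short worked equation. This is a sketch of the standard textbook stoichiometry of the carboxylation reaction; the water molecule and the released protons are not mentioned in the text above and are included here only to balance the equation. The `\xrightarrow` and `\text` commands assume the amsmath package.

```latex
\[
\underbrace{\mathrm{RuBP}}_{5\ \mathrm{C}}
+ \underbrace{\mathrm{CO_2}}_{1\ \mathrm{C}}
+ \mathrm{H_2O}
\;\xrightarrow{\ \text{Rubisco},\ \mathrm{Mg^{2+}}\ }\;
2\,\underbrace{\mathrm{3PG}}_{3\ \mathrm{C}}
+ 2\,\mathrm{H^+}
\]
```

The six carbon atoms on the left (five from RuBP and one from CO2) are recovered as two three-carbon molecules of 3PG on the right.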
The structure of Rubisco has mostly been determined by X-ray crystallography; many of the 3D structures are stored in protein structure databases such as the PDB (Protein Data Bank) and MMDB (Molecular Modeling Database), and they can be visualized with Cn3D, software that displays the structure of a biomolecule (Wang et al., 2000). To take advantage of these resources, we designed a laboratory exercise that allows students to examine the structure of Rubisco, including its primary, secondary, tertiary, and quaternary structure, and to observe the active site of Rubisco and the physical interactions between Rubisco and its cofactor, substrate, competitive inhibitor, and product.
Rubisco is an ancient enzyme that has evolved over two billion years and is found in most of the photosynthetic organisms, including cyanobacteria, algae, and plants (Chase et al., 1993). Therefore, the evolutionary history of the photosynthetic organisms has been recorded in the genetic sequences of Rubisco. As many plant genomes (including their nuclear and chloroplast genomes) have been sequenced, the genetic data of Rubisco are available for phylogenetic and evolutionary analysis of plants. Therefore, we designed another laboratory activity that would require students to retrieve the protein sequences of Rubisco in photosynthetic organisms from the databases and construct a phylogenetic tree of those species.

Activity I: Visualization of the Structure of Rubisco with Cn3D
Students have previously been exposed to the concept of protein structure and function in their freshman biology course (Principles of Biology); they have also gained a basic understanding of enzymes and their catalytic roles in biochemical reactions in the same course. Those concepts are reinforced in the General Botany course. For example, Rubisco is thoroughly discussed in the context of photosynthesis in the lecture. To further enhance students' understanding of this enzyme, we designed a computer-based lab that requires students to visualize the structure of Rubisco and its interaction with its cofactor (Mg2+), substrate (RuBP), product (3PG), and competitive inhibitor (D-xylulose-2,2-diol-1,5-bisphosphate, or XDP). Students are also encouraged to view the structures of several Rubisco mutants and investigate how a mutation changes the 3D structure of Rubisco and affects its enzymatic function.
In this activity, students are asked to use Cn3D ("see in 3D") at the NCBI site (https://www.ncbi.nlm.nih.gov/) to view the 3D structure of Rubisco in the MMDB or PDB. A number of Rubisco structures are stored in these databases, most of them determined by X-ray diffraction. We chose several of them for this activity.

The Overall Structure of Rubisco and Its Active Site
First, students are asked to view the overall 3D shape of the entire Rubisco enzyme. Spinach Rubisco (PDB ID: 8RUC) is selected for this purpose (Andersson, 1996). Students are expected to see the sixteen chains (or subunits) of Rubisco, eight large chains and eight small chains, and how these chains are assembled in space. They are also required to examine the secondary structures in Rubisco, including 14 α-helices and 18 β-sheets in the large subunit and 2 α-helices and 5 β-sheets in the small subunit. (Taylor & Andersson, 1997a) and the binding of its competitive inhibitor (XDP) to the active site through another entry, PDB ID: 1RCO (Taylor et al., 1996). By inspecting and comparing the structure and shape of the inhibitor (XDP) and the natural substrate (RuBP) of the enzyme (Figure 1(B)), students gain an understanding of how a competitive inhibitor acts: because its shape is similar to that of the substrate, it occupies the active site of Rubisco, preventing the binding of the natural substrate and inhibiting the chemical reaction. In addition, students are asked to observe a Rubisco complex with its product, 3-phosphoglycerate (3PG), through the entry PDB ID: 1AA1 (Taylor & Andersson, 1997b). Two molecules of 3PG are bound per active site; both bind at approximately the same position as the substrate (RuBP) or the competitive inhibitor (Figure 1(C)). From these entries, students are also able to see the disulfide bridge that cross-links the two large subunits in each dimer: the Cys247 residues (cysteine at position 247) of neighboring large chains form this disulfide bridge (Figure 1(D)).
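The chain inventory that students inspect visually in Cn3D can also be cross-checked from a structure file. The following is a minimal sketch, assuming the entry has been saved as PDB-format text; it reads the chain identifier and residue number from the fixed columns of each ATOM record. The function names `residues_per_chain` and `atom_line` are illustrative, and the records are made-up toy data, not taken from 8RUC or any other entry. Note that a deposited file may contain only the crystallographic asymmetric unit, so the number of chains found in a file can be smaller than the sixteen chains of the assembled enzyme.

```python
from collections import defaultdict


def residues_per_chain(pdb_text):
    """Count distinct residues per chain from ATOM records in PDB-format text."""
    seen = defaultdict(set)
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]        # column 22: chain identifier
        residue_key = line[22:27]  # columns 23-27: residue number + insertion code
        seen[chain_id].add(residue_key)
    return {chain: len(keys) for chain, keys in seen.items()}


def atom_line(serial, atom, resname, chain, resseq):
    """Build a fixed-width ATOM record; coordinates are placeholders (hypothetical helper)."""
    return (f"ATOM  {serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}")


# Made-up toy records: chain A has two residues, chain I has one.
toy_pdb = "\n".join([
    atom_line(1, "N", "MET", "A", 1),
    atom_line(2, "CA", "MET", "A", 1),
    atom_line(3, "N", "SER", "A", 2),
    atom_line(4, "N", "MET", "I", 1),
])
chain_counts = residues_per_chain(toy_pdb)
print(chain_counts)
```

Applied to a real file, the same function would list each chain with its residue count, which students could compare against the large and small chains they see on screen.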

Rubisco Mutants
The 3D structures of several Rubisco mutants are also found in the databases.  In addition, students are encouraged to explore some of the revertants in the databases, whose phenotypes have reverted to the normal phenotype by a second mutation. Those revertants help reinforce students' understanding of the intricate relationship of a protein's primary structure with its 3D structure and enzymatic function.

Activity II: Phylogenetic Analysis of Rubisco Proteins
Students have been introduced to the taxonomy and evolution of plants in the lecture; as a result, they have gained a basic understanding of the classification and phylogeny of plants. The newer field of molecular phylogenetics is also explained to students, and the principles underlying the phylogenetic analysis of molecular data, such as DNA and protein sequences, are discussed in the lecture.
In this lab activity, students are required to collect protein sequences of the large subunit of Rubisco from at least twenty-five species representing different branches of photosynthetic organisms; those sequences are then compared and analyzed to reveal the evolutionary relationships among those species and generate a phylogenetic tree through the steps described below.

Collection of Protein Sequences of the Large Subunit of Rubisco
Students are asked to identify and retrieve the protein sequences of the large subunit of Rubisco by searching the protein databases on the NCBI site (https://www.ncbi.nlm.nih.gov/) using keywords such as "Rubisco large subunit" and the name of the species. Each student needs to retrieve 25 or more protein sequences from different types of photosynthetic organisms, including cyanobacteria, green algae, bryophytes, seedless vascular plants, gymnosperms, and angiosperms (monocots, eudicots, and basal angiosperms). Table 1 shows a partial list of the organisms from which the protein sequences of the large subunit of Rubisco are derived.
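Once sequences are saved in FASTA format, they usually need short, unique labels before tree building, because the classic PHYLIP input format reserves ten characters for each sequence name. The sketch below, which uses only the Python standard library, shows one possible way to read FASTA text and derive such labels. The function names are illustrative, the truncation scheme is an assumption rather than a required convention, and the sequences are made-up toy strings, not real Rubisco sequences.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def short_labels(names, width=10):
    """Make unique labels of at most `width` characters (hypothetical scheme)."""
    labels, used = [], set()
    for name in names:
        base = name.replace(" ", "_")[:width]
        label, n = base, 1
        while label in used:
            # On a collision, replace the tail of the label with a counter.
            suffix = str(n)
            label = base[:width - len(suffix)] + suffix
            n += 1
        used.add(label)
        labels.append(label)
    return labels


# Toy input: headers are species names, sequences are placeholders.
toy_fasta = """>Arabidopsis thaliana
MSPQTETK
AGVGF
>Arabidopsis lyrata
MSPQTETKAGVGF
>Zea mays
MSPQTETKASVGF
"""
records = read_fasta(toy_fasta)
labels = short_labels([header for header, _ in records])
print(labels)
```

Keeping a table that maps each short label back to the full species name makes the final tree easier to interpret.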

Generation of Multiple Sequence Alignment
Once the protein sequences of the Rubisco large subunit are retrieved and formatted, students are asked to generate a multiple sequence alignment of these sequences using Clustal Omega (Thompson et al., 1994), which is available on the European Bioinformatics Institute (EBI) site (https://www.ebi.ac.uk/Tools/msa/clustalo/). Students need to examine the alignment visually and understand the mutation events that have led to the mismatches and gaps in the alignment. Mismatches are generally caused by amino acid substitutions, and gaps are usually generated by indels (that is, insertion or deletion mutations). By inspecting the alignment of the protein sequences, students gain a basic understanding of how the sequences of Rubisco have diverged through specific mutations over the course of evolution.
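The distinction between mismatches and gaps can be made concrete by tallying the columns of two aligned rows. The following is a small sketch, assuming the rows come from the same alignment, have equal length, and use "-" as the gap character; the function name and the toy rows are illustrative and do not come from real Rubisco data.

```python
def compare_aligned(row_a, row_b, gap="-"):
    """Tally matches, mismatches, and gap columns for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    counts = {"matches": 0, "mismatches": 0, "gap_columns": 0}
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            continue  # gap in both rows: uninformative for this pair
        if a == gap or b == gap:
            counts["gap_columns"] += 1   # consistent with an indel
        elif a == b:
            counts["matches"] += 1
        else:
            counts["mismatches"] += 1    # consistent with a substitution
    return counts


# Toy rows: one substitution (K/R) and two single-residue gaps.
row_a = "MSPQTETKAS-GF"
row_b = "MSPQ-ETRASVGF"
summary = compare_aligned(row_a, row_b)
print(summary)
```

A tally like this only describes one pair of sequences; the visual inspection of the full alignment remains the main exercise.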

Construction of the Phylogenetic Tree
The Phylogeny Inference Package (PHYLIP) is downloaded from its website (http://evolution.genetics.washington.edu/phylip.html) and used to construct the phylogenetic tree. PHYLIP contains a number of software tools needed for the generation of the tree (Felsenstein, 1989). Specifically, PROTDIST in PHYLIP is used to compute a distance matrix from the alignment of Rubisco protein sequences obtained in the previous step. The distance matrix is a table showing the evolutionary distances between all pairs of protein sequences in the dataset; each distance is calculated from the number of amino acid differences between a pair of sequences. NEIGHBOR in the same package is then used to generate a neighbor-joining tree (Saitou & Nei, 1987) from the distance matrix produced by PROTDIST. The tree is displayed graphically with TreeView (Page, 1996). The Rubisco protein sequence from a cyanobacterium (Cyanophyta) is used as the outgroup to root the tree.
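The two ideas in this step, a pairwise distance matrix and the choice of which sequences to join first, can be illustrated in a few lines. This sketch is not PHYLIP code and does not reproduce the output of PROTDIST or NEIGHBOR: it uses the raw proportion of differing residues as a simplified distance (PROTDIST can also apply amino acid substitution models), and it computes only the pair-selection criterion of neighbor joining, Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of distances from i to all other sequences. All function names, the toy alignment, and the example matrix are made up for illustration.

```python
from itertools import combinations


def p_distance(row_a, row_b, gap="-"):
    """Proportion of differing residues over columns without gaps."""
    compared = differences = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment):
    """alignment: dict of label -> aligned row. Returns a nested dict."""
    labels = list(alignment)
    dist = {x: {y: 0.0 for y in labels} for x in labels}
    for x, y in combinations(labels, 2):
        dist[x][y] = dist[y][x] = p_distance(alignment[x], alignment[y])
    return dist


def first_nj_pair(dist):
    """Return the pair with the smallest neighbor-joining Q value."""
    labels = list(dist)
    n = len(labels)
    totals = {x: sum(dist[x].values()) for x in labels}

    def q_value(pair):
        i, j = pair
        return (n - 2) * dist[i][j] - totals[i] - totals[j]

    return min(combinations(labels, 2), key=q_value)


# Toy alignment (made-up rows of ten residues each).
toy_alignment = {
    "outgroup": "MSPQTETKAS",
    "moss":     "MSPQTEAKAS",
    "pine":     "MAPQSEAKAG",
    "rice":     "MAPQSEAKAS",
}
toy_dist = distance_matrix(toy_alignment)

# Small made-up five-taxon matrix to illustrate pair selection.
example = {
    "a": {"a": 0, "b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "b": 0, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "c": 0, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "d": 0, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3, "e": 0},
}
first_pair = first_nj_pair(example)
print(toy_dist["outgroup"]["pine"], first_pair)
```

In the five-taxon example, d and e are the closest pair by raw distance, yet the Q criterion selects a and b first. This shows that neighbor joining corrects each distance for how far the two sequences are from everything else, rather than simply joining the closest pair.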

Interpretation of the Phylogenetic Tree
Once a tree is generated, students need to interpret their trees and compare them with the existing trees reconstructed from other data (morphological, anatomical, or molecular) in the textbook or journal articles.
A sample of the phylogenetic trees generated by our students is shown in Figure 2. The topology of the tree is in general agreement with the currently accepted view of the organismal phylogeny of photosynthetic organisms. Although the primitive plants (green algae, bryophytes, and ferns) are not clearly grouped, the tree generally displays the evolutionary trend of plants, from bryophytes and ferns to gymnosperms and angiosperms. Within the group of X. R. Zhang

Discussion
We designed two computer-based activities that were tied to several lecture topics in General Botany, including enzyme chemistry, photosynthesis, and evolution and phylogeny of plants.
Our impression was that students, in general, showed great interest and enthusiasm for learning during these lab activities, as demonstrated by their active participation and engagement. The evaluation of student performance indicated that these exercises helped students achieve their learning objectives. For example, most of the However, we were unable to make an accurate assessment of student learning outcomes this time because of the small class size (24 students) and the lack of a control group. We will address this limitation in future studies.
Although these activities were designed and implemented in the General Botany course at SSU, they can be easily modified and adapted to teach similar topics in other biology courses by selecting different groups of proteins or enzymes relevant to those courses.

Conclusion
There are several educational implications of this project. First, the traditional labs for General Botany were generally stand-alone labs that were disconnected from each other. These two exercises were developed to foster the interconnectedness of topics and concepts through a common theme so that students gain a holistic and coherent view of plant sciences. Second, unlike traditional recipe-based lab exercises, these two exercises are designed to be inquiry-based, open-ended activities that promote students' skills in critical thinking, problem-solving, and data analysis and interpretation. Third, students are introduced to several important bioinformatics skills, such as searching databases, retrieving data, and using software to analyze the data, which have become important in all areas of the biological sciences. Students can apply these valuable skills in their future studies and careers.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.