Structural and Functional Annotation of Hypothetical Protein of Fusobacterium nucleatum Strain MJR7757B: An in Silico Approach
1. Introduction
Fusobacterium nucleatum is a prevalent oral bacterium that has been associated with several human diseases, including the development and progression of colorectal cancer (CRC) [1]. F. nucleatum triggers inflammation, which causes genetic instability and inhibits the body’s immune responses against tumors [2] [3]. This gram-negative anaerobic species is also associated with adverse pregnancy outcomes, gastrointestinal disorders, cardiovascular disease, rheumatoid arthritis, respiratory tract infections, Lemierre’s syndrome, and Alzheimer’s disease [4] [5] [6] [7] [8]. F. nucleatum infections commonly respond well to standard antibiotic therapies; effective antibiotics include metronidazole, clindamycin, and beta-lactam antibiotics such as penicillin or amoxicillin [9] [10]. Despite advances in genomic sequencing, a substantial portion of the F. nucleatum proteome remains uncharacterized, including the hypothetical protein HMPREF3221_01179.
Hypothetical proteins (HPs) are proteins predicted from genome sequences that lack experimental characterization, yet many are essential for diverse cellular processes and signaling pathways. Annotating them through in silico studies is crucial for understanding disease mechanisms and pathogenesis, and it supports drug design, vaccine development, and the identification of virulence proteins in bacteria [11]. In bioinformatics, researchers are actively uncovering the biological functions and characteristics of millions of uncharacterized proteins from different organisms; these proteins perform a wide range of functions, including structuring cells and organisms and participating in vital in vivo processes through interactions with other molecules [12] [13]. Bioinformatics methods allow researchers to analyze protein structures in 3D, identify new domains, and uncover protein functions, improving our understanding of their biological roles [14]. Where experimental determination of a protein’s function is challenging, function can be inferred from sequence similarity; if this fails, analysis of the protein structure offers valuable functional clues, and recent approaches combine several structure-based methods and integrate evidence from multiple sources [15] [16] [17]. Understanding the role of such proteins is pivotal for understanding the pathogenicity and biology of this bacterium.
This study focused on the hypothetical protein HMPREF3221_01179 from F. nucleatum, a bacterium associated with diverse human infections. Using in silico methods, we have investigated the structural and functional annotations of the hypothetical protein (accession no. KXA20922.1) from the F. nucleatum strain MJR7757B.
2. Materials and Methods
2.1. Hypothetical Protein Sequence Retrieval
More than 400 genome sequences of F. nucleatum are available in the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov) [18]. This study selected the hypothetical protein HMPREF3221_01179 (accession no. KXA20922.1) from F. nucleatum strain MJR7757B. The protein consists of 438 amino acid residues, and its primary sequence was retrieved in FASTA format for in-depth analysis [19].
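The retrieval step could be scripted roughly as follows. This is a minimal sketch, assuming the NCBI E-utilities `efetch` endpoint is used; the helper names `efetch_fasta_url` and `parse_fasta` are illustrative and not part of the original workflow, and the demo record is a made-up fragment, not the actual sequence of KXA20922.1.

```python
from urllib.parse import urlencode

# NCBI E-utilities endpoint for fetching records.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str) -> str:
    """Build a URL that returns a protein record as FASTA text."""
    query = {"db": "protein", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(query)}"


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated under the current header.
            records[header] += line
    return records


url = efetch_fasta_url("KXA20922.1")

# Placeholder record for illustration only; NOT the real sequence.
demo_text = ">demo_protein toy example\nMKKLLA\nVGTT\n"
demo_records = parse_fasta(demo_text)
```

The URL can be opened in a browser or fetched with any HTTP client; the parser then yields the sequence as a single string ready for the downstream tools.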
2.2. Analysis of Physicochemical Properties of Hypothetical Protein
The physical and chemical properties of the target hypothetical protein were analyzed using the ProtParam tool available on the ExPASy website (http://web.expasy.org/protparam/) [20]. These properties included molecular weight, aliphatic index (AI) [21], extinction coefficients [22], GRAVY (grand average of hydropathy) [21], and isoelectric point (pI) [23].
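Two of these quantities are simple enough to compute by hand. The sketch below is an illustration rather than the ProtParam implementation: it uses the Kyte-Doolittle hydropathy scale for GRAVY and the standard aliphatic index formula, AI = X(Ala) + 2.9 × X(Val) + 3.9 × (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names and test sequences are assumptions made for the example.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy sequences for illustration only (not the target protein).
print(gravy("RK"))              # strongly hydrophilic dipeptide
print(aliphatic_index("AVIL"))  # purely aliphatic tetrapeptide
```

A negative GRAVY value indicates a hydrophilic protein on average, and a higher aliphatic index indicates a larger relative volume of aliphatic side chains.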
2.3. Hypothetical Protein (Conserved Domains) Function Prediction
The conserved domain analysis of the hypothetical protein was conducted using the NCBI Conserved Domain Search Service (https://www.ncbi.nlm.nih.gov/structure/cdd/wrpsb.cgi) [24], Pfam (https://pfam.xfam.org) [25], and InterProScan (https://www.ebi.ac.uk/interpro/search/sequence) [26]. CD Search detects conserved domains within a protein sequence by comparing the query sequence, using RPS-BLAST (Reverse Position-Specific BLAST), against position-specific score matrices derived from conserved domain alignments in the Conserved Domain Database (CDD) [27]. Pfam, a protein family database, provides annotations and multiple sequence alignments generated through hidden Markov models (HMMs) [25].
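The idea of matching a query against a position-specific score matrix can be shown with a toy example. This sketch is not RPS-BLAST: it only slides an ungapped matrix along a sequence and reports the best-scoring window, without the seeding heuristics, gapped extension, or E-value statistics that the real tool applies. The matrix values, the default penalty, and the sequence are invented for illustration.

```python
def best_pssm_hit(seq, pssm, default=-4):
    """Slide a PSSM along seq and return (best_score, start_index).

    pssm is a list of dicts, one per matrix column, mapping residue -> score.
    Residues absent from a column receive the `default` penalty.
    Returns None if the sequence is shorter than the matrix.
    """
    width = len(pssm)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col.get(aa, default) for col, aa in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Hypothetical three-column matrix favouring an [L/I/V]-[D/E]-G motif.
toy_pssm = [
    {"L": 4, "I": 3, "V": 2},
    {"D": 5, "E": 3},
    {"G": 6},
]

hit = best_pssm_hit("MKLDGAIEG", toy_pssm)
print(hit)  # (15, 2): the window "LDG" starting at index 2 scores highest
```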
2.4. Multiple Sequence Alignment and Phylogenetic Analysis
A search for protein homologs was conducted using BLASTp from NCBI (http://www.ncbi.nlm.nih.gov) against the nonredundant database, employing default parameters. Sequence alignment and phylogenetic tree construction were carried out using the MEGA 11 program [28]. Specifically, the ClustalW algorithm and the Maximum Likelihood (ML) technique within MEGA 11 were employed for iterative Multiple Sequence Alignment (MSA) and tree building, respectively.
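Once sequences are aligned, the pairwise similarity that underlies a tree can be summarized with a simple calculation. The following sketch computes percent identity and the corresponding p-distance for already-aligned sequences; it is an illustration only, does not reproduce the ClustalW or Maximum Likelihood procedures in MEGA 11, and the aligned sequences are invented.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among the compared columns."""
    return 1.0 - percent_identity(a, b) / 100.0


# Hypothetical aligned sequences (gaps shown as "-").
toy_alignment = {
    "query": "MKLD-GAIE",
    "hit_1": "MKLDAGAIE",
    "hit_2": "MRLD-GSIE",
}

for name in ("hit_1", "hit_2"):
    pid = percent_identity(toy_alignment["query"], toy_alignment[name])
    print(name, pid, p_distance(toy_alignment["query"], toy_alignment[name]))
```

Distance-based tree methods start from a matrix of such pairwise distances, whereas Maximum Likelihood evaluates tree topologies under an explicit substitution model.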
2.5. Protein Structure Preparation
The secondary structure of the protein was predicted using the PSI-BLAST-based secondary structure prediction (PSIPRED) (http://bioinf.cs.ucl.ac.uk/psipred) [29] and Self-Optimized Prediction Method with Alignment (SOPMA) (https://npsaprabi.ibcp.fr/cgibin/npsa_automat.pl?page=/NPSA/npsa_sopma.html) [30] servers. The 3D structure of the target protein was predicted using the SWISS-MODEL (https://swissmodel.expasy.org/) server [31]. This server automatically runs a BLASTp search to identify suitable templates for each protein sequence. The resulting 3D model structure was visualized using BIOVIA Discovery Studio Visualizer (BIOVIA Discovery Studio 2021). The model generated by the SWISS-MODEL server was further refined using Swiss-PdbViewer [32].
2.6. Protein Quality Assessment
The quality of the generated model structure was assessed using several evaluation tools, including PROCHECK (https://www.ebi.ac.uk/thornton-srv/software/PROCHECK) [33], QMEAN (https://swissmodel.expasy.org/qmean) [34] from the ExPASy server of the SWISS-MODEL workspace, and ERRAT (https://saves.mbi.ucla.edu/) [35]. The Z-score of the model was estimated using the ProSA-web (https://prosa.services.came.sbg.ac.at/prosa.php) server [36].
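PROCHECK's Ramachandran analysis is based on the backbone dihedral angles φ (defined by the atoms C(i−1), N, CA, C) and ψ (defined by N, CA, C, N(i+1)). As a minimal sketch of the underlying geometry, the function below computes a signed dihedral angle from four atomic positions; the coordinates are toy values, and the classification of residues into favoured or disallowed regions performed by PROCHECK is not reproduced here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four consecutive points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: three atoms fixed, the fourth rotated about the central bond.
p0, p1, p2 = (1, 0, 0), (0, 0, 0), (0, 0, 1)
print(dihedral(p0, p1, p2, (1, 0, 1)))   # cis arrangement
print(dihedral(p0, p1, p2, (0, 1, 1)))   # quarter turn
print(dihedral(p0, p1, p2, (-1, 0, 1)))  # trans arrangement
```

Applying the same function to the backbone atoms of each residue in a model yields the (φ, ψ) pairs that are plotted in a Ramachandran diagram.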
2.7. Protein Active Site Prediction
The Computed Atlas of Surface Topography of Proteins (CASTp) server (http://sts.bioe.uic.edu/castp/calculation.html) was employed to predict the active site of the protein. This prediction is essential for identifying the regions and critical residues involved in protein-ligand interactions. The CASTp results were visualized using BIOVIA Discovery Studio Visualizer.
2.8. Subcellular Localization of Protein
The CELLO: Subcellular Localization Predictive System (http://cello.life.nctu.edu.tw) [37], Predicts Subcellular Localization of Prokaryotic Proteins (PSLpred) (https://webs.iiitd.edu.in/raghava/pslpred/) [38], PSORTb v3.0.2 (https://www.psort.org/psortb/) [39], and SOSUI (http://harrier.nagahama-i-bio.ac.jp/sosui) [40] servers were utilized to predict the subcellular location of the hypothetical protein.
2.9. Molecular Docking Analysis
Docking analysis was conducted using AutoDock Vina (http://vina.scripps.edu/download.html) [41], which aids in studying and predicting ligand interactions with macromolecules. The ligand used for docking was Arginine beta-naphthylamide, an inhibitor of TolC family proteins. AutoDock Vina was used to estimate the binding affinity between the target protein and the ligand [42]. Protein-protein docking between the target protein and the hemolysin-coregulated protein 1 (Hcp1) of S. Typhimurium was performed using the ClusPro 2.0 server [43]. The docking results were analyzed with Discovery Studio Visualizer.
3. Results and Discussions
3.1. Protein Sequence Retrieval
The hypothetical protein identified under the accession number KXA20922.1 originates from the F. nucleatum strain MJR7757B. This protein consists of 438 amino acid residues, and its primary sequence was obtained in FASTA format to enable subsequent analysis (Table 1).
3.2. Protein Physicochemical Properties
The putative protein consists of 438 amino acids and has a molecular weight of 52120.02 Da. Its estimated half-life is more than 10 hours in bacteria. The theoretical isoelectric point (pI) of the protein is 8.33, indicating a slightly alkaline nature. Its aliphatic index (AI) of 85.89 reflects a high relative content of aliphatic side chains. The grand average of hydropathicity (GRAVY) is −0.823, indicating that the protein is hydrophilic on average. The instability index (II) is 29.47, below the threshold of 40 above which a protein is predicted to be unstable, suggesting that the protein is stable (Table 2).
3.3. Protein Functional Prediction
Domain analysis involves identifying, characterizing, and understanding the roles of individual domains to gain insights into the overall function and organization of proteins. Several annotation techniques were used to identify conserved regions (domains) and predict the functions of the hypothetical protein. According to the NCBI CD Search, InterProScan, and Pfam databases, the target protein belongs to the outer membrane efflux protein (TolC) family. The TolC superfamily domain, predicted by the NCBI-CDD server, has an E-value of 7.71e−09 and is located at amino acid residues 92 - 428. Outer membrane efflux proteins of the TolC family have a variety of important functions in bacterial physiology. They actively expel a range of compounds, such as antibiotics and toxins, serving as a barrier against harmful chemicals and maintaining cellular homeostasis [44] [45]. Their main documented role is in drug resistance, where they pump antibiotics out of cells and so promote multidrug resistance. In certain bacteria (Escherichia coli, Pseudomonas aeruginosa, and Salmonella enterica), they take part in interbacterial interactions by exporting virulence factors or toxins against rival bacteria. They may also contribute to biofilm production and increase pathogenicity by exporting toxins. Certain efflux proteins transport quorum-sensing signalling molecules [46].
![]()
Table 1. The properties of the hypothetical protein retrieved from the NCBI database.
![]()
Table 2. The physicochemical properties of hypothetical protein HMPREF3221_01179.
3.4. Sequence Alignment Assessment and Phylogenetic Analysis
According to the NCBI BLASTp search of the target protein against the nonredundant database, the protein shares 98% - 100% sequence similarity with other known TolC superfamily proteins from different organisms (Table 3). A phylogenetic tree was constructed from the BLASTp results using MEGA 11 to depict the relationship between the target hypothetical protein and other TolC family proteins. The results suggest that most of the proteins are closely related to each other and share a common ancestor (Figure 1).
3.5. Protein Structure Analysis
The SOPMA analysis revealed three conformational states: extended strand (11.64%), alpha helix (60.05%), and random coil (25.11%). The PSIPRED results showed that random coil accounted for 25.38% of the structure, alpha helix for 60%, and extended strand for 11.77%. The secondary structure predicted by PSIPRED is shown in Figure 2.
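Percentages of this kind follow directly from a per-residue state assignment. The sketch below assumes the prediction is available as a string with one letter per residue (H for helix, E for extended strand, C for coil); the example string is invented and far shorter than the 438-residue target protein.

```python
from collections import Counter


def ss_composition(states: str) -> dict:
    """Percentage of residues in each secondary-structure state."""
    counts = Counter(states)
    total = len(states)
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


# Hypothetical per-residue assignment for a 12-residue peptide.
toy_states = "CCHHHHHHEECC"
composition = ss_composition(toy_states)
print(composition)
```

Comparing the resulting percentages between two predictors, as done above for SOPMA and PSIPRED, gives a quick check on how well independent methods agree.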
![]()
Figure 1. A phylogenetic tree illustrating the target protein’s evolutionary relationships with other TolC proteins. The neighbor-joining approach was used by MEGA 11 to create the tree based on the score matrix.
![]()
Table 3. NCBI BLASTp results showing the sequence similarity with the target hypothetical protein sequence.
![]()
Figure 2. The predicted secondary structure of the target protein obtained using the PSIPRED server.
The tertiary structure of the target protein was built with the SWISS-MODEL service using a template that shares 93.10% sequence identity with the hypothetical protein. Swiss-PdbViewer was used for energy minimization of the model structure. The 3D structure after energy minimization, visualized in Discovery Studio Visualizer, is shown in Figure 3.
3.6. Quality Assessment of Predicted Structure
The three-dimensional (3D) structure obtained from SWISS-MODEL passed all model quality evaluations, namely PROCHECK, QMEAN, and ERRAT. According to PROCHECK, 96.6% of the amino acid residues fell within the most favoured regions of the Ramachandran plot (Table 4) (Figure 4). The model had a QMEAN4 score of 0.54, which was regarded as satisfactory (Figure 5). Additionally, ERRAT reported a quality factor of 97.6923 for the protein structure, indicating high quality.
The Z-score obtained from the ProSA server reflects the model’s overall quality. It indicates whether the input structure falls within the range of scores normally found for native proteins of similar size. The Z-score for the model obtained from ProSA was −5.89 (Figure 6).
![]()
Figure 3. Three-dimensional structure of the target protein built with the SWISS-MODEL server after Swiss-PdbViewer energy minimization (visualized by BIOVIA Discovery Studio Visualizer 2021).
![]()
Table 4. Ramachandran plot calculations for the target protein.
![]()
Figure 4. The Ramachandran plot of the model structure, as verified through PROCHECK.
![]()
Figure 5. The QMEAN result for the model structure.
![]()
Figure 6. The Z-scores obtained from the ProSA server. The Z-score for the model was −5.89.
3.7. Active Site Detection
CASTp provides a detailed, comprehensive, and quantitative analysis of a protein’s topographical features. It can precisely locate and measure functional pockets on protein surfaces and within the interior of the 3D structure. Using the CASTp server, the active site of the model structure was examined and its amino acid residues were identified. Discovery Studio was then used to visualize the results. The major pocket regions were found at residues 32 - 36, 389 - 396, and 432 - 438. The active residues of the model protein predicted by CASTp are ASP32, LEU35, ASN36, ASP154, ILE157, GLN158, LYS161, ASP270, TYR432, LYS435, ILE436, and ARG438 (Figure 7).
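The residue list and pocket ranges reported above can be cross-checked with a few lines of code. This is an illustrative sketch using only the values stated in the text; the helper names are assumptions, and the script does not query CASTp itself.

```python
import re

# Active residues and pocket ranges as reported in the text.
ACTIVE_RESIDUES = ("ASP32, LEU35, ASN36, ASP154, ILE157, GLN158, LYS161, "
                   "ASP270, TYR432, LYS435, ILE436, ARG438")
POCKET_RANGES = [(32, 36), (389, 396), (432, 438)]


def parse_residues(text: str) -> list:
    """Turn labels such as 'ASP32' into (name, position) tuples."""
    return [(m.group(1), int(m.group(2)))
            for m in re.finditer(r"([A-Z]{3})(\d+)", text)]


def split_by_ranges(residues, ranges):
    """Separate residues that fall inside any range from those outside."""
    inside, outside = [], []
    for name, pos in residues:
        in_range = any(lo <= pos <= hi for lo, hi in ranges)
        (inside if in_range else outside).append(f"{name}{pos}")
    return inside, outside


residues = parse_residues(ACTIVE_RESIDUES)
inside, outside = split_by_ranges(residues, POCKET_RANGES)
print("within the listed ranges:", inside)
print("outside the listed ranges:", outside)
```

The same parsed list can be intersected with the contact residues from a docking run to see how many predicted active-site residues take part in ligand binding.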
3.8. Subcellular Localization of Hypothetical Protein
The CELLO program located the target protein in the outer membrane with a reliability score of 3.417. The findings from PSORTb and PSLpred likewise indicated an outer membrane and extracellular protein. The subcellular location of a putative protein is important because it indicates the function and role that the protein plays within a cell. It provides information on the protein’s regulation, its interactions with other molecules, and its possible role in disease. This knowledge is essential for basic research as well as for the development of new therapeutics [38].
![]()
Figure 7. Active site prediction by CASTp server (visualized by BIOVIA Discovery Studio Visualizer 2021).
3.9. Molecular Docking Analysis
AutoDock Vina was used to run a docking study between the ligand and the target protein, and the interaction was visualized in Discovery Studio (Figure 8). The hypothetical protein belongs to the TolC protein family, whose members are efflux proteins that help pump materials across the cell membrane. Arginine beta-naphthylamide is known as an inhibitor of efflux proteins and was therefore employed as the ligand in this work. The ligand showed a substantial binding affinity for the target hypothetical protein, with a value of −7.1 kcal/mol for the model (Table 5). Several of the interacting residues were identical to the active-site residues predicted by the CASTp server. This binding affinity further supports our results.
Protein-protein docking between the hemolysin-coregulated protein 1 (Hcp1) of S. Typhimurium and the target protein was then performed using ClusPro 2.0. Hcp1 plays an important role in the proper delivery of antibacterial toxins by interacting with efflux proteins; hence, it was selected for the protein-protein interaction analysis. The docking outcomes are given in Table 6. A large number of residues from both proteins took part in the interaction, which might reflect the selection of the docked complex from a cluster with more members in ClusPro 2.0. Experimental research has not yet revealed the precise nature of the interaction between the Hcp1 and TolC proteins. Because the target protein belongs to the TolC family, which is known for its efflux functions, its predicted interaction with Hcp1 suggests an involvement in the delivery of antibacterial toxins.
Overall, the sequence of the target protein is highly similar to sequences from many F. nucleatum strains, which supports the potential use of this efflux protein as a therapeutic target. Outer membrane efflux proteins are essential for major bacterial functions, and progress in understanding these proteins has accelerated in recent years. To the best of our knowledge, this is the first investigation to describe the structural and functional properties of the F. nucleatum efflux protein HMPREF3221_01179. We believe this research helps in understanding the mechanisms of bacterial functions and might help in designing new drugs in the future. However, more studies are needed to confirm its function at the experimental level.
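To put a docking score of −7.1 kcal/mol in perspective, it can be converted into an approximate dissociation constant through the relation ΔG = RT ln(Kd). The sketch below is a back-of-the-envelope illustration and not part of the original analysis: it assumes a temperature of 298.15 K and a 1 M standard state, and it treats the empirical Vina score as if it were a true binding free energy, which it only approximates.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_dg(dg_kcal_per_mol: float, temp_k: float = 298.15) -> float:
    """Approximate dissociation constant (mol/L) from a binding free energy."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temp_k))


kd = kd_from_dg(-7.1)
print(f"Estimated Kd: {kd:.2e} M")
```

Under these assumptions the score corresponds to a dissociation constant in the low micromolar range; because docking scores carry substantial uncertainty, the value is best read as an order-of-magnitude estimate.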
![]()
Table 5. Details of protein-ligand docking analysis.
![]()
![]()
Figure 8. The 3D and 2D interactions of hypothetical protein HMPREF3221_01179 and the ligand Arginine beta-naphthylamide. The green stick model represents the Arginine beta-naphthylamide ligand docked in the active region of the target protein (Top). The bonding between amino acids and the ligand (Bottom). The docking is visualized by Discovery Studio Visualizer.
![]()
Table 6. Protein-protein interaction analysis.
4. Conclusion
The study of hypothetical proteins in microbial genomes is crucial for unravelling their unknown functions, leading to insights into microbial biology, potential drug targets, and applications in biotechnology. This in-depth analysis of the hypothetical protein HMPREF3221_01179 from F. nucleatum strain MJR7757B provides valuable insights into its structural, functional, and interaction properties, suggesting its potential as a therapeutic target. Additionally, these findings open opportunities for further exploration of this bacterium in biotechnological applications.
Authors’ Contributions
Conceptualization: Md. Isrfil Hossen, Sayed Mashequl Bari. Methodology: Sayed Mashequl Bari, Md. Isrfil Hossen, Nusrat Jahan. Formal analysis: Sayed Mashequl Bari, Md. Isrfil Hossen, Nusrat Jahan, Fouzia Mostafa. Writing original draft: Sayed Mashequl Bari, Md. Isrfil Hossen, Fouzia Mostafa. Writing review & editing: Amgad Albahi, Jannatul Ferdaus.